Clustering of aperiodic medical data
Abstract: The provided aperiodic medical data consist of time series of CO2 levels in blood, together with information about the drugs used and manipulation with the patient after a head injury. The task was to find typical clusters in these time series and to determine whether drugs or manipulation influence CO2 levels. Experiments with the SOM and KMeans algorithms showed the distribution of clusters as well as the importance and influence of the drug and manipulation factors.
Original school report - Clustering of an aperiodical medical data (pdf, 138KB).
Assignment
The task is to find typical runs (clusters) of the CO2 signal in the provided medical data and to determine whether drugs or manipulation influence these runs.
Introduction
The medical data provided for the semester project are strictly aperiodic but not random. The datasets contain samples from two patients after a head injury, specifically the development of the CO2 level over a longer period (several hours). The task is to find typical time series and the influence of drugs and of manipulation with the patient. The results could help medical staff control the current CO2 level more precisely.

Time behaviour of CO2 levels, patient 1.
The datasets come from a diploma thesis focused on neural networks and time series prediction. The dataset format is simple: each row represents one time window.
| CO2(t-90) | CO2(t-45) | CO2(t-0) | manip(t-0) | drugs(t-25) |
| 6.700000 | 6.700000 | 6.780000 | 6.000000 | 6.000000 |
| 6.650000 | 6.690000 | 6.800000 | 6.000000 | 6.000000 |
| 6.660000 | 6.680000 | 6.790000 | 6.000000 | 6.000000 |
| 6.650000 | 6.690000 | 6.790000 | 6.000000 | 6.000000 |
| 6.660000 | 6.710000 | 6.810000 | 6.000000 | 6.000000 |
Methodology
This section describes the theoretical background of the experiments, the tools used and their configuration.
Data mining algorithms
Several algorithms were used to analyse the datasets, to create and evaluate models, and to cluster the data. A SOM (self-organizing map) was used to get a general picture of the data structure, because finding relations in the time series requires unsupervised learning.
Afterwards, the KMeans algorithm was used to find appropriate clusters on the trained SOM. KMeans made it possible to build a data model and to look for further relations in the relevant parts of the data.
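The SOM step can be illustrated with a minimal sketch in plain Python. It uses the online (sequential) training rule on a small rectangular grid, which is an assumption made for illustration; the SOM Toolbox's own training routine, lattice and parameters may differ. The function names, the toy data and all parameter values below are hypothetical and do not come from the original report.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda unit: math.dist(weights[unit], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM with the online (sequential) update rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per map unit, initialised from randomly chosen samples.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # one shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical toy windows (CO2(t-90), CO2(t-45), CO2(t-0)) for a low and a high regime.
low = [[6.60 + 0.01 * i, 6.62 + 0.01 * i, 6.70 + 0.01 * i] for i in range(10)]
high = [[7.60 + 0.01 * i, 7.62 + 0.01 * i, 7.70 + 0.01 * i] for i in range(10)]
som = train_som(low + high, rows=3, cols=3)
```

Each unit ends up with a weight vector inside the range of the data, and best_matching_unit(som, window) maps any time window to its position on the grid.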
Tools and configuration
Matlab and the SOM Toolbox were used as software tools, and a wide range of their functionality was used. Datasets were normalized with the som_normalize() function, and the size of each SOM was chosen automatically by the software according to the size of the particular dataset. KMeans validation (choosing the best number of clusters) was also done automatically using the Davies-Bouldin index.
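The report does not say which normalization method was passed to som_normalize(). As an illustration only, the following Python sketch assumes variance normalization, where every column is shifted to zero mean and scaled to unit sample standard deviation. The function name normalize_var is hypothetical; the sample rows are taken from the table in the introduction.

```python
import statistics


def normalize_var(rows):
    """Scale every column to zero mean and unit sample standard deviation.

    ASSUMPTION: this mimics a variance-style normalization; the method
    actually used in the report is not stated.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    # A constant column has zero spread, so it is mapped to 0.0 instead of dividing by zero.
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
            for row in rows]


# Rows from the dataset sample: CO2(t-90), CO2(t-45), CO2(t-0), manip(t-0), drugs(t-25).
rows = [
    [6.70, 6.70, 6.78, 6.0, 6.0],
    [6.65, 6.69, 6.80, 6.0, 6.0],
    [6.66, 6.68, 6.79, 6.0, 6.0],
    [6.65, 6.69, 6.79, 6.0, 6.0],
    [6.66, 6.71, 6.81, 6.0, 6.0],
]
normalized = normalize_var(rows)
```

Normalizing first keeps a column with a large numeric range from dominating the Euclidean distances that both the SOM and KMeans rely on.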
Experiments
CO2 levels analysis
The first experiment trained a SOM using only the CO2 parameters and visualized it to get an idea of the structure of the data.
Learned SOM using CO2 levels, patient 1.
The U-matrix shows two large clusters (whose inner structure is not yet known). The top cluster roughly represents time windows with lower CO2 values, and the bottom cluster represents time windows with higher values.
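To make the U-matrix reading more concrete, here is a simplified Python sketch. For every map unit it computes the mean distance between the unit's weight vector and those of its 4-connected grid neighbours; high values mark borders between clusters and low values mark cluster interiors. This is a simplification under stated assumptions: the SOM Toolbox visualization may use a different lattice and a finer layout, and the function name, the dictionary layout and the toy weights below are hypothetical.

```python
import math


def u_matrix(weights, rows, cols):
    """Mean distance from each unit's weights to its 4-connected neighbours.

    The map must contain more than one unit, otherwise a unit has no neighbours.
    """
    u = {}
    for r in range(rows):
        for c in range(cols):
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                          if 0 <= r + dr < rows and 0 <= c + dc < cols]
            dists = [math.dist(weights[(r, c)], weights[n]) for n in neighbours]
            u[(r, c)] = sum(dists) / len(dists)
    return u


# Hypothetical 3x1 map: two units with similar low values, one unit far away.
toy_weights = {(0, 0): [6.6], (1, 0): [6.7], (2, 0): [7.6]}
u = u_matrix(toy_weights, rows=3, cols=1)
```

In this toy map the unit at (0, 0) gets a small value because its only neighbour is close, while the units next to the large jump get larger values, which is the kind of ridge that separates two clusters in a U-matrix plot.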
The second step of the experiment was to divide the dataset according to these two clusters. The KMeans algorithm was used for this classification, searching for up to 5 clusters; a low number of clusters prevents overfitting.

Results of KMeans for 1 to 5 clusters, patient 1.
According to the Davies-Bouldin index, the best clustering among 1 to 5 clusters was the same two-cluster split that the SOM U-matrix of CO2 levels had shown, which is an interesting agreement.
The next step was to explore the inner structure of these two clusters. The data were split in Matlab into two matrices, and each was then used separately to train another SOM. The results are presented below and bring some new information.
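The selection of the number of clusters can be sketched in plain Python as follows. The Davies-Bouldin index averages, over all clusters, the worst ratio of summed within-cluster scatter to the distance between cluster centres, so lower values mean more compact and better separated clusters. The sketch is not the SOM Toolbox implementation: it runs KMeans directly on data points with a deterministic farthest-point initialization, and it scans 2 to 5 clusters because the index is not defined for a single cluster. All function names and the toy data are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all centres."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return [list(c) for c in centers]


def kmeans(points, k, iterations=100):
    """Plain Lloyd iterations; returns final centres and one label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def davies_bouldin(points, centers, labels):
    """Davies-Bouldin index over the non-empty clusters (lower is better)."""
    used = sorted(set(labels))
    scatter = {}
    for j in used:
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter[j] = sum(math.dist(p, centers[j]) for p in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                 for j in used if j != i)
             for i in used]
    return sum(worst) / len(used)


# Hypothetical toy data: two compact groups of 2-D points, far apart.
low = [[6.60 + 0.01 * i, 6.60 + 0.01 * j] for i in range(3) for j in range(3)]
high = [[7.60 + 0.01 * i, 7.60 + 0.01 * j] for i in range(3) for j in range(3)]
points = low + high

scores = {}
for k in range(2, 6):
    centers, labels = kmeans(points, k)
    scores[k] = davies_bouldin(points, centers, labels)
best_k = min(scores, key=scores.get)
```

On this toy data the lowest index is reached for two clusters, because any further split cuts a compact group into pieces that lie close to each other.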
Learned SOMs for the bottom cluster and the top cluster.
The top cluster has a more complicated structure in detail and could be divided into several smaller clusters. The majority of the bottom cluster, on the other hand, forms one solid area; this could be interpreted as similar (and possibly predictable) behaviour of CO2 levels in a specific range.
Influence of drugs and manipulation
The second part of the experiments aimed to find a possible influence of drugs and manipulation on CO2 levels.
This required preparing another dataset, derived from the input datasets, that could help answer the question. It contained the drugs and manipulation columns and a third, calculated column, change: the change of the CO2 level within one time window. This dataset was then also analysed using the SOM Toolbox.
| change | manip(t-0) | drugs(t-25) |
| 0.08 | 6 | 6 |
| 0.17 | 6.11 | 6 |
| -0.17 | 6.219999 | 6 |
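The exact formula for the change column is not preserved in this text. The Python sketch below assumes that change is the difference between the last and the first CO2 value of a window, CO2(t-0) minus CO2(t-90). This assumption agrees with the first row of both tables (6.78 - 6.70 = 0.08), but it remains an assumption, and the function name build_change_rows is hypothetical.

```python
def build_change_rows(windows):
    """Turn (CO2(t-90), CO2(t-45), CO2(t-0), manip, drugs) rows into (change, manip, drugs)."""
    result = []
    for co2_t90, _co2_t45, co2_t0, manip, drugs in windows:
        # ASSUMPTION: change spans the whole window, from t-90 to t-0.
        change = round(co2_t0 - co2_t90, 2)
        result.append((change, manip, drugs))
    return result


# First two rows of the dataset sample shown in the introduction.
windows = [
    (6.70, 6.70, 6.78, 6.0, 6.0),
    (6.65, 6.69, 6.80, 6.0, 6.0),
]
change_rows = build_change_rows(windows)
```

Rounding to two decimals only removes floating-point noise from the subtraction, in line with the two-decimal precision of the CO2 values in the sample.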
Visualization of the computed results again brought some interesting information.

Behaviour of CO2 level according to drugs and manipulation, patient 1.
Influence of manipulation
The top-left corner of the change map is the most important area for assessing the influence of manipulation. The influence appears rather marginal, because the values of change in this area vary only between a small increase and a small decrease. This suggests that manipulation is not an important variable for the CO2 level.
Influence of drugs
For drugs, the most important area is the bottom-right corner of the change map. The situation is the opposite of the previous one, and the area can be divided into two parts: a darker area representing lower values of drugs and a lighter area in the very corner representing high levels of drugs. According to the visualization, lower levels of drugs often led to an increase of the CO2 level, while high levels of drugs could also lead to a small decrease of the same level (four neurons in the very corner). Overall, the map indicates that drugs can substantially influence CO2 levels.
Discussion
The detailed analysis covered only one of the patients. I did not validate the results against the second dataset for two reasons: a comparison of only two patients could not reveal any common relations, and the two datasets differ considerably, as the time behaviour plots show. Given the insufficient amount of data (datasets for different people), I decided not to compare the patients and instead to do a thorough analysis for one person. I see the use of this work in further research on similar medical data, as inspiration for how to work with the results and eventually how to prepare more specific datasets to find further relations.
Conclusion
The aim of the semester project, stated in the introduction, was to discover clusters in CO2 level time series and to find a possible influence of drugs and manipulation on these levels. Using the SOM, this work found two main clusters in the CO2 level development, one for lower and one for higher values, and examined their inner structure, which is complicated for the lower-value cluster and simple for the higher-value one. The second part of the task was fulfilled by creating a new dataset; its analysis indicated that manipulation is not an important variable, whereas drugs can significantly influence both the increase and the decrease of the CO2 level, depending on the drugs value (type and amount).
References
- Ing. Josef Bouska: Neuronove site pro predikci casovych rad (Neural networks for time series prediction), FEE CTU, 2008.
- Website of the subject Y336VD, http://ida.felk.cvut.cz/moodle/course/view.php?id=35, FEE CTU, 2008.


