Tutorial Contents

Cluster traces

Average traces by colour

Standard deviation

Cluster Traces

If you have a data file with many traces, some of which contain similar waveforms, it could be useful to cluster the traces into groups, where traces within a group have similar waveforms. You could then average the traces within each group to obtain a reduced-noise representation of the waveforms.

There has to be some metric to define the "shape" of the waveform in order to determine similarity. Of course, the raw data of the waveform itself would be one such metric, but this will inevitably contain a certain amount (possibly a large amount) of noise, and this will contaminate any clustering algorithm. Also, there is likely to be strong serial correlation within each trace, resulting in a lot of redundant information being passed to the clustering algorithm.

A standard way to extract an ordered set of "key features" from noisy data is principal component (PC) analysis. The weighting coefficients of the first few basis waveforms are then used as the data for clustering.

This file has 38 traces of constructed data, and each trace consists of random noise plus a superimposed fragment of a sine wave. The sine wave has constant frequency but 4 different amplitudes (the smallest is 0, i.e. no sine wave at all), randomly distributed between traces. The aim is to cluster the traces into groups such that traces within a group contain sine waves of similar amplitude.
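For readers who want to experiment outside the program, data of this general kind can be approximated with a short Python sketch. The sample count, amplitude values, number of cycles and noise level below are all assumptions chosen for illustration; the tutorial file's actual values are not stated here, and `make_traces` is a hypothetical helper name.

```python
import numpy as np

def make_traces(n_traces=38, n_samples=500, amplitudes=(0.0, 1.0, 2.0, 3.0),
                cycles=2.0, noise_sd=1.0, seed=0):
    """Build noisy traces, each holding a sine fragment of one of a few amplitudes.

    All default values are illustrative assumptions, not the tutorial file's settings.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    fragment = np.sin(2.0 * np.pi * cycles * t)      # constant frequency
    # Assign one of the amplitudes at random to every trace.
    labels = rng.integers(0, len(amplitudes), size=n_traces)
    amps = np.asarray(amplitudes)[labels]
    noise = rng.normal(0.0, noise_sd, size=(n_traces, n_samples))
    traces = amps[:, None] * fragment[None, :] + noise
    return traces, labels

traces, labels = make_traces()
print(traces.shape)   # (38, 500)
```

Each row of `traces` is one trace, and `labels` records which amplitude was used, which is handy for checking a clustering result afterwards.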

With this number of traces, the data on individual traces are hard to distinguish in the standard Chart view due to vertical compression.

This shows the individual traces in a grid of thumbnails, so the content is more easily visualized.

At the top-left of the dialog the Trace list shows the traces that will be clustered. By default this is all traces (1-38), but you could specify just a subset if desired. There is also a Start time and End time, indicating the time window within the traces that will be analysed. By default this window is the entire window visible in the main view. If only a section of the recording shows anything of interest you could zoom in (horizontal magnify) on that section before activating the dialog. Alternatively, if you place 2 vertical cursors in the main view, the Cluster Traces time window will be set to match the location of those cursors.

The first task is to calculate the principal components of the waveforms and display their weighting coefficients.

A set of red dots appears in the 3D scattergraph. These are the data (weighting coefficients) we need to cluster. There are 38 dots since there are 38 traces, and since we calculated 3 PCs, each dot is located in 3D space. We can thus display the coefficients on the X, Y and Z axes of the graph. The coefficients are displayed in order, i.e. the X axis shows coefficients for the 1st PC, the Y axis for the 2nd and the Z axis for the 3rd.
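To make the idea of one 3D point per trace concrete, here is a minimal sketch of computing PC weighting coefficients with NumPy's singular value decomposition. It is not necessarily how the program itself does the calculation; the function name `pc_coefficients` and the stand-in data (amplitudes, noise level, sample count) are assumptions for illustration only.

```python
import numpy as np

def pc_coefficients(traces, n_components=3):
    """Return per-trace weighting coefficients for the first few PCs, plus the basis waveforms."""
    # Centre the data by subtracting the mean waveform across traces.
    centred = traces - traces.mean(axis=0, keepdims=True)
    # Rows of vt are the basis waveforms, ordered by the variance they explain.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, vt[:n_components]

# Hypothetical stand-in data: 38 noisy traces with sine waves of 4 amplitudes.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 400, endpoint=False)
amps = rng.choice([0.0, 1.0, 2.0, 3.0], size=38)
traces = amps[:, None] * np.sin(2 * np.pi * 3 * t) + rng.normal(0.0, 0.5, (38, 400))

scores, basis = pc_coefficients(traces)
print(scores.shape)   # (38, 3): one 3D point per trace
```

Column 0 of `scores` corresponds to the X axis of the scattergraph, column 1 to the Y axis and column 2 to the Z axis. In this stand-in data only column 0 tracks the sine wave amplitude closely; the sign of a PC is arbitrary, so the clusters may appear in reverse order.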

The start-up view of the graph is oblique, but it is useful to view it face-on.

This presents a 2D view, showing the projection on the X and Y axes. It is immediately obvious that there are 4 clusters on the X axis (1st PC), showing as 4 vertical groups of dots. However, the distribution on the Y axis is effectively random, with no obvious clusters.

We now see a Y-Z projection, and there are no obvious clusters. The clusters on the X axis are hidden because they lie in a plane orthogonal to the view.

Only 1 cluster is found!

The problem is that we are looking for clusters in 3 dimensions (the coefficients of the first 3 PCs), but only 1 of these actually contains clusters; the others are random. These random values swamp the algorithm, so that it does not detect the clusters in the single dimension. (The algorithm treats all dimensions as of equal importance. Because these are principal components we know that the 1st dimension is the most important, but the algorithm does not know this.)
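The swamping effect can be illustrated numerically without knowing which clustering algorithm the program uses (it is not named here). The sketch below is only an illustration under stated assumptions: the coefficients are made up, every dimension is scaled to unit variance to mimic "equal importance", and `between_group_fraction` is a hypothetical helper that measures how much of the total spread is due to differences between the true groups.

```python
import numpy as np

def between_group_fraction(points, labels):
    """Fraction of the total sum of squares explained by differences between group means."""
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]
    # Scale each dimension to zero mean and unit variance so all count equally.
    z = (points - points.mean(axis=0)) / points.std(axis=0)
    total = np.sum(z ** 2)
    between = 0.0
    for g in np.unique(labels):
        members = z[labels == g]
        between += len(members) * np.sum(members.mean(axis=0) ** 2)
    return between / total

# Made-up coefficients: 4 tight clusters on the 1st PC, pure noise on the 2nd and 3rd.
rng = np.random.default_rng(2)
labels = np.arange(38) % 4
pc1 = labels + rng.normal(0.0, 0.05, 38)
pc2 = rng.normal(0.0, 1.0, 38)
pc3 = rng.normal(0.0, 1.0, 38)
coeffs = np.column_stack([pc1, pc2, pc3])

frac_1d = between_group_fraction(coeffs[:, 0], labels)
frac_3d = between_group_fraction(coeffs, labels)
print(f"1st PC only: {frac_1d:.2f}, all 3 PCs: {frac_3d:.2f}")
```

With the 1st PC alone, nearly all of the spread reflects group membership. With all 3 dimensions weighted equally, most of the spread is random, which is the situation that defeats the clustering.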

Note that by default All dimensions is pre-selected in the dimensions radio-button choices to the right of the Cluster button.

Now, each trace is coloured according to the group that it belongs to, which depends on the amplitude of its sine wave. (The order of colours may vary, because the clustering algorithm involves a randomization element.) This is most obvious in the Matrix view that you launched at the beginning of the tutorial. Thus trace 1 contains the smallest non-zero sine wave, as do traces 4 and 5, and other traces in the same colour. Trace 2 contains the largest sine wave, while trace 7 is only noise - the sine wave amplitude is 0.

The Cluster Traces dialog (left), and the Matrix view (right). Traces have been clustered according to coefficients of the 1st principal component, and 4 clusters were detected, reflecting the 4 different amplitudes of sine wave in the traces.

Note: The traces have been coloured according to the group to which they belong, but this information has not yet been written to file. If you want to keep the colours, you must Save this file, or use Save As to write a new file.

 

Average Traces by Colour

If you are following on from the previous tutorial (Cluster Traces) then you will have a data file ready for use. If not:

There are 38 traces of constructed data, and each trace consists of random noise plus a superimposed fragment of a sine wave. The sine wave has constant frequency but 4 different amplitudes (the smallest is 0, i.e. no sine wave at all), randomly distributed between traces. The traces have been clustered into 4 groups according to similarity in their waveforms, and the group identity is coded by the colour in which the trace is drawn.

The aim is to average the waveform for each group, and write each average as a new trace.

The new file loads, and there are 4 new traces (39-42). These conveniently fill the empty cells in the Matrix view. (If there had been insufficient empty cells, we would have had to close the view and reconfigure it with more cells to see the new data.)

The new traces are the point-by-point averages of the 4 trace groups. However, the default gain for the new traces is too low for this to be obvious.

You can see that the last 4 cells of the Matrix grid are averages of the previous cells of the same colour. There is consequently a reduction in the noise in these traces, making the underlying sine wave clearer (although it was fairly clear even in the raw data).
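The point-by-point averaging itself is simple to express. The following is a minimal sketch, assuming the traces are held as rows of a NumPy array and the colours as a parallel array of labels; `average_by_group` is a hypothetical name, and the tiny data set is invented so the result can be checked by eye.

```python
import numpy as np

def average_by_group(traces, colours):
    """Return {colour: point-by-point mean trace} for each distinct colour."""
    traces = np.asarray(traces, dtype=float)
    colours = np.asarray(colours)
    return {c: traces[colours == c].mean(axis=0)
            for c in np.unique(colours).tolist()}

# Invented example: 3 short traces, 2 of them red and 1 blue.
traces = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0],
                   [10.0, 10.0, 10.0]])
colours = np.array(["red", "red", "blue"])

averages = average_by_group(traces, colours)
print(averages["red"])   # [2. 3. 4.]
```

When the noise on each trace is independent, averaging n traces reduces the noise standard deviation by a factor of about the square root of n, which is why the averaged cells look cleaner than the raw ones.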

The Matrix view after Average Traces by Colour. The last 4 cells (bottom right) show the point-by-point average of raw data in the preceding cells with the same trace colour.

Standard Deviation

The Average Traces by Colour dialog has a check box labelled Write SD trace. By default this is not checked, but if you check this before clicking OK, then an additional trace is written for each group. Each average trace is followed by a trace showing the point-by-point standard deviation of the traces in the group.
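The same grouping can produce the standard deviation alongside the mean. This sketch mirrors the "average followed by SD" ordering described above, but it is an illustration only: `mean_and_sd_by_group` is a hypothetical name, the data are invented, and the use of the sample standard deviation (`ddof=1`) is an assumption, since it is not stated whether the program divides by n or n-1.

```python
import numpy as np

def mean_and_sd_by_group(traces, colours, ddof=1):
    """Return a list of (colour, kind, values) rows: each group's mean, then its SD.

    ddof=1 (sample SD) is an assumption; use ddof=0 for the population SD.
    """
    traces = np.asarray(traces, dtype=float)
    colours = np.asarray(colours)
    rows = []
    for c in np.unique(colours).tolist():
        members = traces[colours == c]
        rows.append((c, "mean", members.mean(axis=0)))
        rows.append((c, "sd", members.std(axis=0, ddof=ddof)))
    return rows

# Invented example: 2 red traces and 2 blue traces, 3 samples each.
traces = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0],
                   [10.0, 10.0, 10.0],
                   [12.0, 14.0, 10.0]])
colours = np.array(["red", "red", "blue", "blue"])

rows = mean_and_sd_by_group(traces, colours)
for colour, kind, values in rows:
    print(colour, kind, np.round(values, 3))
```

Note that with `ddof=1` a group containing a single trace has an undefined standard deviation, so each group in the example has at least 2 members.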