Robust Fit: External Data

You can use DataView robust fit facilities to smooth or fit a polynomial equation to data from sources external to the program.

The algorithms used for robust fitting (i.e. reducing the influence of outliers) are described here.

Data source

The external data source should be plain-text numbers arranged in two tab- or comma-separated equal-length columns, with the left column containing the X values and the right column containing the Y values. There can be one or more text header rows, but these are ignored. The X values of the data do not have to be evenly spaced, and if they are not in order, they are automatically sorted by the program.

There are three ways to load data into the program:

  1. Copy the data onto the clipboard outside of DataView, and then click the Paste button in the Robust Fit dialog.
  2. Click the Load button in the dialog and select a text file (.txt) containing the data.
  3. Drag-and-drop a text file containing the data from File Explorer onto the dialog.
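If you want to check a file before loading it, the expected layout can be sketched in a few lines of Python. This is only an illustration of the format described above, not DataView's own loader: the function name `parse_xy_text` is hypothetical, and skipping every non-numeric row (rather than only leading header rows) is a simplifying assumption.

```python
import re

def parse_xy_text(text):
    """Parse two tab- or comma-separated columns of numbers into X and Y lists.

    Assumption: any row whose first two fields are not both numeric
    (for example a text header row) is skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = [f.strip() for f in re.split(r"[\t,]", line)]
        if len(fields) < 2:
            continue  # blank or single-column line
        try:
            rows.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # header row or other non-numeric content
    # Sort by ascending X, mirroring the automatic sort described above
    rows.sort(key=lambda pair: pair[0])
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return xs, ys

sample = "Time\tFreq\n3\t9.5\n1,1.5\n2\t4.0\n"
xs, ys = parse_xy_text(sample)
print(xs, ys)
```

The sample mixes tab and comma separators and lists its X values out of order, purely to exercise both cases.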

Robust Polynomial Fit

When you load the data, a notification message tells you that they were automatically sorted in ascending X order, and then the X-Y values are displayed as a scatter graph.

The data were originally generated from a second-degree (i.e. second-order) polynomial equation, but then several Y values were replaced by outlier numbers (mainly 0s). The X values are not evenly spaced (there are gaps in the sequence), and, for demonstration purposes, the order of the values was randomly shuffled (hence the message on loading).

The red line drawn through the scatterplot shows the robust fit of a polynomial function to the data. It happens that the default values of the parameters are close to optimal for this analysis, so the red line is a close (by eye) fit to the data excluding the obvious outliers. The coefficients of the polynomial equation are displayed, and the fitted equation is thus:

y = -0.300074 x² + 49.9903 x + 27.2745

It may be instructive to change parameters to explore how this affects the fit.

Hopefully, you now have a reasonable idea of how the parameter choice affects the robust fit procedure.
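For readers who want to experiment outside the program, the general idea of an outlier-resistant polynomial fit can be sketched with iteratively reweighted least squares. This is an assumption-laden illustration, not DataView's algorithm: the bisquare weighting, the tuning constant 4.685, the function name `robust_polyfit` and the synthetic data (a quadratic with coefficients near those of the example, with some Y values replaced by 0) are all choices made for this sketch.

```python
import numpy as np

def robust_polyfit(x, y, degree=2, tuning=4.685, n_iter=50):
    """Polynomial fit by iteratively reweighted least squares (bisquare weights).

    Returns coefficients with the highest power first, as np.polyfit does.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)  # ordinary least squares starting point
    for _ in range(n_iter):
        resid = y - np.polyval(coeffs, x)
        # Robust estimate of the residual spread (median absolute deviation)
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = resid / (tuning * scale)
        # Bisquare weights: points with large residuals get zero weight
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        # np.polyfit applies w to the residuals, so pass the square root
        new_coeffs = np.polyfit(x, y, degree, w=np.sqrt(w))
        converged = np.allclose(new_coeffs, coeffs, rtol=1e-10, atol=1e-12)
        coeffs = new_coeffs
        if converged:
            break
    return coeffs

# Synthetic data: unevenly spaced X (a gap from 40 to 48), small noise,
# five Y values replaced by 0, then shuffled.
rng = np.random.default_rng(0)
x = np.arange(0.0, 101.0, 2.0)
x = x[(x < 40) | (x > 48)]
y = -0.3 * x**2 + 50.0 * x + 27.0 + rng.normal(0.0, 1.0, x.size)
y[[5, 15, 25, 35, 42]] = 0.0
order = rng.permutation(x.size)
x, y = x[order], y[order]

coeffs = robust_polyfit(x, y, degree=2)
print("robust:", coeffs)
print("plain :", np.polyfit(x, y, 2))
```

Comparing the two printed coefficient sets shows how far the outliers pull an ordinary least-squares fit. Changing `tuning` gives a feel for how a parameter of this kind trades outlier rejection against use of all the data.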

Robust Smoothing

Details of the algorithm for robust smoothing are given here. The essential concept is to apply a moving average filter, with modifications to reduce the influence of outliers.
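As a rough illustration of that concept, the sketch below computes a moving average in which each window is reweighted so that values far from the window median contribute little or nothing. It is a minimal sketch under stated assumptions, not the program's actual algorithm: the bisquare weights, the `half_width` and `tuning` parameters, the function name `robust_moving_average` and the synthetic test signal are all hypothetical, and the window is defined by point count rather than by X distance.

```python
import numpy as np

def robust_moving_average(y, half_width=5, tuning=4.685):
    """Moving average that down-weights values far from the window median."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for i in range(y.size):
        lo = max(0, i - half_width)
        hi = min(y.size, i + half_width + 1)
        win = y[lo:hi]
        med = np.median(win)
        # Robust spread of the window (median absolute deviation)
        scale = np.median(np.abs(win - med)) / 0.6745
        if scale == 0:
            out[i] = med  # more than half the window shares one value
            continue
        u = (win - med) / (tuning * scale)
        # Bisquare weights: zero for values far from the window median
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)
        out[i] = np.sum(w * win) / np.sum(w)
    return out

# Synthetic trend with a few high outliers and one low outlier
t = np.linspace(0.0, 2.0 * np.pi, 200)
trend = 10.0 + 5.0 * np.sin(t)
y = trend.copy()
y[[20, 77, 150]] += 50.0
y[100] = 0.0

smoothed = robust_moving_average(y, half_width=5)
print(np.max(np.abs(smoothed - trend)))
```

A wider window gives a smoother line but follows rapid changes in the trend less closely, which is the kind of trade-off that parameter adjustment has to balance.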

These data show the instantaneous frequency of spikes generated by the caudal photoreceptor in a crayfish, where each data point represents the reciprocal of the time interval between a spike and the preceding spike (Y value), plotted against the time of occurrence of the spike (X value). However, the spikes were identified from extracellular recordings using template recognition, and the allowed error was purposefully set high to allow many false positives, and these generate outliers above the obvious trend. There are also a few false negatives generating outliers below the trend, mainly due to spike collision in the recording. Robust smoothing attempts to draw a line through the trend, reducing the influence of the outliers.

You should now see a very jagged line drawn approximately through the main trendline of the data. It is clearly of little value as it stands, so parameters need adjusting.

At this point the smoothed line seems like a good fit to the main trend in the data. It has a similar shape to the "ground truth" instantaneous frequency plot of spikes recorded intracellularly from the photoreceptor, where there is no ambiguity in spike recognition. (If you want to see this, load file cpr intra extra into the main program and select the Event analyse: 2-D scatter graph menu option.)

Heuristic parameter adjustment

In the examples above, parameter adjustment was basically heuristic, guided by prior knowledge of what the fitted profile should look like. So far as I am aware, there is no generalized algorithm for determining the "best" parameters, since from the data alone there is no way to distinguish between outliers that should be ignored and extreme-but-genuine values that should not be ignored. However, this methodology does have the key advantage of reproducibility: if the same parameters are used for the analysis of different data sets, then any differences in the output are an objective reflection of differences in those data sets.

Obtaining Output Values

The Copy button has a drop-down option Copy text (also Save text) that provides the output in text format.

For the Polynomial fit option, the output starts with a list of the coefficients, and this is followed by the X-Y values of the raw data themselves. You can also just select and copy the coefficient values from the display in the dialog itself.

For the Robust smoothing option, the output consists of two sets of X-Y columns: the first contains the raw data and the second contains the smoothed data (the X values are the same in each set).