Estimate causal direction in noisy background
Motivation: Noninvasive electrophysiological measurements such as EEG and MEG
record, to a large extent, unknown superpositions of very many sources.
Any relation observed between channels is therefore dominated by meaningless
mixtures of mainly independent sources.
The question is how to observe and properly interpret true interactions
in the presence of such strong confounders.
[Download] data here.
To read the data into MATLAB, type
The data consist of 1000 examples of bivariate data,
each with 6000 time points. Each example is a superposition
of a signal (of interest) and noise.
The signal is constructed from a unidirectional
bivariate AR model of order 10 with
(otherwise) random AR parameters and uniformly distributed
input. The noise is constructed from three independent sources,
generated with three univariate AR models with random parameters
and uniformly distributed input,
which were instantaneously mixed into the two sensors with a random mixing
matrix. The relative strength of noise and signal was set randomly.
The data were generated with this
(Of course, the seeds for the random number generators chosen
for the challenge data are confidential.)
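The construction described above can be illustrated with a minimal sketch. This is not the organisers' generation script. The function names, the way AR coefficients are drawn (scaled so that the sum of their absolute values is below 1, a sufficient condition for stability), the AR order of the noise sources, the Gaussian mixing matrix and the RMS normalisation are all assumptions made for illustration only.

```python
import math
import random

def random_ar_coeffs(order, rng, scale=0.9):
    # Assumption: scale the coefficients so that sum(|a|) = scale < 1 (stable).
    raw = [rng.uniform(-1.0, 1.0) for _ in range(order)]
    total = sum(abs(a) for a in raw)
    return [scale * a / total for a in raw]

def ar_series(coeffs, n, rng, drive=None, coupling=None):
    # AR process with uniformly distributed input. If `drive` is given,
    # the process also depends on the past of `drive` (one-way coupling).
    x = [0.0] * n
    for t in range(n):
        value = rng.uniform(-1.0, 1.0)
        for k in range(1, min(len(coeffs), t) + 1):
            value += coeffs[k - 1] * x[t - k]
            if drive is not None:
                value += coupling[k - 1] * drive[t - k]
        x[t] = value
    return x

def rms(channels):
    count = sum(len(c) for c in channels)
    return math.sqrt(sum(v * v for c in channels for v in c) / count)

def simulate_example(n=6000, order=10, seed=0):
    rng = random.Random(seed)
    # Signal: the sender depends only on its own past,
    # the receiver on its own past and on the sender's past.
    sender = ar_series(random_ar_coeffs(order, rng), n, rng)
    receiver = ar_series(random_ar_coeffs(order, rng), n, rng,
                         drive=sender, coupling=random_ar_coeffs(order, rng))
    direction = rng.choice([1, -1])  # 1: first -> second sensor, -1: reverse
    signal = [sender, receiver] if direction == 1 else [receiver, sender]
    # Noise: three independent AR sources, instantaneously mixed into two sensors.
    sources = [ar_series(random_ar_coeffs(order, rng), n, rng) for _ in range(3)]
    mixing = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(2)]
    noise = [[sum(mixing[ch][s] * sources[s][t] for s in range(3))
              for t in range(n)] for ch in range(2)]
    # Random relative strength of signal and noise (assumed parametrisation).
    gamma = rng.random()
    s_scale = (1.0 - gamma) / rms(signal)
    n_scale = gamma / rms(noise)
    data = [[s_scale * signal[ch][t] + n_scale * noise[ch][t]
             for t in range(n)] for ch in range(2)]
    return data, direction
```

Calling `simulate_example(seed=3)` returns the two sensor time series together with the true direction, which is convenient for checking a method on data with known ground truth before applying it to the challenge data.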
The task is to estimate the direction of the interaction of the signal.
A submitted result is a vector with 1000 numbers having the values
1, -1, or 0. Here, 1 means direction is from first to second sensor,
-1 means direction is from second to first sensor, and 0 means
"I don't know".
For every example, either 1 or -1 is correct. The most important
point is how the result is scored: you get +1 point for each
correct answer, -10 points for each wrong answer, and
0 points for each 0 in the result vector. With
this scoring, confidence in the result becomes part of
the evaluation. We strongly recommend assessing, for
each example, the evidence for a specific finding.
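The scoring rule can be written down in a few lines. The sketch below is an illustration, not the organisers' evaluation code, and the function names are hypothetical. It also shows a consequence of the rule: if p is the estimated probability that a guess is correct, the expected score p - 10(1 - p) is positive only for p > 10/11, so answering 0 is preferable below roughly 91% confidence.

```python
def challenge_score(submitted, truth):
    # +1 for each correct answer, -10 for each wrong answer, 0 for "I don't know".
    if len(submitted) != len(truth):
        raise ValueError("submitted and truth must have the same length")
    score = 0
    for guess, correct in zip(submitted, truth):
        if guess == 0:
            continue
        if guess not in (1, -1):
            raise ValueError("entries must be 1, -1 or 0")
        score += 1 if guess == correct else -10
    return score

def decide(direction, p_correct):
    # Commit to a direction only if the expected score
    # p*1 - (1-p)*10 is positive, i.e. p > 10/11.
    return direction if p_correct > 10 / 11 else 0
```

For example, one wrong answer cancels ten correct ones, so a method that is right 90% of the time and always answers would score below zero on average.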
Real EEG data for 10 subjects
Download the data here
To read the data e.g. of the first subject into
Each data set is an eyes-closed EEG measurement of one subject,
recorded with 19 channels according to the standard
10-20 system. The sampling rate is 256 Hz. If you divide
a data set into blocks of 4 seconds (i.e. 1024 data points),
then each block is a continuous measurement which
has been cleaned of apparent artefacts.
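A minimal sketch of the block division follows. It assumes the data have already been loaded as a list of 19 per-channel sample lists; that layout and the function name are assumptions, not the format of the downloadable files. Since only the individual blocks are described as continuous, it seems safest to compute spectra or fit models within blocks and not across block boundaries.

```python
def split_into_blocks(channels, block_len=1024):
    # channels: list of per-channel sample lists of equal length.
    # Returns a list of blocks; each block is a list of per-channel
    # lists with block_len samples (4 s at 256 Hz for block_len=1024).
    n = min(len(c) for c in channels)
    blocks = []
    for start in range(0, n - block_len + 1, block_len):
        blocks.append([c[start:start + block_len] for c in channels])
    return blocks
```

Any incomplete block at the end of the recording is dropped.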
All data sets show a strong signal at around 10 Hz, called the
alpha rhythm, predominantly in occipital regions (i.e. the back part of
the head). The 10 subjects were selected from a larger pool of
subjects according to an estimated signal-to-noise ratio.
The data were provided by Tom Brismar from the
Karolinska Institute in Stockholm. Any reference
to subject name or id was taken out.
The challenge is to estimate the causal direction of
the alpha rhythm for these data sets
as an average across all 10 subjects.
The result must be a single 19x19 matrix, say C.
The matrix element C_ij must reflect the 'strength' of the causal
drive from channel i to channel j. Please do not set non-significant
results to zero or reduce the result to binary numbers,
since the resulting figures can be difficult to interpret.
The precise meaning of the term 'strength' varies across
methods. Methods also differ in whether the causal drive
they report is direct or indirect.
We leave these matters to the participant, who should give a
short explanation of what the result means.
Since the ground truth is not known, we only collect the
results and send back a visualisation of the
result. With the permission of the authors we
put the respective figure plus a comment of the
authors on the net. The purpose is to compare
different methods for the same data and discuss
the results. Both the amount and the quality of the data
are very high, and hence we can expect reasonable
estimates from many different methods. A warning: in our
experience there is large variability across
subjects. Therefore, one cannot expect consistent
results across all subjects. Also, EEG data are typically
very noisy at very low frequencies (below 1 Hz). Make sure
to avoid artefacts caused by slow drifts.
For illustration we show our own result for these data sets using the
[Phase Slope Index]:
(Download the software to create such figures here.)
Here, each small circle shows the flow from one channel to all other
channels. Positive values (red) mean sending and negative values (blue)
mean receiving information. The values denote relative temporal
delay in a pseudo-z-score sense: absolute values larger than
2 are significant at the single-subject level without correction
for multiple comparisons. The method (in this form) does not distinguish
between direct and indirect interaction. The interpretation would be that
frontal channels (top panels in the figure) send information
to channels in the back.
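For readers unfamiliar with the measure, here is a minimal sketch of a phase slope index for one channel pair. It assumes the commonly used definition Psi_xy = Im(sum over f of conj(C_xy(f)) * C_xy(f + df)), where C_xy is the complex coherency. It is not the software linked above: the function names, the Hanning window and the non-overlapping segmentation are illustrative choices, and the normalisation to a pseudo-z-score (division by an estimate of the standard deviation) is omitted.

```python
import cmath
import math

def dft_bins(segment, bins):
    # Hanning-windowed DFT of one mean-removed segment at the given bins.
    n = len(segment)
    mean = sum(segment) / n
    windowed = [(v - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
                for t, v in enumerate(segment)]
    return [sum(w * cmath.exp(-2j * math.pi * k * t / n)
                for t, w in enumerate(windowed)) for k in bins]

def phase_slope_index(x, y, seg_len, bins):
    # bins: consecutive integer DFT bins covering the band of interest.
    # Returns an unnormalised value that is positive if x leads y.
    bins = list(bins)
    ext = bins + [bins[-1] + 1]  # one extra bin for the f + df term
    sxx = [0.0] * len(ext)
    syy = [0.0] * len(ext)
    sxy = [0j] * len(ext)
    # Average auto- and cross-spectra over non-overlapping segments.
    for start in range(0, len(x) - seg_len + 1, seg_len):
        fx = dft_bins(x[start:start + seg_len], ext)
        fy = dft_bins(y[start:start + seg_len], ext)
        for i in range(len(ext)):
            sxx[i] += abs(fx[i]) ** 2
            syy[i] += abs(fy[i]) ** 2
            sxy[i] += fx[i] * fy[i].conjugate()
    # Complex coherency, then the imaginary part of the summed
    # products of neighbouring frequency bins.
    coh = [sxy[i] / math.sqrt(sxx[i] * syy[i]) for i in range(len(ext))]
    return sum((coh[i].conjugate() * coh[i + 1]).imag for i in range(len(bins)))
```

The measure is antisymmetric: swapping the two channels flips the sign. A full 19x19 result would apply the function to every channel pair, within the artefact-free blocks.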
Dr. Guido Nolte
12489 Berlin, Germany
Tel: +49 (30) 6392-1861
Fax: +49 (30) 6392-1879