Questions on lecture 4: Feature construction
1. What is a sigma-pi unit?
A sigma-pi unit is a special kind of Perceptron in which the phi functions
correspond to products of the original features. The unit is thus effectively
computing a polynomial function of the inputs.
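As an illustrative sketch (the choice of product terms and weights is hypothetical), a sigma-pi unit can be written as a weighted sum (sigma) of products (pi) of input features:

```python
import itertools
import numpy as np

def sigma_pi(x, weights, index_sets):
    """Sigma-pi unit: a weighted sum (sigma) of products (pi) of input
    features. `index_sets` lists which features enter each product term,
    so the unit computes a polynomial function of the inputs."""
    phi = np.array([np.prod(x[list(s)]) for s in index_sets])
    return float(weights @ phi)

# All degree-2 products of a 3-dimensional input (hypothetical term choice).
x = np.array([1.0, 2.0, 3.0])
terms = list(itertools.combinations_with_replacement(range(3), 2))
w = np.ones(len(terms))
y = sigma_pi(x, w, terms)   # → 25.0 (1 + 2 + 3 + 4 + 6 + 9)
```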
2. What is a bottleneck neural network? How does
this relate to PCA?
A bottleneck neural network is a 2-layer network in which the input layer
and output layer have the same dimension n and the hidden layer has a number
of outputs n' < n. A bottleneck network can be trained with the same examples
at the input and the output. If the units are linear and the square loss
is used for training, a bottleneck network actually computes the first
n' principal components, which are the weights of the neurons of the first
layer. The second layer reconstructs the inputs, and its weights
are given by the transpose of the weight matrix of the first layer.
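The PCA connection can be sketched with numpy: instead of actually training the network, we compute the principal directions by SVD, which is what a trained linear bottleneck network with square loss recovers (up to a rotation of the hidden units); the data below is random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)          # center the data

# Top n' principal directions via SVD: the solution a trained linear
# bottleneck network with square loss converges to (up to rotation).
n_prime = 2
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:n_prime]                 # first-layer weights: n' x n

H = X @ W.T                      # hidden "code" (n' outputs)
X_hat = H @ W                    # second layer: transpose of W reconstructs

err = np.linalg.norm(X - X_hat)  # reconstruction error of the bottleneck
```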
3. What becomes of the dot product between patterns
when patterns are normalized with the L2 (Euclidean) norm?
It becomes the cosine of the angle between the two patterns.
4. What becomes of the dot product between feature and
target when the features (and the target) are standardized?
The Pearson correlation coefficient.
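Both identities can be checked numerically (the vectors below are arbitrary examples):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([2.0, 1.0, 4.0, 3.0])

# L2-normalized patterns: their dot product is the cosine similarity.
cos = (a / np.linalg.norm(a)) @ (b / np.linalg.norm(b))

# Standardized vectors (zero mean, unit variance): the dot product,
# divided by n, is the Pearson correlation coefficient.
za = (a - a.mean()) / a.std()
zb = (b - b.mean()) / b.std()
pearson = (za @ zb) / len(a)
```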
5. When does it make sense to take the Log of the data
matrix?
When the variance of the data increases with the magnitude of the features.
6. What is a systematic error? What is an intrinsic error?
A systematic error is an error that can be explained and reduced by calibration
or normalization. An intrinsic error corresponds to the unexplained "random"
noise.
7. How can one get rid of systematic errors?
By modeling the noise and trying to reverse the noise generating process,
by calibration or normalization.
8. What is an ANOVA model?
ANOVA stands for Analysis of Variance. An ANOVA model is a model of the
effect on observations x of a systematic (or "controlled") factor of variability
v, taking a discrete number of values {v1, v2, ..., vj, ...}, and of intrinsic
variability e (random error, Normally distributed):
x_ij = m + v_j + e_ij
(i: index of observation, j: index of "treatment" or "class")
The ANOVA model supposes additive noise and equal variance in the classes
(so take the log if you see the variance increase with the variable magnitude).
The ANOVA test compares the variance of the controlled factor v (variance
explained by the model) to the intrinsic variance of e (residual variance
or "intra-class" variance). If the first one is statistically significantly
larger than the second one, factor v is found to contribute significantly
to the observed variance.
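A minimal numpy sketch of the ANOVA test statistic, with made-up observations for three classes; large F means the controlled factor explains much more variance than the residual:

```python
import numpy as np

# Observations x_ij = m + v_j + e_ij for three "treatments" j (made-up data).
groups = [
    np.array([5.1, 4.9, 5.3, 5.0]),   # class 1
    np.array([6.2, 6.0, 6.4, 6.1]),   # class 2
    np.array([4.0, 4.2, 3.9, 4.1]),   # class 3
]
all_x = np.concatenate(groups)
grand_mean = all_x.mean()

# Between-class (explained) and within-class (residual) sums of squares.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

k, n = len(groups), len(all_x)
F = (ss_between / (k - 1)) / (ss_within / (n - k))   # large F => v matters
```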
9. Build
a taxonomy of factors of variability in terms of whether they are desired,
known, controllable or observable. Explain the various cases and give examples.
When building a new instrument, in which direction should one go?
factor of variability
- desired
- undesired
  - known
    - controllable
    - uncontrollable
      - observable
      - unobservable
  - unknown
- The desired factor is our target (class labels) e.g. disease or normal
- The undesired factors are all the nuisance variable causing variance in
the data that is not related to our target, e.g. differences in sample processing,
temperature, patient gender, etc.
- The unknown factors are those which we have not considered yet (not recorded
or controlled) the others are considered known.
- The uncontrollable factors are those over which we have no handle (e.g.
the weather, something happening inside the instrument to which we do not
have access, some patient behavior that we cannot change).
- Controllable factors on the other hand let us choose values and lend themselves
to experimental design.
- Unobservable factors are those uncontrolled factors that we cannot even
record (something happening inside the instrument to which we do not have
access, some patient behavior that we cannot monitor).
- Observable factors are all the
remaining factors that we can record, even though we might not be able to
control them (e.g. the weather).
When designing an instrument, we
should try to go in the direction
unobservable -> observable
uncontrollable -> controllable
unknown -> known
so that we can more effectively reduce the undesired variance.
10. What is experimental design? What is a "confounding factor"? Give
examples of experimental plans.
Experimental design is the science of planning experiments to most effectively
study the effect of a set of given factors on a given outcome. A confounding
factor is a factor (usually unknown) whose value co-varies with another
known factor under study. For example, if we want to study the effect of age
on weight but all our young people are male and all our old people are female,
gender is a confounding factor. Of course this situation is a bogus experimental
design. Well-planned experiments try to consider "all" possible combinations
of assignments of variables to values. In a factorial plan, each variable
is allowed to take only 2 values, and for k variables this leads to 2^k
experiments. To avoid the effect of possible unknown factors correlated with
time, the order of the experiments can be randomized (randomized plan). To
be able to study the variance of a given factor on the outcome, factors can
be kept constant in some experiment blocks (block design).
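A sketch of a 2^k factorial plan with a randomized run order, using hypothetical factor names and levels:

```python
import itertools
import random

# 2^k factorial plan: every combination of two levels per factor
# (factor names and levels are made up for illustration).
factors = {"temperature": ["low", "high"],
           "pressure": ["low", "high"],
           "operator": ["A", "B"]}

plan = list(itertools.product(*factors.values()))   # 2**3 = 8 experiments

# Randomized plan: shuffle the run order to decorrelate it from
# unknown factors that drift with time.
random.seed(0)
order = plan[:]
random.shuffle(order)
```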
11. Why is it important to record a lot of "meta data"? Why is it
difficult to plan experiments with a lot of factors of variability? How should
one proceed?
It is important to record a lot of "meta" data to be able to eventually
explain some of the unexplained variance. However, not all the recorded
factors are controlled in the planned experiment, because of the combinatorial
explosion of the number of experiments to be run when many factors are
considered simultaneously. One should proceed iteratively, ruling out hypotheses
progressively.
11. What is a standard operating procedure (SOP)? What is calibration?
What is this good for?
A standard operating procedure is a series of steps taken to generate
the experimental data that is well documented and as reproducible as possible.
SOPs are used to reduce the unexplained variance. Calibration is a measurement
made in a standard way, which allows normalizing the data (e.g. shifting
or scaling it). For example, a standard solution may be periodically run
in place of the real solutions to be analyzed.
12. Is calibration always desirable?
Not always. The calibration measurement may itself have variance, so calibration
can in some cases result in an increase of variance. One may prefer instead
to normalize with a local average of the data itself, because the normalization
factor would then be computed from more data.
13. What is a "match filter"? Give examples of learning algorithms
using "match filters".
It is a vector of coefficients t_k, or "template", that
we use to compute a feature value f_k as the dot product between t_k and
the input pattern x: f_k = t_k . x
Instead of a dot product, other
similarity measures can be used. "Template matching" or "nearest neighbor"
algorithms use match filters.
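A minimal sketch of template matching with match filters (the templates, labels, and input are made up):

```python
import numpy as np

# Score a pattern against stored templates by dot product and pick the
# best match (template matching / nearest-neighbor flavor).
templates = np.array([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
labels = ["a", "b", "c"]

x = np.array([0.1, 0.9, 0.2])
features = templates @ x            # f_k = t_k . x, one value per template
prediction = labels[int(np.argmax(features))]
```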
14. What is a "filter
bank"? Give examples of classical transforms based on filter banks.
An ensemble of match
filters is called a filter bank. Often the elements of a filter bank are
chosen to be orthogonal. The cosine transform and the Fourier transform use
orthogonal filters. So does PCA.
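As a sketch, an orthogonal filter bank such as the (orthonormal) cosine transform can be built as a matrix whose rows are the filters; orthogonality makes reconstruction a simple transpose:

```python
import numpy as np

n = 8
# Orthonormal DCT-II basis vectors as the rows of a filter bank.
k = np.arange(n)
bank = np.cos(np.pi * np.outer(k, 2 * np.arange(n) + 1) / (2 * n))
bank[0] *= 1 / np.sqrt(2)
bank *= np.sqrt(2 / n)

# Rows are mutually orthogonal: the Gram matrix is the identity.
gram = bank @ bank.T

signal = np.sin(2 * np.pi * np.arange(n) / n)
coeffs = bank @ signal              # one feature per filter in the bank
recon = bank.T @ coeffs             # orthogonality makes inversion trivial
```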
15. What is a convolution?
Give examples of convolutional kernels. What is their effect on the signal?
A convolution is also a dot product operation aiming at producing new
features. But this time, instead of using templates that are as different
from one another as possible, we use a single template called a "kernel" that
we translate in all possible ways. For each position, we compute the dot
product to obtain one feature in the new representation. A Gaussian kernel
performs a local average and therefore smoothes the signal. A Mexican hat
kernel enhances edges. Some kernels can be designed to extract end-points
or lines.
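A minimal sketch of smoothing by convolution with a Gaussian kernel (the toy signal is arbitrary):

```python
import numpy as np

# Gaussian kernel: translated dot products implement a local average.
t = np.arange(-3, 4)
kernel = np.exp(-t**2 / 2.0)
kernel /= kernel.sum()              # normalize so the local average is unbiased

rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 2 * np.pi, 200)) + 0.3 * rng.normal(size=200)
smoothed = np.convolve(signal, kernel, mode="same")

# Mean absolute sample-to-sample change: a crude roughness measure.
roughness = lambda s: np.abs(np.diff(s)).mean()
```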
16. What are the similarities and differences between methods based
on filter banks and convolutional methods?
Both methods are based on dot products. Filter bank methods use templates
that are as different as possible from one another. Convolutional methods
use a single template in all possible positions.
17. If a convolution is performed in input space, to what transform
does this correspond to in Fourier space and vice versa?
A convolution in input space corresponds to a match filtering in Fourier
space (the match filter being the Fourier transform of the convolutional
kernel) and vice versa.
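This correspondence can be checked numerically for circular convolution (the signals are arbitrary random vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)
h = rng.normal(size=16)

# Circular convolution computed directly in the input space...
conv = np.array([sum(x[m] * h[(n - m) % 16] for m in range(16))
                 for n in range(16)])

# ...and as a pointwise product ("match filtering") in Fourier space.
conv_fourier = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real
```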
18. What are low/high/band pass filters? Give examples of convolutional
kernels and match filters in Fourier space implementing such filters.
A low pass filter removes high frequency components i.e. it smoothes
the signal. Example: convolution with a Gaussian kernel. A high pass filter
on the contrary removes low frequency components (e.g. the baseline). To
achieve that effect, one can convolve with a wide Gaussian kernel and subtract
the result from the original. A band pass filter lets components in a given
frequency band go through. One can convolve with the difference of two Gaussian
kernels of different width to achieve that effect. In Fourier space, the
Fourier transform of a Gaussian being a Gaussian, one can just multiply with
a Gaussian match filter.
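A sketch of low-pass and high-pass filtering with a Gaussian kernel, on a toy signal built from one slow and one fast sinusoid (all values are illustrative):

```python
import numpy as np

def gaussian_kernel(sigma, half_width=20):
    t = np.arange(-half_width, half_width + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    return g / g.sum()

n = 512
t = np.arange(n)
slow = np.sin(2 * np.pi * t / 256)          # low-frequency component
fast = np.sin(2 * np.pi * t / 8)            # high-frequency component
signal = slow + fast

# Low pass: convolve with a Gaussian; high pass: subtract the smoothed
# signal from the original (removes the baseline).
low_pass = np.convolve(signal, gaussian_kernel(6.0), mode="same")
high_pass = signal - low_pass
```

A band-pass filter would convolve with the difference of two Gaussians of different widths.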
19. What is the Fourier transform of: a rectangle, a triangle, a
Gaussian, a sinc?
rectangle -> sinc
triangle -> sinc^2
Gaussian -> Gaussian
sinc -> rectangle
20. Give examples of feature construction methods that are not simple
normalizations and cannot be implemented by either match filters or convolutions.
- contour following algorithms
- connected component algorithms
- deskewing
- histograms
21. What is a convolutional neural network?
A multi-layer neural network implementing several successive convolutions.
Each convolution is followed by a subsampling to progressively reduce the
resolution of the input and extract higher and higher level features. The
weights of the network are the coefficients of the convolutional kernels,
and they are obtained by training.
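A minimal numpy sketch of the forward pass of such a network, with random (untrained) 1-D kernels standing in for learned weights:

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Valid convolution: dot product of the kernel at every position."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])

def subsample(x, factor=2):
    """Reduce resolution by keeping one sample out of `factor`."""
    return x[::factor]

rng = np.random.default_rng(0)
x = rng.normal(size=32)
k1 = rng.normal(size=5)     # in a real CNN these weights are learned
k2 = rng.normal(size=3)

h1 = subsample(np.tanh(conv1d_valid(x, k1)))   # first conv + subsampling
h2 = subsample(np.tanh(conv1d_valid(h1, k2)))  # higher-level features
```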