4. What becomes of the dot product between feature and target when the features (and the target) are standardized?
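Assume standardization uses the population standard deviation (NumPy's default `ddof=0`). Then the dot product of a standardized feature with a standardized target equals n times their Pearson correlation coefficient, where n is the number of samples. A minimal sketch on synthetic data follows. The `standardize` helper and the data are illustrative only.

```python
import numpy as np

def standardize(v):
    # Center to zero mean and scale to unit population standard deviation (ddof=0).
    return (v - v.mean()) / v.std()

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)             # one feature (synthetic)
y = 2.0 * x + rng.normal(size=n)   # a target correlated with the feature (synthetic)

dot = standardize(x) @ standardize(y)
r = np.corrcoef(x, y)[0, 1]        # Pearson correlation coefficient

print(dot / n, r)                  # the two numbers agree up to rounding error
```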

5. When does it make sense to take the log of the data matrix?
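The log helps with multiplicative noise, whose spread grows with the magnitude of the signal. After the log, that noise becomes additive with a roughly constant variance. A minimal sketch with synthetic log-normal noise follows. The three levels and the noise width 0.2 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
levels = np.array([1.0, 10.0, 100.0])   # three signal magnitudes (synthetic)
# Multiplicative noise: each value is the level times a random factor near 1,
# so the spread of the raw values grows in proportion to the level.
data = levels[:, None] * rng.lognormal(mean=0.0, sigma=0.2, size=(3, 1000))

raw_std = data.std(axis=1)          # grows roughly 10x from one level to the next
log_std = np.log(data).std(axis=1)  # about 0.2 at every level after the log
print(raw_std, log_std)
```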

6. What is a systematic error? What is an intrinsic error?
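Averaging repeated measurements shows the difference between the two kinds of error. The intrinsic (random) error shrinks roughly as 1/sqrt(n), but a systematic error (a bias) does not. A minimal simulation follows. The true value, the bias of 0.3, and the noise level are made-up numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 5.0
bias = 0.3       # systematic error: the same offset on every measurement (assumed)
noise_sd = 1.0   # intrinsic error: random, different for every measurement (assumed)

errors = {}
for n in (10, 1_000, 100_000):
    measurements = true_value + bias + rng.normal(0.0, noise_sd, size=n)
    errors[n] = measurements.mean() - true_value   # error of the average
    print(n, errors[n])                            # tends to the bias, not to 0
```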

7. How can one get rid of systematic errors?

8. What is an ANOVA model?

$x_{ij} = \mu + v_j + e_{ij}$

(μ: overall mean; i: index of the observation; j: index of the "treatment" or "class")

The ANOVA model assumes additive noise and equal variance in all classes (so take the log if you see the variance increase with the magnitude of the variable). The ANOVA test compares the variance due to the controlled factor v (the variance explained by the model) with the intrinsic variance of e (the residual, or "intra-class", variance). If the first is statistically significantly larger than the second, factor v is found to contribute significantly to the variance of the data.
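For the one-factor model, the test described above reduces to an F statistic: the between-class mean square divided by the within-class mean square. A minimal NumPy sketch follows. The function name `one_way_anova_f` is illustrative. In practice, `scipy.stats.f_oneway` computes the same statistic and its p-value, which comes from an F distribution with k - 1 and n - k degrees of freedom.

```python
import numpy as np

def one_way_anova_f(groups):
    """F statistic for x_ij = mu + v_j + e_ij, one group of observations per class j."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_x = np.concatenate(groups)
    n, k = all_x.size, len(groups)
    grand_mean = all_x.mean()
    # Between-class ("explained") sum of squares: spread of the class means.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-class (residual, "intra-class") sum of squares: spread around each class mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_between = ss_between / (k - 1)   # k - 1 degrees of freedom
    ms_within = ss_within / (n - k)     # n - k degrees of freedom
    return ms_between / ms_within

print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
print(one_way_anova_f([[1, 2, 3], [0, 2, 4]]))  # 0.0 (identical class means)
```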

Factors of variability can be classified along four axes: desired vs. undesired, known vs. unknown, controllable vs. uncontrollable, observable vs. unobservable.

- The desired factor is our target (class labels) e.g. disease or normal

- The undesired factors are all the nuisance variables that cause variance in the data unrelated to our target, e.g. differences in sample processing, temperature, patient gender, etc.

- The unknown factors are those that we have not considered yet (not recorded or controlled); the others are considered known.

- The uncontrollable factors are those over which we have no handle (e.g. the weather, something happening inside the instrument to which we do not have access, some patient behavior that we cannot change).

- Controllable factors on the other hand let us choose values and lend themselves to experimental design.

- Unobservable factors are those uncontrollable factors that we cannot even record (e.g. something happening inside the instrument to which we do not have access, some patient behavior that we cannot monitor).

- Observable factors are all the remaining factors that we can record, even though we might not be able to control them (e.g. the weather).

When designing an instrument, we should try to go in the direction

unobservable -> observable

uncontrollable -> controllable

unknown -> known

so that we can more effectively reduce the undesired variance.

10. What is experimental design? What is a "confounding factor"? Give examples of experimental plans.
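A full factorial plan with a randomized run order is one classic experimental plan. Every combination of factor levels is run, so a nuisance factor such as the processing batch is balanced across the classes. Without such a plan, the batch could be confounded with the class, for example if all disease samples were processed in batch 1 and all normal samples in batch 2. A minimal sketch follows. The factor names and levels are hypothetical.

```python
import itertools
import random

def full_factorial(factors):
    """Every combination of factor levels (a full factorial plan)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*(factors[name] for name in names))]

# Hypothetical factors: the target (class) and two controllable nuisance factors.
factors = {
    "class": ["disease", "normal"],
    "batch": [1, 2],
    "operator": ["A", "B"],
}
plan = full_factorial(factors)   # 2 x 2 x 2 = 8 runs
random.seed(0)
random.shuffle(plan)             # randomize the run order so time drifts do not align with any factor
for run in plan:
    print(run)
```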

11. Why is it important to record a lot of "meta data"? Why is it difficult to plan experiments with a lot of factors of variability? How should one proceed?

11. What is a standard operating procedure (SOP)? What is calibration? What is this good for?
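Calibration can be illustrated with a linear instrument model in three steps:

- Measure reference standards whose true values are known.
- Fit the gain and offset that distort the readings.
- Invert that fit on new readings.

This removes a systematic error but not the intrinsic noise. A minimal sketch follows, assuming a linear instrument response. The standards, the gain 1.08 and the offset 3.0 are synthetic, and the readings are noise-free for simplicity.

```python
import numpy as np

# Hypothetical reference standards: known true values and the instrument's readings for them.
true_values = np.array([0.0, 10.0, 20.0, 50.0, 100.0])
readings = 1.08 * true_values + 3.0   # synthetic instrument with gain and offset errors

# Fit reading = gain * true + offset by least squares (polyfit returns [slope, intercept]).
gain, offset = np.polyfit(true_values, readings, deg=1)

def calibrate(raw):
    """Invert the fitted distortion to map raw readings back to true units."""
    return (raw - offset) / gain

print(gain, offset, calibrate(1.08 * 42.0 + 3.0))   # about 1.08, 3.0, 42.0
```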

12. Is calibration always desirable?

13. What is a "match filter"? Give examples of learning algorithms using "match filters".

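A match filter compares an input pattern with a template, classically by computing their dot product; the response is large when the input resembles the template. Instead of a dot product, other similarity measures can be used. "Template matching" and "nearest neighbor" algorithms use match filters.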
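The function below sketches a template-matching classifier with a choice of similarity. It supports a raw dot product (a plain match filter), a cosine similarity (a normalized match filter) or a negative Euclidean distance (nearest neighbor). The function name `nearest_template` and the toy templates are illustrative.

```python
import numpy as np

def nearest_template(x, templates, labels, similarity="dot"):
    """Return the label of the stored template that best matches x."""
    if similarity == "dot":
        # Match filter: dot product of the input with every template.
        scores = templates @ x
    elif similarity == "cosine":
        # Normalized match filter: insensitive to the overall scale of x and templates.
        scores = (templates @ x) / (np.linalg.norm(templates, axis=1) * np.linalg.norm(x))
    elif similarity == "euclidean":
        # Nearest neighbor: a smaller distance means a better match.
        scores = -np.linalg.norm(templates - x, axis=1)
    else:
        raise ValueError(f"unknown similarity: {similarity}")
    return labels[int(np.argmax(scores))]

templates = np.array([[2.0, 0.0], [0.0, 1.0]])   # template "A" has a larger norm
labels = np.array(["A", "B"])
x = np.array([0.5, 0.9])
for s in ("dot", "cosine", "euclidean"):
    print(s, nearest_template(x, templates, labels, s))   # dot: A, cosine: B, euclidean: B
```

In this toy case the raw dot product favors the larger-norm template. That is one reason inputs and templates are often normalized before matching.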

16. What are the similarities and differences between methods based on filter banks and convolutional methods?

17. If a convolution is performed in input space, what operation does it correspond to in Fourier space, and vice versa?
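The convolution theorem can be checked numerically with NumPy's FFT. For finite sampled signals, the discrete Fourier transform pairs circular convolution with pointwise multiplication. The helper `circular_convolve` and the moving-average kernel are illustrative choices.

```python
import numpy as np

def circular_convolve(a, b):
    """Direct circular convolution: (a * b)[t] = sum_k a[(t - k) mod n] b[k]."""
    n = len(a)
    return np.array([sum(a[(t - k) % n] * b[k] for k in range(n)) for t in range(n)])

rng = np.random.default_rng(0)
n = 64
signal = rng.normal(size=n)
kernel = np.zeros(n)
kernel[:5] = 1.0 / 5   # 5-tap moving-average kernel, zero-padded to length n

# Convolution in input space <-> pointwise product in Fourier space.
via_fft = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(kernel)).real
forward_ok = np.allclose(circular_convolve(signal, kernel), via_fft)

# Pointwise product in input space <-> circular convolution in Fourier space, divided by n
# (the 1/n comes from NumPy's unnormalized forward FFT).
S, H = np.fft.fft(signal), np.fft.fft(kernel)
reverse_ok = np.allclose(np.fft.fft(signal * kernel), circular_convolve(S, H) / n)

print(forward_ok, reverse_ok)  # True True
```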

18. What are low/high/band pass filters? Give examples of convolutional kernels and match filters in Fourier space implementing such filters.
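The sketch below applies ideal low-, high- and band-pass filters as masks in Fourier space. Each is equivalent to a convolution with the corresponding kernel in input space. The test signal, frequencies and cutoffs are arbitrary choices. An ideal rectangular mask corresponds to a sinc kernel in input space, whereas a Gaussian mask would give a smoother Gaussian kernel.

```python
import numpy as np

n = 256
t = np.arange(n)
slow = np.sin(2 * np.pi * 3 * t / n)    # 3 cycles per window: low frequency
fast = np.sin(2 * np.pi * 60 * t / n)   # 60 cycles per window: high frequency
x = slow + fast

freqs = np.fft.fftfreq(n) * n           # signed frequency index (cycles per window)
X = np.fft.fft(x)

low_mask = np.abs(freqs) <= 10                              # ideal low-pass
band_mask = (np.abs(freqs) >= 50) & (np.abs(freqs) <= 70)   # ideal band-pass around 60

low_pass = np.fft.ifft(X * low_mask).real     # keeps only the slow component
high_pass = np.fft.ifft(X * ~low_mask).real   # complementary high-pass
band_pass = np.fft.ifft(X * band_mask).real   # keeps only the 60-cycle component
```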

19. What is the Fourier transform of: a rectangle, a triangle, a Gaussian, a sinc?

rectangle -> sinc

triangle -> sinc²

Gaussian -> Gaussian

sinc -> rectangle
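Three of these pairs can be checked numerically under the following assumptions:

- The convention is F(ν) = ∫ f(t) e^(-2πiνt) dt.
- The rectangle has width 1 and the triangle has half-width 1.
- The Gaussian is exp(-πt²).

Under these assumptions, rect → sinc, tri → sinc² and exp(-πt²) → exp(-πν²). NumPy's `np.sinc(x)` is the normalized sin(πx)/(πx). A minimal sketch using a Riemann sum follows.

```python
import numpy as np

t = np.linspace(-5.0, 5.0, 100_001)   # step 1e-4; all three functions are negligible beyond |t| = 5
dt = t[1] - t[0]
nu = np.linspace(-3.0, 3.0, 61)       # frequencies at which the transform is evaluated

def fourier_transform(f_t):
    """Riemann-sum approximation of F(nu) = integral of f(t) exp(-2 pi i nu t) dt."""
    return np.array([np.sum(f_t * np.exp(-2j * np.pi * v * t)) * dt for v in nu])

rect = (np.abs(t) <= 0.5).astype(float)     # width-1 rectangle
tri = np.clip(1.0 - np.abs(t), 0.0, None)   # triangle on [-1, 1], peak 1
gauss = np.exp(-np.pi * t ** 2)

F_rect, F_tri, F_gauss = (fourier_transform(f) for f in (rect, tri, gauss))
print(np.allclose(F_rect, np.sinc(nu), atol=1e-3))                 # True
print(np.allclose(F_tri, np.sinc(nu) ** 2, atol=1e-3))             # True
print(np.allclose(F_gauss, np.exp(-np.pi * nu ** 2), atol=1e-3))   # True
```

The sinc → rectangle pair is left out because sinc decays only as 1/|t|, so a truncated numerical integral converges poorly.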

20. Give examples of feature construction methods that are not simple normalizations and cannot be implemented by either match filters or convolutions.

- connected component algorithms

- deskewing

- histograms
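Two of the listed examples can be sketched in a few lines: counting connected components with a breadth-first flood fill, and computing an intensity histogram. Both are nonlinear in the pixel values, so no single match filter (a dot product) or convolution can compute them. A minimal sketch on a toy binary image follows. The function names are illustrative, and `scipy.ndimage.label` provides optimized connected-component labeling.

```python
from collections import deque

import numpy as np

def count_connected_components(img):
    """Number of 4-connected regions of nonzero pixels in a 2-D image."""
    img = np.asarray(img, dtype=bool)
    seen = np.zeros_like(img)
    rows, cols = img.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if img[r, c] and not seen[r, c]:
                count += 1                     # new component found: flood-fill it
                queue = deque([(r, c)])
                seen[r, c] = True
                while queue:
                    i, j = queue.popleft()
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        a, b = i + di, j + dj
                        if 0 <= a < rows and 0 <= b < cols and img[a, b] and not seen[a, b]:
                            seen[a, b] = True
                            queue.append((a, b))
    return count

def intensity_histogram(img, bins=4):
    """Fraction of pixels in each of `bins` equal-width intensity bins on [0, 1]."""
    counts, _ = np.histogram(np.asarray(img, dtype=float), bins=bins, range=(0.0, 1.0))
    return counts / counts.sum()

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 0]])
print(count_connected_components(img))   # 3
print(intensity_histogram(img))          # [0.625 0.    0.    0.375]
```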

21. What is a convolutional neural network?
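A convolutional neural network stacks the following layers:

- convolutional layers, i.e. banks of learned kernels slid over the input;
- nonlinearities;
- pooling;
- ordinary fully connected layers at the end.

A minimal NumPy forward-pass sketch with random weights follows. In a real network, the kernels and dense weights are learned by backpropagation. The layer sizes and function names here are arbitrary and illustrative.

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Slide each square kernel over the image (cross-correlation, as in most CNN libraries)."""
    k = kernels.shape[1]
    h, w = image.shape[0] - k + 1, image.shape[1] - k + 1
    out = np.zeros((kernels.shape[0], h, w))
    for f, kern in enumerate(kernels):
        for i in range(h):
            for j in range(w):
                out[f, i, j] = np.sum(image[i:i + k, j:j + k] * kern)
    return out

def max_pool(maps, size=2):
    """Non-overlapping max pooling; leftover rows/columns that do not fill a window are dropped."""
    f, h, w = maps.shape
    h2, w2 = h // size, w // size
    trimmed = maps[:, :h2 * size, :w2 * size]
    return trimmed.reshape(f, h2, size, w2, size).max(axis=(2, 4))

def cnn_forward(image, kernels, dense_w, dense_b):
    """conv -> ReLU -> max-pool -> flatten -> linear scores (one per class)."""
    maps = np.maximum(conv2d_valid(image, kernels), 0.0)   # ReLU nonlinearity
    pooled = max_pool(maps)
    return dense_w @ pooled.ravel() + dense_b

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernels = rng.normal(size=(2, 3, 3))        # 2 feature maps with 3x3 kernels
dense_w = rng.normal(size=(3, 2 * 3 * 3))   # 3 classes; 2 pooled maps of 3x3
dense_b = np.zeros(3)
scores = cnn_forward(image, kernels, dense_w, dense_b)
print(scores.shape)  # (3,)
```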