The Spider Tutorial
How can I have a look at it quickly?
Download it here. Just unzip it into the directory of your choice. Start MATLAB and run the script "use_spider" to install. To get started, type "help spider" and "help demos". For example, run spider/demos/feat_sel/go.m
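For the impatient, a minimal first session might look like this (a sketch; the unzip location below is only an example):

cd ~/spider                 % or wherever you unzipped the archive
use_spider                  % installs the toolbox / sets up the paths
help spider                 % overview of the toolbox
help demos                  % list of demo scripts
run('demos/feat_sel/go')    % run one of the demos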
NOTE: The Spider was developed with Matlab Version 11; it may not work with earlier versions.
What happens if this tutorial is incomprehensible to me?
Before we begin, we should note that some people who found this tutorial incomprehensible had better luck with an alternative tutorial using the spider, from the MPI machine learning summer school, which is available here. Try that one if you don't get this one.
How do I train and test an algorithm?
a) Prepare your data in a matrix of attributes (e.g. X) and a matrix of labels (e.g. Y) such that the rows are the examples.
b) Create a data object via d=data(X,Y):

X=rand(50)-0.5; Y=sign(sum(X,2)); d=data(X,Y)   % make simple data
c) Train an algorithm, e.g. svm, knn, c45:

[tr a]=train(svm,d)
tr is the predictions (a data object),
a is the model that you learnt (an algorithm object, in this case an svm object). The predictions themselves can be accessed via tr.X. In general, any members of an object (e.g. hyperparameters, model parameters) are accessed in the same way.
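For instance, after the training call above you might inspect the following (apart from X, Y and alpha, which appear elsewhere in this tutorial, the member names are only illustrative; C is the usual svm regularization hyperparameter):

tr.X       % the predictions made on the training data
tr.Y       % the true labels, carried along in the result
a.C        % a hyperparameter of the trained svm (illustrative member)
a.alpha    % the learnt svm coefficients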
d) Test the algorithm on new data d2:

tst=test(a,d2,'class_loss')
and measure the loss/error rate with the classification loss function. Alternatively, the predictions themselves can be accessed via

tst=test(a,d2); tst.X

We could also have calculated the loss equivalently with:
tst=test(a,d2); tst=loss(tst,'class_loss');
Note that train returns two outputs: the data results and the algorithm (which includes the learnt model). Test only returns the data results and hence cannot change the model of the underlying algorithm. Type "help train" and "help test" for a little more information.

Some objects, such as get_mean (get the mean of several results) and wilcoxon (the Wilcoxon statistical test), take group objects as input.
Example: get the mean loss of training an SVM on 10 different splits of data.

d=group; for i=1:10 d=add(d,toy); end; get_mean(train(svm,d))
A loss object takes data and produces a new data object which stores the loss between the original X and Y components:

r=train(svm,d); a=loss('class_loss'); train(a,r)
or, equivalently,

d=toy; r=train(chain({svm loss('class_loss')}),d);
A cv object takes data and produces a group object of data objects, one for each tested fold:

get_mean(train(cv(svm),d))
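The number of folds can also be changed; a sketch, assuming the member controlling it is called folds:

c=cv(svm);
c.folds=10;              % ask for 10-fold cross-validation (member name assumed)
get_mean(train(c,d))     % mean result over the 10 folds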
A grid_sel object takes a group of algorithms and chooses the one with the best estimated generalization error (measured by another object, e.g. the cv object):

r=train(grid_sel(param(knn,'k',[1:5])),d);
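Since grid_sel is itself just another algorithm object, it composes with the objects above; for example, a sketch of estimating the error of the whole selection procedure with an outer cross-validation:

get_mean(train(cv(grid_sel(param(knn,'k',[1:5]))),d))   % cross-validate the complete model selection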
Multi-class classification
Multi-class (and multi-label) classification is implemented by having a Y matrix with as many columns as there are classes: for a given example, the i-th column of its row is 1 if it belongs to class i, and -1 otherwise. (If your labels come as a single column of integers, see the conversion sketch after the indexing examples below.) Here is an example:

d=gen(bayes({gauss([-1 3]) gauss([0 4]) gauss([1 2])},'l=6'))
This generates a three-class dataset, with each class coming from a Gaussian with differing mean and standard deviation. Now if we look at d.X and d.Y (inputs and outputs) they are:

X:
 1.6339    4.2481
 0.3305    4.1003
-1.1005    3.1459
-1.1459    2.8517
 1.2835    2.0098
 0.6552    1.2341

Y:
 1 -1 -1
-1  1 -1
-1 -1  1
 1 -1 -1
-1 -1  1
-1  1 -1

We now have to train a multi-class classifier on this, so we can no longer use a vanilla SVM. We could use, for example, the one-vs-the-rest method with an SVM:
[r,a]=train(one_vs_rest(svm),d)
You can look at the individual trained machines and results using indices:
r{1}      % results of the first machine
a{2}      % the second trained machine
r{2}.X % look at predictions of second machine
a{1}.alpha % look at alphas of first machine
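As promised above, if your labels arrive as a single column of integers rather than the +1/-1 matrix encoding, one way to convert them is sketched below (plain MATLAB, not part of the spider API; X is assumed to hold the corresponding inputs):

y=[1 2 3 1 3 2]';                          % integer class labels for 6 examples
Q=max(y);                                  % number of classes
Y=-ones(length(y),Q);                      % start with everything set to -1
Y(sub2ind(size(Y),(1:length(y))',y))=1;    % set column i to 1 for examples of class i
d=data(X,Y);                               % wrap into a data object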
See spider/demos/mclass for more examples.
Feature Selection
To perform feature selection, most feature selection objects have a member feat which is a scalar indicating how many features should be selected. The output of a feature selection algorithm is then a data object where the X component has that chosen number of features. Often, the features are ranked by column, with the most important feature, as decided by the algorithm, placed in the first column.

Many of the algorithms, e.g. fisher, rfe and l0, can also perform a classification step after feature selection. This is controlled by the boolean output_rank: if this is set to 1, the ranked features are output, otherwise classification is performed.
Let's look at an example. We will use rfe on some toy data, selecting 10 features.
d=gen(toy);        % generate some toy data
a=rfe;             % recursive feature elimination
a.feat=10;         % select 10 features
a.output_rank=1;   % output the ranked features rather than classifying
[r,a]=train(a,d);
Now, we have trained the system. We are returned a trained model and a new data object with 10 features. The trained model stored in a contains a member rank which includes the indices of the chosen features.
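For instance, you could inspect the chosen features and apply the same selection to a second data set d2 (a sketch; d2 is assumed to have the same attributes as d):

a.rank(1:10)                               % indices of the 10 chosen features
d2_sel=data(d2.X(:,a.rank(1:10)),d2.Y);    % keep only those columns of d2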
See spider/demos/feat_sel for more examples.
Unbalanced data
If the number of positive and negative examples is highly unbalanced, and/or you wish to measure the loss in a classification problem giving more weight to one of the two classes, in some kernel methods such as svm you can use the balanced ridge to deal with this. You simply set

a=svm;
a.balanced_ridge=0.1;

or some other value to indicate the amount of regularization (larger = more regularization). It works the same way as adding a ridge parameter, that is, it adds a constant to the diagonal of the kernel matrix, but it adds r*p/(n+p) to the positive examples and r*n/(n+p) to the negative examples, where r is the balanced ridge parameter (0.1 in the example above), p is the number of positives, and n is the number of negatives. The parameter ridge is totally separate.
This has the effect that, for example, if you have very few positive examples, you make sure you classify them correctly at the expense of misclassifying some examples from the larger negative class.
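Putting this together, a small sketch (here d is just toy data; with a genuinely unbalanced data set the effect would be more visible):

d=gen(toy);                  % some toy data
a=svm;
a.balanced_ridge=0.1;        % r in the description above
[r,a]=train(a,d);
test(a,d,'class_loss')       % error rate, measured here on the training data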
More advanced topics: kernels, saving memory, implementing your own objects, ...
Click here for more advanced topics.