function [X,Y,beta] = breiman(corr,h,seed); %BREIMAN Breiman sample generator for testing linear regression. % [X,Y,BETA] = BREIMAN(CORR,H,SEED) produces a sample of 60 (X,Y) % pairs, of 30 covariates X(i,:) and one explained variable Y(i). % X is drawn from a multivariate normal distribution N(0,S2), % where S2(i,j) = CORR^abs(i-j). % CORR (default = .1) should be in the range [-1 1]; % Y = X * BETA + EPSILON % EPSILON is drawn from a normal distribution N(0,1) % BETA is the sum of three waves centered in 5, 15 and 25: % BETA(k+i) = C*(H-i)^2 , with abs(i) < H, k = [ 5 15 25], where % C is a constant such that the R^2 is about 0.75, and % H (default = 1) is an integer governing the wave width. % H should be in {1,2,3,4,5} (corresponding to {3,9,15,21,27} % non-zero coefficients). % SEED is an optional parameter selecting the seed of the normal % pseudo-random generator used to generate X and EPSILON. % References: % Breiman, L., Heuristics of instability and stabilization in model % selection, The Annals of Statistics, 24(6), pp 2350--2383, 1996. if (nargin<3) seed = sum(100*clock); if (nargin < 2) h = 1; if (nargin < 1); corr = 0; end; end; end; randn('seed',seed) % random number initialization d = 30; % nombre de variables N = 60; % nombre de couples dans l'echantillon K = [5 15 25]; % cluster centers X = randn(N,d); EXX = corr.^abs([1:d]'*ones(1,d)-ones(d,1)*[1:d]); [V,D] = eig(EXX); X = X*sqrt(abs(D))*V'; % just in case beta = zeros(d,1); for i = 1:length(K); wave = max( 0 , h - abs([1:d]'-K(i)) ); beta = beta + (wave.^2); end; noise = randn(N,1); % additive noise N(0,1) beta = beta*sqrt(3/(beta'*EXX*beta)); % beta normalization / R2 \simeq 0.75 Y = X*beta + noise; % calcul des Y