Contents

1. Linearity
2. Linear algebra
3. Simple linear regression
4. Multiple linear regression
5. Matlab tools for regressions (from stat toolbox)
6. One-way ANOVA
7. Projection matrices
8. One-way ANOVA again
9. Repeated measures ANOVA
10. Multivariate regression
% This Matlab file is a companion to the
% 2 lectures on the General Linear Model
%
% Cyril Pernet January/February 2011
1. Linearity
Definition
1. the quality or state of being linear
2. Electronics: the extent to which any signal modification process, as detection, is accomplished without amplitude distortion
3. Physics: the extent to which any effect is exactly proportional to its cause
TWO KEY NOTIONS
additivity (or superposition): the response to a sum of inputs is the sum of the responses to the individual inputs (y(x1+x2) = y(x1) + y(x2))
scaling (or homogeneity of degree 1): the magnitude of the system output is proportional to the system input (y(a*x) = a*y(x))
see http://en.wikipedia.org/wiki/Linear
% ---------------------
% Linear correlations
% ---------------------
clear all; clc

% create 2 variables and correlate
x1 = [-15:14]' + randn(30,1);
x2 = 3*x1;
r = corr(x1,x2,'type','Pearson');
figure('Name','Correlations')
subplot(3,2,1); plot([x1 x2],'LineWidth',3); grid on;
title(['Scaling effect: r = ' num2str(r)]);

% Superposition
x3 = x1 + x2;
r = corr(x1,x3,'type','Pearson');
subplot(3,2,2); plot([x1 x3],'LineWidth',3); grid on;
title(['Superposition effect: r = ' num2str(r)]);

% Scaling and Superposition
x4 = x2 + 2*x3 + 5; % = 11*x1 + 5
r = corr(x1,x4,'type','Pearson');
subplot(3,2,3); plot([x1 x4],'LineWidth',3); grid on;
title(['Scaling and Superposition effect: r = ' num2str(r)]);

% Non linear: squaring
x5 = x1.^2;
r = corr(x1,x5,'type','Pearson');
subplot(3,2,4); plot([x1 x5],'LineWidth',3); grid on;
title(['Squaring effect: r = ' num2str(r)]);

% Non linear: absolute value
x6 = abs(x1);
r = corr(x1,x6,'type','Pearson');
subplot(3,2,5); plot([x1 x6],'LineWidth',3); grid on;
title(['Absolute value effect: r = ' num2str(r)]);

% Non linear but ...
x7 = 3*x1 + x1.^2 + x1.^3;
r = corr(x1,x7,'type','Pearson');
subplot(3,2,6); plot([x1 x7],'LineWidth',3); grid on;
title(['Polynomial effect: r = ' num2str(r)]); % yes, polynomials are special cases

% exercise: try a different argument 'type' for the corr function
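A possible take on the exercise (this sketch is not part of the original script): Spearman's correlation works on ranks, so it recovers a perfect association for any monotonic transform such as the polynomial x7, but not for a non-monotonic one such as the square x5.

% exercise sketch - assumes x1, x5 and x7 from the block above are still in the workspace
r_pearson_poly  = corr(x1,x7,'type','Pearson');  % below 1: the relation is not linear
r_spearman_poly = corr(x1,x7,'type','Spearman'); % should be 1: x7 is a monotonic function of x1
r_spearman_sq   = corr(x1,x5,'type','Spearman'); % still low: x1.^2 is not monotonic over this range
disp([r_pearson_poly r_spearman_poly r_spearman_sq])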
2. Linear algebra
With linear algebra one uses matrices to write equations; as we shall see, this turns out to be very useful because it allows complex linear systems to be solved very quickly.
A simple linear system with two variables can be:
2x - y = 0
-x + 2y = 3

In matrix notation this is X*[x; y] = Y, with
clc; clear; close

X = [2 -1; -1 2];
Y = [0 3]';

% we can now apply the same principle as solving a single equation
% for instance y = bx is solved by finding b = 1/x * y
% with matrices we also use the inverse, except that the inverse of
% a matrix is a bit more complicated
I = eye(2);    % this is called the identity matrix because X*I = I*X = X (just as x*1 = 1*x = x)
Xinv = inv(X); % this is the inverse of X defined as X*Xinv = I, just as 1/x * x = 1
test = (Xinv*X == I);

% the solution of our linear system is therefore
B = inv(X)*Y; % just as with numbers b = 1/x * y

% which we can check
XB = X*B; % = Y

% in geometrical terms, each column of a matrix describes a vector - the
% solution is to find how much (B) of each vector (X(:,i)) we need to arrive at the point Y
3. Simple linear regression
In a regression we have some observations Y that we want to explain by a variable X. By contrast with a linear system as described above, there is usually no perfect solution. The model is y = b1*x + b2 + e, with b1 how much of x we need to get close to y, b2 the intercept or offset, and e the error, i.e. the part that is not explained by x. However, we can see that a regression is a linear model since y is expressed as a function of x using additivity and scaling. The statistics simply evaluate how much we explain by x given the error.
clc; clear
% y = b1x + b2 + e

% let's generate some data
x = rand(20,1)*5;          % x is the IV (independent variable) we manipulate
Y = 4*x + 3 + randn(20,1); % Y is the DV (dependent variable) we measure; we run here a forward model + add noise

% get a solution, i.e. find b1=4 and b2=3
X = [x ones(20,1)]; % the ones are for b2, which is not a function of x but a constant added to each x
% X is not square so inv(X) is not possible
% a simple solution is to multiply by the transpose
% making the matrix X'X square
% Y = XB --> X'Y = X'XB --> inv(X'X)X'Y = inv(X'X)X'XB --> but inv(X'X)X'X = I thus
B = inv(X'*X)*X'*Y;

% check what it looks like
Yhat = X*B;     % Yhat is the model
Res = Y - Yhat; % Residual (error)

% get some statistics
% = estimate if the model Yhat explains more than the error Res
SStotal = norm(Y-mean(Y)).^2;        % what is the norm of a vector? = sum((Y-mean(Y)).^2)
SSeffect = norm(Yhat-mean(Yhat)).^2; % just like in books, the sum of squared differences xi - mean(x)
SSerror = norm(Res-mean(Res)).^2;
df = rank(X)-1; % what is the rank? see also http://en.wikipedia.org/wiki/Linear_independence
dferror = length(Y) - df - 1;
R2 = SSeffect / SStotal;                   % how much of the total we explain
F = (SSeffect / df) / (SSerror / dferror); % how much is explained given the error; this is the same as (R2*dferror) / ((1-R2)*df)
p = 1 - fcdf(F,df,dferror);

% make a figure
figure('Name','Simple Regression');
subplot(1,2,1); plot(Y,'x','LineWidth',5); hold on
plot(Yhat,'r','LineWidth',2); grid on;
title(['Data and model F=' num2str(F) ' p=' num2str(p)])
subplot(1,2,2); normplot(Res);
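As a quick cross-check (not in the original script), the same slope and intercept can be obtained with polyfit, since fitting a degree-1 polynomial is exactly this least squares problem.

% cross-check sketch - polyfit(x,Y,1) returns [slope intercept]
p_coef = polyfit(x,Y,1);
disp([p_coef' B]) % the two columns should match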
4. Multiple linear regression
This is the same as above except that we have more variables; however, using linear algebra we can just as easily find the regression coefficients. In addition to testing the model, we usually also test the weight of each regressor. The relative contribution of 1 regressor to the data given the model is the partial correlation coefficient (how much x1 explains of y when we control for the influence of all other xs on both y and x1), and the relative contribution of 1 regressor to the total variance is the semi-partial correlation coefficient (how much x1 explains of y when we control for the influence of all other xs on x1 only).
clc; clear; close
% y = b1x1 + b2x2 + b3x3 + b4x4 + b5

% let's take data stored in the Matlab examples
load hald;
Y = hald(:,5);
X = hald(:,1:4);

% get a solution
X = [X ones(length(X),1)];
B = inv(X'*X)*X'*Y; % just as above
Yhat = X*B;
Res = Y - Yhat;
SStotal = norm(Y - mean(Y)).^2;
SSeffect = norm(Yhat - mean(Yhat)).^2;
SSerror = norm(Res-mean(Res)).^2;
df = rank(X)-1;
dferror = length(Y) - df - 1;
R2 = SSeffect / SStotal;
F = (SSeffect / df) / (SSerror / dferror);
p = 1 - fcdf(F,df,dferror);

% make a figure
figure('Name','Multiple Regression');
subplot(1,2,1); plot(Y,'x','LineWidth',5); hold on
plot(Yhat,'r','LineWidth',2); grid on;
title(['R^2 ' num2str(R2) ' Data and model F=' num2str(F) ' p=' num2str(p)],'FontSize',14)
subplot(1,2,2); normplot(Res);

% ------------------------------------
% semi-partial correlation coefficient
% ------------------------------------
% let's think of the model and what it means in terms of geometry.
% the data Y can be described as a vector and a point in R^13,
% a space with 13 dimensions (one per observation). We then establish a model X with 5
% regressors (the 4 variables plus the intercept); that is, we look for a combination
% of these 5 vectors which will get as close as possible to Y. To find the actual
% contribution of x1 to the data for this model, one needs to look at how much x1 adds
% to the total variance explained, i.e. we want to compare the R2 between the full model
% and a reduced model without x1 - the difference is how much x1 explains in Y.

Xreduced = X(:,2:end); % reduced model, all minus the 1st regressor
Breduced = inv(Xreduced'*Xreduced)*Xreduced'*Y;
Yhatreduced = Xreduced*Breduced;
Resreduced = Y - Yhatreduced;
dfreduced = rank(Xreduced) - 1;
dferrorreduced = length(Y) - dfreduced - 1;
SSeffectreduced = norm(Yhatreduced-mean(Yhatreduced)).^2;
SSerrorreduced = norm(Resreduced-mean(Resreduced)).^2;
R2reduced = SSeffectreduced / SStotal;
Freduced = (SSeffectreduced / dfreduced) / (SSerrorreduced / dferrorreduced);
preduced = 1 - fcdf(Freduced,dfreduced,dferrorreduced);

Semi_Partial_corr_coef = R2 - R2reduced;
dfe_semi_partial_coef = df - dfreduced;
F_semi_partial_coef = (Semi_Partial_corr_coef*dferror) / ... % variance explained by x1
    ((1-R2)*dfe_semi_partial_coef);                          % unexplained variance overall
p_semi_partial_coef = 1 - fcdf(F_semi_partial_coef, dfe_semi_partial_coef, dferror); % note dferror is from the full model

% make a figure
figure('Name','Multiple Regression - Full versus reduced model');
subplot(2,2,1); plot(Y,'x','LineWidth',5); hold on
plot(Yhat,'r','LineWidth',2); grid on;
title(['Data and model F=' num2str(F) ' p=' num2str(p)])
subplot(2,2,2); plot(Y,'x','LineWidth',5); hold on
plot(Yhatreduced,'r','LineWidth',2); grid on;
title(['Data and reduced model F=' num2str(Freduced) ' p=' num2str(preduced)])
subplot(2,2,3); plot(Yhat,'b','LineWidth',2); grid on; hold on
plot(Yhatreduced,'r','LineWidth',2);
title('Modelled data for both models')
subplot(2,2,4); plot((Res-Resreduced).^2,'k','LineWidth',2); grid on; hold on
title('Difference of residuals squared')

% --------------------------------
% partial correlation coefficient
% --------------------------------
% As we shall see below, this is easily obtained using projections - but
% for now let's just think about what we want to measure. We are interested
% in knowing the correlation between y and x1 controlling for the effect of
% the other xs, that is removing the effect of the other xs.
% Compared to the semi-partial coef we also want to remove the effect of the xs on y;
% or, if you prefer, we want to correlate the part of x1 that is not explained by the
% other xs with the part of Y that is not explained by the other xs.
% above we removed X(:,2:end) from Y and got Resreduced
% we need to do the same for x1 so that we can correlate x1 and y without the other xs
x = X(:,1);
B = inv(Xreduced'*Xreduced)*Xreduced'*x;
xhat = Xreduced*B;
Resx = x - xhat;
% the correlation between Resreduced and Resx is the partial coef
Partial_coef = corr(Resx,Resreduced);
% --------------------------------------------------
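As a sanity check (this sketch is not part of the original script), the squared semi-partial coefficient obtained from the R2 difference should equal the squared correlation between Y and the part of x1 that is orthogonal to the other xs, while the partial coefficient is the correlation between the two sets of residuals.

% check sketch - assumes Y, Resx, Resreduced and the coefficients above are in the workspace
semi_partial_check = corr(Y,Resx)^2;        % should match Semi_Partial_corr_coef
partial_check      = corr(Resx,Resreduced); % same as Partial_coef
disp([semi_partial_check Semi_Partial_corr_coef partial_check Partial_coef])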
5. Matlab tools for regressions (from stat toolbox)
-------------------------------------------------
clc; close

[B,Bint,r,rint,stats] = regress(Y,X); % removes NaN data
% B are the beta parameters, exactly what we have above with inv(X'*X)*X'*Y
% Bint are the 95% confidence intervals of the coefficients
% r are the residuals - same as above
% rint are the 95% confidence intervals of the residuals - note they should
% include 0, otherwise this would suggest that a point is not well fitted (outlier?)
% this can be checked with rcoplot(r,rint);
% stats: the R2 statistic, the F statistic and its p value, and an estimate of the error variance

regstats(Y,X(:,1:end-1)); % calls a GUI - WARNING: do not include the intercept column, regstats adds it
% stats = regstats(y,X,model,whichstats) does not call the GUI and returns
% in stats the parameters specified in whichstats

[br,statsr] = robustfit(X(:,1:end-1),Y); % robust estimate using iteratively reweighted least squares

% GLMFIT works for regressions and ANOVAs
[b,dev,stats] = glmfit(X(:,1:end-1),Y,'normal');
% a pretty cool feature is that you can use other types of distributions

% shame none of the above compute partial and semi-partial coefs
% ------------------
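One possible addition (not in the original script, and assuming a Statistics Toolbox version that ships partialcorr): the partial coefficient computed by hand in the previous section can be checked against partialcorr.

% hedged sketch - skip if partialcorr is not available in this toolbox version
if exist('partialcorr','file')
    pc = partialcorr(X(:,1),Y,X(:,2:end-1)); % x1 and Y, controlling for x2..x4
    disp([pc Partial_coef])                  % should match the hand computation above
end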
6. One-way ANOVA
-----------------
We now describe a one-way ANOVA as a linear model, just as we did with regression. The standard model is y = u + xi + e, that is, the data equal the grand mean + the effect of the treatment + the error. The effect of the treatment is supposed to be the same for each y within a given condition; it can thus be modelled using columns of 1s.
clc; clear; close all

% generate data for 3 groups
u1 = rand(10,1) + 11.5;
u2 = rand(10,1) + 7.2;
u3 = rand(10,1) + 5;
Y = [u1; u2; u3];

% describe X as 3 conditions
x1 = [ones(10,1); zeros(20,1)];              % condition 1 as a given value for y(1:10)
x2 = [zeros(10,1); ones(10,1); zeros(10,1)]; % condition 2 as a given value for y(11:20)
x3 = [zeros(20,1); ones(10,1)];              % condition 3 as a given value for y(21:30)
X = [x1 x2 x3 ones(30,1)]; % again we add the intercept, here the grand mean

% now we can do almost exactly the same analysis as above
B = pinv(X)*Y; % almost as above
% here we use a pseudoinverse:
% X is rank deficient, i.e. the regressors are not independent, since any linear combination of
% 3 columns gives us the 4th one; thus X'X is also rank deficient and singular, i.e. inv(X'*X)
% doesn't exist -- there is no matrix A such that A*(X'X) = I -- however pinv gives a unique
% solution that minimizes the squared distance to the data.
% see
% http://mathworld.wolfram.com/Moore-PenroseMatrixInverse.html
% http://en.wikipedia.org/wiki/Moore_Penrose_pseudoinverse

Yhat = X*B;
Res = Y - Yhat;

SStotal = norm(Y-mean(Y)).^2;
SSeffect = norm(Yhat-mean(Yhat)).^2;
SSerror = norm(Res-mean(Res)).^2;
df = rank(X)-1;
dferror = length(Y) - df - 1;
R2 = SSeffect / SStotal;
F = (SSeffect / df) / (SSerror / dferror);
p = 1 - fcdf(F,df,dferror);

% make a figure
figure('Name','One way ANOVA');
subplot(1,3,1); plot(Y,'x','LineWidth',5); hold on
plot(Yhat,'r','LineWidth',2); grid on;
title(['Data and model F=' num2str(F) ' p=' num2str(p)])
subplot(1,3,2); imagesc(X); colormap('gray'); title('Design matrix');
subplot(1,3,3); normplot(Res);

% for comparison try [P,ANOVATAB,STATS] = anova1(Y,g,'off')
% with g = ones(30,1); g(11:20)=2; g(21:30)=3;
% ------------------------
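A possible sketch of that comparison (not part of the original script): the F and p returned by anova1 should match the values computed by hand above.

% comparison sketch using the Statistics Toolbox
g = ones(30,1); g(11:20) = 2; g(21:30) = 3; % group labels
[P,ANOVATAB,STATS] = anova1(Y,g,'off');     % 'off' suppresses the table and figure
disp([ANOVATAB{2,5} P])                     % F and p from anova1, to compare with F and p above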
7. Projection matrices
------------------------
% see http://en.wikipedia.org/wiki/Projection_(linear_algebra)
% P is the orthogonal projection of Y onto X such that Yhat = P*Y
% the projection is orthogonal, which means it minimizes (in the least
% squares sense) the distance between Y and X
% P = X*pinv(X) and Yhat = X*pinv(X)*Y
% which is the same as above with B = pinv(X'*X)*X'*Y and Yhat = X*B
% -------------------
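A minimal sketch (not in the original script), assuming X, Y and Yhat from the one-way ANOVA above are still in the workspace: it checks the two defining properties of the orthogonal projector (symmetry and idempotence) and that P*Y reproduces the fitted values.

% projection sketch
P = X*pinv(X);                             % orthogonal projector onto the column space of X
check_symmetric  = max(max(abs(P - P')));  % should be ~0
check_idempotent = max(max(abs(P*P - P))); % P*P = P, should be ~0
check_fit        = max(abs(P*Y - Yhat));   % P*Y = Yhat, should be ~0
disp([check_symmetric check_idempotent check_fit])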
8. One-way ANOVA again
-------------------
% following the results obtained above
P = X*pinv(X);          % orthogonal projection of Y onto X
R = eye(size(Y,1)) - P; % projection onto the error space null(X)
E = Y'*R*Y;             % SS error

C = diag([1 1 1 0]);    % test the effect of the treatment (thus remove the last column of X)
C0 = eye(size(X,2)) - C*pinv(C);
X0 = X*C0;              % reduced model (i.e. only the intercept)
R0 = eye(size(Y,1)) - (X0*pinv(X0)); % projection onto null(C(X))
M = R0 - R;             % M is the projection matrix onto Xc, what we want
H = (B'*X'*M*X*B);      % SS effect

F_Rsquare = (H./(rank(X)-1)) ./ (E/(size(Y,1)-rank(X)));
p_Rsquare = 1 - fcdf(F_Rsquare, (rank(X)-1), (size(Y,1)-rank(X)));

% ex. use a different C to compute gp 1 vs gp 2
% C = diag([1 -1 0 0])
% ------------------------
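A possible answer to the exercise (this sketch is not part of the original script), reusing X, Y, B, R and E from the block above: with C = diag([1 -1 0 0]) the reduced model loses only the group 1 versus group 2 difference, so the resulting F has 1 numerator degree of freedom.

% exercise sketch: contrast group 1 vs group 2
C  = diag([1 -1 0 0]);
C0 = eye(size(X,2)) - C*pinv(C);
X0 = X*C0;                % reduced model without the gp1/gp2 difference
R0 = eye(size(Y,1)) - X0*pinv(X0);
M  = R0 - R;
H  = B'*X'*M*X*B;         % SS for the contrast
dfC = rank(X) - rank(X0); % = 1 for this pairwise contrast
F_contrast = (H/dfC) / (E/(size(Y,1)-rank(X)));
p_contrast = 1 - fcdf(F_contrast, dfC, size(Y,1)-rank(X));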
9. Repeated measures ANOVA
------------------------
Y = XB + E with X = [Factors, Interactions / Subjects]
clc; clear; close all

% generate data: 10 subjects measured in 3 conditions
u1 = rand(10,1) + 11.5;
u2 = rand(10,1) + 7.2;
u3 = rand(10,1) + 5;
Y = [u1; u2; u3]; % this time these are not 3 groups but 10 subjects with 3 measures each

% create the design matrix for the different factors
nb_subjects = 10;
nb_conditions = 3;
Subjects = repmat(eye(nb_subjects),nb_conditions,1); % error term (subject effect)
x = kron(eye(nb_conditions),ones(nb_subjects,1));    % effect (conditions)
X = [x Subjects]; % no more column of ones for the grand mean but a subject-specific mean
figure; imagesc(X); colormap('gray');
title('Repeated measures design','Fontsize',14)

% compute as usual
df = nb_conditions - 1;
dfe = size(Y,1) - nb_subjects - df;
P = X*pinv(X'*X)*X';    % our projection matrix
R = eye(size(Y,1)) - P; % projection onto the error space
SSe = diag(Y'*R*Y);     % Y projected onto the error
Betas = pinv(x)*Y;      % computed without the constant/subject columns
yhat = x*Betas;         % yhat based on the treatment only (no subject effect) - note we use the little x
SS = norm(yhat-mean(yhat)).^2;
F_values = (SS/df) ./ (SSe/dfe);
p_values = 1 - fcdf(F_values, df, dfe);
% ------------------------
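A possible cross-check (not in the original script): reshaping Y into a subjects-by-conditions matrix and running anova2 without replication treats subjects as the row blocking factor, which is the univariate repeated measures ANOVA computed above; the 'Columns' F and p should match F_values and p_values.

% cross-check sketch using the Statistics Toolbox
Ymat = reshape(Y, nb_subjects, nb_conditions); % 10 subjects x 3 conditions
[p_check, tbl] = anova2(Ymat, 1, 'off');       % 1 observation per cell, no display
disp([tbl{2,5} p_check(1)])                    % F and p for 'Columns' (the conditions)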
10. Multivariate regression
------------------------
clc; clear; close all
% y = b1x1 + b2x2 + b3, now with 2 dependent variables

load carbig;
Y = [Acceleration Displacement]; % now we have 2 Ys

% get B
X = [Cylinders Weight ones(length(Weight),1)];
B = inv(X'*X)*X'*Y; % just as before, but now B has 2 columns, 1 per Y :-)

% use projections
T = (Y-repmat(mean(Y),size(Y,1),1))'*(Y-repmat(mean(Y),size(Y,1),1)); % SSCP Total
P = X*pinv(X'*X)*X';
R = eye(size(Y,1)) - P;
E = (Y'*R*Y); % SSCP Error

% stats for the model
C = eye(size(X,2)); C(:,size(X,2)) = 0;
C0 = eye(size(X,2)) - C*pinv(C);
X0 = X*C0;
R0 = eye(size(Y,1)) - (X0*pinv(X0));
M = R0 - R;
H = (B'*X'*M*X*B); % SSCP Effect

% if we were doing univariate stats the SS are on the diagonal of the SSCP
% matrices, so simply use diag
Rsquare = diag(H)./diag(T);
F_Rsquare = (diag(H)./(rank(X)-1)) ./ (diag(E)/(size(Y,1)-rank(X)));
p_Rsquare = 1 - fcdf(F_Rsquare, (rank(X)-1), (size(Y,1)-rank(X)));

% what about multivariate regression then?
% we decompose the product of the SSCP effect and inv(SSCP error)
% an eigenvalue decomposition is like a PCA: in matrix terms we
% look for a new set of vectors orthogonal to each other whose
% combination gives the same result (in space) as the original
% matrix, see http://en.wikipedia.org/wiki/Eigendecomposition_of_a_matrix
eigen_values = eig(inv(E)*H);
% for real life processing it is better to use SVD to get positive values only

p = size(Y,2);   % number of variables (dimensions)
q = size(X,2)-1; % number of regressors (df)
n = size(Y,1);   % number of observations
s = min(p,q);
m = (abs(q-p)-1)/2;
N = (n-q-p-2)/2;
d = max(p,q);

theta = max(eigen_values) / (1+max(eigen_values)); % Roy
V = sum(eigen_values ./ (1+eigen_values));         % Pillai

R2_Roy_value = theta; % = 1st canonical correlation
R2_Roy_F = ((n-d-1)*max(eigen_values))/d;
R2_Roy_p = 1-fcdf(R2_Roy_F, d, (n-d-1));

R2_Pillai_value = V / s; % average of the canonical correlations
R2_Pillai_F = ((2*N+s+1)*V) / ((2*m+s+1)*(s-V));
R2_Pillai_p = 1-fcdf(R2_Pillai_F,(s*(2*m+s+1)),(s*(2*N+s+1)));

% what about the effect of 1 regressor in particular? (partial coef)
% let's look at the 2nd regressor
C = diag([0 1 0]);
C0 = eye(size(X,2)) - C*pinv(C);
X0 = X*C0;
R0 = eye(size(Y,1)) - (X0*pinv(X0));
M = R0 - R;
H = B'*X'*M*X*B;
eigen_values = eig(inv(E)*H);
df_multivariate = size(Y,2);                          % nb of DVs
dfe_multivariate = size(Y,1)-size(Y,2)-(size(X,2)-1); % n - nb of DVs - nb of covariates

% Roy's test
theta = max(eigen_values) / (1+max(eigen_values));
% Pillai
V = sum(eigen_values ./ (1+eigen_values));
% Lawley-Hotelling's generalized T2
U_continuous = sum(eigen_values);
F_continuous = (dfe_multivariate*max(eigen_values))/df_multivariate;
pval = 1-fcdf(F_continuous, df_multivariate, dfe_multivariate);
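As a final sanity check (not in the original script), the multivariate least squares estimates are simply column-wise OLS solutions, so each column of B can be verified against regress.

% sanity check sketch - column-wise OLS should reproduce B
B1 = regress(Y(:,1),X); % coefficients for Acceleration
B2 = regress(Y(:,2),X); % coefficients for Displacement
disp(max(max(abs([B1 B2] - B)))) % should be ~0 up to numerical precision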