Compare matrices using Chi square for Exploratory Factor Analysis
- To: mathgroup at smc.vnet.net
- Subject: [mg114849] Compare matrices using Chi square for Exploratory Factor Analysis
- From: Stuart Nettleton <Stuart.Nettleton at uts.edu.au>
- Date: Mon, 20 Dec 2010 00:39:56 -0500 (EST)
Hi, I am hoping that someone might be able to suggest a straightforward way of comparing two matrices using a chi-squared statistic. Below is a package I prepared to carry out exploratory factor analysis on both horizontal and traditional vertical data matrices. The last task is to test the hypothesis that the number of factors used is sufficient, using a chi-squared statistic on the relevant degrees of freedom to calculate a p-value. This test would compare the data matrix Z to the recreated data matrix fs.Transpose[Lambda], where fs is the factor score matrix and Lambda is the factor loadings matrix. For example: "Test of the hypothesis that 7 factors are sufficient. The chi square statistic is 39.26 on 59 degrees of freedom. The p-value is 0.978." Any thoughts would be very much appreciated.

Cheers, Stuart

BeginPackage["zigzagefa`"]

efa::usage = "efa[data, factors, Method\[Rule]{Triangular (default), Varimax, None}] is based on Unkel & Trendafilov's GEFALS (2010) for n>=p+k and their Zig Zag EFA (2010) for n<p+k. Data is automatically corrected by setting non-numeric values to the average for the variable. efa returns {Z, \[CapitalLambda], \[CapitalPsi], factor scores, final Frobenius norm, iterations}. The EFA algorithm has been adjusted to set the factor maximums in the columns of \[CapitalLambda] to be positive at each iteration."

factorreports::usage = "factorreports[\[CapitalLambda], \[CapitalPsi]\[Rule]\[CapitalPsi]residuals (default NA), fs\[Rule]factor scores (default none), vars\[Rule]variable names (default is generic), varslength\[Rule]print length of variables (default is 8), desc\[Rule]description (default none)] prints the suite of reports with an optional prefix description (desc)."

varimaxTP::usage = "varimaxTP[\[CapitalLambda], NormalizeVars\[Rule]{True, False (default)}] performs an optimal varimax rotation. It operates on the factor loading matrix \[CapitalLambda] and returns \[CapitalLambda]vm.
VarimaxTP uses the standard algorithm of Kaiser (Psychometrika, 1958), modified by Trevor Park in April 2002 from an original file by J. O. Ramsay. See http://www.stat.ufl.edu/~tpark/Research/varimaxTP.m"

fixmissing::usage = "fixmissing[data] sets any non-numeric data to the variable average. It is automatically used with efa."

Begin["`Private`"]

(*If[$VersionNumber < 8.0, Needs["MultivariateStatistics`"]]*)

efa[data_, factors_, OptionsPattern[{Method -> "Triangular"}]] :=
 Module[{fixeddata, Z, n, p, k, B, RB, idx, F, U, \[CapitalLambda], \[CapitalPsi], f, f0, maxiter, m, U0, S0, V0, Q, uppert, Fbot, y1, yd, y2, Û, uW, dW, vW, \[CapitalLambda]negcolnum, temp2, fs},
  (*ZIGZAG EFA*)
  (*Unkel S. & N. T. Trendafilov 2010, "Zigzag exploratory factor analysis with more variables than observations"*)
  (*Z is mean centred and its columns are normalised to unit length*)
  fixeddata = fixmissing[data];
  Z = Transpose[Map[Normalize, Transpose[Map[# - Mean[fixeddata] &, fixeddata]]]];
  {n, p} = Dimensions[Z];
  k = factors; (*number of factors chosen exogenously*)
  (*Initialise common factor scores F and unique factor scores U*)
  If[n < p + k,
   (*the original route of the Zig Zag paper, with more variables than observations*)
   {B, RB} = QRDecomposition[RandomReal[1, {p + k, n}] - 0.5],
   (*use this route for n >= p+k*)
   {B, RB} = QRDecomposition[RandomReal[1, {n, n}] - 0.5];
   B = Take[B, All, p + k];
   RB = Take[RB, p + k, p + k]
  ];
  If[Min[Diagonal[RB]] < 0,
   idx = Pick[Range[Length[RB]], Thread[Diagonal[RB] < 0]];
   B[[idx, All]] = -B[[idx, All]]
  ];
  F = Take[B, All, k];
  U = Take[B, All, -p];
  (*Initialise loadings \[CapitalLambda] and uniquenesses \[CapitalPsi]*)
  Which[
   OptionValue[Method] == "Triangular", \[CapitalLambda] = LowerTriangularize[Transpose[Z].F],
   OptionValue[Method] == "Varimax", \[CapitalLambda] = varimaxTP[Transpose[Z].F],
   OptionValue[Method] == "None", \[CapitalLambda] = Transpose[Z].F,
   True, Print["Error in Method specification"]; Abort[]
  ];
  \[CapitalPsi] = DiagonalMatrix[Diagonal[Transpose[U].Z]];
  f = Norm[Z - F.Transpose[\[CapitalLambda]] - U.\[CapitalPsi], "Frobenius"]^2; (*objective function*)
  f0 = Norm[Z, "Frobenius"]^2;
  (*Main loop of the ALS algorithm*)
  maxiter = 0;
  While[Abs[f0 - f] > 10^-6, (*convergence criterion*)
   (*Solve the Procrustes problem to update F*)
   If[n < p + k,
    (*the original route of the Zig Zag paper, with more variables than observations*)
    m = (Z - U.\[CapitalPsi]).\[CapitalLambda];
    {U0, S0, V0} = SingularValueDecomposition[m, Length[SingularValueList[m]]];
    F = U0.Transpose[V0];
    (*Solve the reduced Procrustes-like problem for Û, such that*)
    (*Û.Transpose[Û] == IdentityMatrix[n-k] and hence F.Transpose[F] + U.Transpose[U] == IdentityMatrix[n]*)
    {Q, uppert} = QRDecomposition[Map[PadRight[#, n] &, F]];
    Q = Transpose[Q]; (*to create Q*)
    Fbot = Take[Q, All, {k + 1, n}];
    m = \[CapitalPsi].Transpose[Z].Fbot;
    {y1, yd, y2} = SingularValueDecomposition[m, Length[SingularValueList[m]]];
    Û = y2.Transpose[y1];
    (*update U*)
    U = Fbot.Û,
    (*from GEFALS, for n >= p+k*)
    m = Z.Join[\[CapitalLambda], \[CapitalPsi], 2];
    {uW, dW, vW} = SingularValueDecomposition[m, Length[SingularValueList[m]]];
    B = uW.Transpose[vW];
    F = Take[B, All, k];
    U = Take[B, All, {k + 1, k + p}]
   ];
   (*update \[CapitalLambda] and \[CapitalPsi]*)
   Which[
    OptionValue[Method] == "Triangular", \[CapitalLambda] = LowerTriangularize[Transpose[Z].F],
    OptionValue[Method] == "Varimax", \[CapitalLambda] = varimaxTP[Transpose[Z].F],
    OptionValue[Method] == "None", \[CapitalLambda] = Transpose[Z].F,
    True, Print["Error in Method specification"]; Abort[]
   ];
   (*Set the maximum value in each column to be positive and correspondingly adjust F*)
   \[CapitalLambda]negcolnum = Position[Map[Sign, Map[\[CapitalLambda][[Ordering[Transpose[Abs[\[CapitalLambda]]][[#]], -1][[1]], #]] &, Range[k]]], -1];
   If[Length[\[CapitalLambda]negcolnum] > 0,
    \[CapitalLambda] = Transpose[MapAt[Minus, Transpose[\[CapitalLambda]], \[CapitalLambda]negcolnum]];
    F = Transpose[MapAt[Minus, Transpose[F], \[CapitalLambda]negcolnum]]
   ];
   \[CapitalPsi] = DiagonalMatrix[Diagonal[Transpose[U].Z]];
   (*Evaluate the objective function*)
   f0 = f;
   f = Norm[Z - F.Transpose[\[CapitalLambda]] - U.\[CapitalPsi], "Frobenius"]^2;
   If[Mod[maxiter, 100] == 0, NotebookDelete[temp2]; temp2 = PrintTemporary["Iteration " <> ToString[maxiter]]];
   maxiter++
  ]; (*end of main loop*)
  (*factor scores*)
  fs = Z.\[CapitalLambda];
  Return[{Z, \[CapitalLambda], Diagonal[\[CapitalPsi]], fs, f, maxiter}]
 ];

fixmissing[data_] := Module[{means, missing},
  means = Map[Mean[Select[Transpose[data][[#]], NumberQ]] &, Range[Dimensions[data][[2]]]];
  missing = Position[Map[Map[NumericQ, #] &, data], False];
  Return[ReplacePart[data, Thread[missing -> means[[missing[[All, 2]]]]]]]
 ];

tableout[data_, tablename_, rownames_, colnames_, OptionsPattern[{rounding -> 0.001, units -> ""}]] := Module[{upper, lower},
  If[Length[Dimensions[data]] == 1,
   upper = {{Text[tablename], SpanFromLeft}, {Text[OptionValue[units]], colnames}};
   lower = Transpose[{rownames, If[VectorQ[data, NumberQ], Round[data, OptionValue[rounding]], data]}],
   upper = {Join[{Text[tablename]}, Table[SpanFromLeft, {j, Dimensions[data][[2]]}]], Join[{Text[OptionValue[units]]}, colnames]};
   lower = Transpose[Prepend[Map[If[VectorQ[#, NumberQ], Round[#, OptionValue[rounding]], #] &, Transpose[data]], rownames]]
  ];
  Print[Grid[Join[upper, lower],
   Background -> {None, {Lighter[Yellow, .9], {White, Lighter[Blend[{Blue, Green}], .8]}}},
   Dividers -> {{Darker[Gray, .6], {Lighter[Gray, .5]}, Darker[Gray, .6]}, {Darker[Gray, .6], Darker[Gray, .6], {False}, Darker[Gray, .6]}},
   Alignment -> {{Left, Right, {Left}}},
   Frame -> Darker[Gray, .6], ItemStyle -> 14, Spacings -> {Automatic, .8}]]
 ];

factorreports[\[CapitalLambda]_, OptionsPattern[{\[CapitalPsi] -> {}, fs -> {}, vars -> {}, varslength -> 8, desc -> ""}]] :=
 Module[{\[CapitalPsi]2, \[CapitalLambda]1, \[CapitalLambda]2, \[CapitalLambda]3, \[CapitalLambda]4, \[CapitalLambda]5, p, k, facnames, varnames, signames, rinames, obsnames},
  {p, k} = Dimensions[\[CapitalLambda]];
  facnames = Table["Fac" <> ToString[i], {i, k}];
  varnames = Table["Var" <> ToString[i], {i, p}];
  If[Length[OptionValue[vars]] > 0,
   If[Length[OptionValue[vars]] != p,
    Print["Warning: the list of variable names differs in length from the number of variables; continuing with generic variable names ...."],
    varnames = Map[StringTake[ToString[OptionValue[vars][[#]]], Min[StringLength[ToString[OptionValue[vars][[#]]]], OptionValue[varslength]]] &, Range[Length[OptionValue[vars]]]]
   ]];
  signames = Table["Sig" <> ToString[i], {i, k}];
  rinames = Table["RI" <> ToString[i], {i, k}];
  If[Length[OptionValue[\[CapitalPsi]]] == 0, \[CapitalPsi]2 = Table["NA", {i, p}], \[CapitalPsi]2 = OptionValue[\[CapitalPsi]]^2];
  Print[tableout[Join[MapThread[Append, {\[CapitalLambda], \[CapitalPsi]2}], Map[Cases[#, a_ -> Which[Abs[a] >= 0.9, ">.9", Abs[a] >= 0.7, ">.7", Abs[a] >= 0.6, ">.6", Abs[a] >= 0.4, ">.4", Abs[a] >= 0.25, ">.25", True, ""]] &, \[CapitalLambda]], 2], OptionValue[desc] <> ": Factor Loadings (Fac), Residuals (\[CapitalPsi]^2) & Significance", varnames, Flatten[{facnames, {"\[CapitalPsi]^2"}, signames}], rounding -> 0.01, units -> "R"]];
  Print[tableout[Correlation[\[CapitalLambda]], OptionValue[desc] <> ": Correlation of Factors", facnames, facnames, units -> "R"]];
  Print[tableout[Covariance[\[CapitalLambda]], OptionValue[desc] <> ": Covariance of Factors", facnames, facnames, units -> "Cov"]];
  \[CapitalLambda]1 = \[CapitalLambda]^2;
  \[CapitalLambda]2 = Total[\[CapitalLambda]1]/p;
  \[CapitalLambda]3 = Join[\[CapitalLambda]1, {\[CapitalLambda]2}, {Total[\[CapitalLambda]1]/Total[\[CapitalLambda]1, 2]}];
  \[CapitalLambda]4 = MapThread[Append, {\[CapitalLambda]3, Total[\[CapitalLambda]3, {2}]}];
  \[CapitalLambda]5 = Join[\[CapitalLambda]4, Map[\[CapitalLambda]3[[#]]/Total[\[CapitalLambda]3, {2}][[#]] &, Range[Length[\[CapitalLambda]3]]], 2];
  Print[tableout[\[CapitalLambda]5, OptionValue[desc] <> ": Factor Loading Squares: variance in each variable explained by factor & relative importance", Flatten[{varnames, {"Total", "Ratio"}}], Flatten[{facnames, {"Total(Communality)"}, rinames}]]];
  Print[tableout[Transpose[{\[CapitalLambda]2, Rest[FoldList[Plus, 0, \[CapitalLambda]2]]}], OptionValue[desc] <> ": Factor Sum of Squares & Cumulative", facnames, {"Propn of Variance", "Cumulative"}]];
  If[Length[OptionValue[fs]] > 0,
   obsnames = Table["Obs" <> ToString[i], {i, Dimensions[OptionValue[fs]][[1]]}];
   Print[tableout[OptionValue[fs], OptionValue[desc] <> ": Factor Scores", obsnames, facnames]]
  ];
  (*Ratio of factor eigenvalues: explanatory importance of the factors with respect to the total sample*)
  (*{\[CapitalLambda]u, \[CapitalLambda]d, \[CapitalLambda]v} = SingularValueDecomposition[\[CapitalLambda], k];
   tableout[Sqrt[Diagonal[\[CapitalLambda]d]]/Total[\[CapitalLambda]d, 2], 0.001, "Ratio of factor eigenvalues", facnames, "Proportion", ""]*)
  (*Plot of the first three factors*)
  Print[If[k >= 3,
   ListPlot3D[Take[\[CapitalLambda], All, 3], PlotLabel -> OptionValue[desc] <> ": Plot of First 3 Factors for all Variables"],
   If[k == 2, ListPlot[Take[\[CapitalLambda], All, 2], PlotLabel -> OptionValue[desc] <> ": Plot of Two Factors for all Variables"]]]]
 ];

varimaxTP[\[CapitalLambda]_, OptionsPattern[{NormalizeVars -> False}]] :=
 Module[{iter, itermax, \[Epsilon], amat, p, k, targetbasis, rotm, varnow, notconverged, phimax, subrot, varold, optamat, j, l},
  (*varimaxTP[component loadings matrix] gives an unnormalized varimax-optimal rotation of the matrix AMAT using the standard algorithm of Kaiser (Psychometrika, 1958), modified by Trevor Park in April 2002 from an original file by J. O. Ramsay.
  See http://www.stat.ufl.edu/~tpark/Research/varimaxTP.m*)
  itermax = 100; (*maximum number of complete passes through column pairs*)
  \[Epsilon] = 10^-7; (*stopping tolerance based on the relative increase in the varimax criterion*)
  If[Length[Dimensions[\[CapitalLambda]]] != 2, Print["Error: AMAT must be two-dimensional"]; Return[\[CapitalLambda]]];
  {p, k} = Dimensions[\[CapitalLambda]];
  If[OptionValue[NormalizeVars],
   amat = Map[Normalize, \[CapitalLambda]], (*normalise the variables (rows) of \[CapitalLambda]*)
   amat = \[CapitalLambda]
  ];
  rotm = IdentityMatrix[k];
  targetbasis = IdentityMatrix[p];
  varnow = Total[Variance[amat^2]];
  notconverged = True;
  iter = 0;
  While[notconverged && iter <= itermax,
   For[j = 1, j <= (k - 1), j++,
    For[l = (j + 1), l <= k, l++,
     (*calculate the optimal 2-D planar rotation angle for columns j, l*)
     phimax = Arg[p Total[Thread[Complex[amat[[All, j]], amat[[All, l]]]]^4] - Total[Thread[Complex[amat[[All, j]], amat[[All, l]]]]^2]^2]/4;
     subrot = {{Cos[phimax], -Sin[phimax]}, {Sin[phimax], Cos[phimax]}}; (*orthonormal matrix*)
     amat[[All, {j, l}]] = amat[[All, {j, l}]].subrot;
     rotm[[All, {j, l}]] = rotm[[All, {j, l}]].subrot
    ]
   ];
   varold = varnow;
   varnow = Total[Variance[amat^2]];
   If[varnow == 0, Break[]];
   notconverged = (varnow - varold)/varnow > \[Epsilon];
   iter++
  ];
  If[iter >= itermax, Print["Maximum number of iterations reached in function!"]];
  optamat = targetbasis.amat;
  Return[optamat]
 ];

End[]

EndPackage[]
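On the chi-squared question itself: the wording in the quoted example ("chi square statistic ... on 59 degrees of freedom") matches the likelihood-ratio test used in maximum-likelihood factor analysis (Lawley, with Bartlett's correction; this is what R's factanal reports), rather than a direct element-wise comparison of Z with fs.Transpose[Lambda]. The statistic is chi2 = (n - 1 - (2p+5)/6 - 2k/3) * F, where F is the ML discrepancy between the sample correlation matrix S and the model-implied matrix Lambda.Transpose[Lambda] + Psi, on df = ((p-k)^2 - (p+k))/2 degrees of freedom. Reassuringly, df = 59 for k = 7 factors corresponds to p = 19 variables under this formula. A minimal sketch in Python/NumPy (the function name is mine, and since the ALS estimates here are not ML estimates the resulting p-value is only approximate):

```python
import numpy as np
from scipy import stats

def factor_sufficiency_test(S, Lam, psi, n):
    """Likelihood-ratio test that k factors are sufficient (Bartlett-corrected).

    S   : p x p sample correlation matrix
    Lam : p x k factor loading matrix
    psi : length-p vector of uniquenesses (diagonal of Psi)
    n   : number of observations
    """
    p, k = Lam.shape
    Sigma = Lam @ Lam.T + np.diag(psi)        # model-implied correlation matrix
    # ML discrepancy: F = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p
    F = (np.linalg.slogdet(Sigma)[1] - np.linalg.slogdet(S)[1]
         + np.trace(S @ np.linalg.inv(Sigma)) - p)
    df = ((p - k) ** 2 - (p + k)) // 2
    chi2 = (n - 1 - (2 * p + 5) / 6 - 2 * k / 3) * F   # Bartlett's correction
    return chi2, df, stats.chi2.sf(chi2, df)
```

A large p-value (as in the quoted 0.978) means the hypothesis that k factors suffice is not rejected.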
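As a cross-check on the pairwise-rotation step in varimaxTP above, the same sweep can be sketched in NumPy (the function name is mine; it mirrors the complex-arithmetic form of Kaiser's planar angle and the {{c, -s}, {s, c}} rotation used in the package, without the NormalizeVars option):

```python
import numpy as np

def varimax_sweep(A, itermax=100, eps=1e-7):
    """Kaiser's varimax via successive 2-D planar rotations of column pairs,
    following the same scheme as varimaxTP."""
    A = np.array(A, dtype=float)
    p, k = A.shape
    var_now = np.var(A ** 2, axis=0, ddof=1).sum()   # varimax criterion
    for _ in range(itermax):
        for j in range(k - 1):
            for l in range(j + 1, k):
                z = A[:, j] + 1j * A[:, l]
                # optimal planar angle (Kaiser 1958, complex form)
                phi = np.angle(p * np.sum(z ** 4) - np.sum(z ** 2) ** 2) / 4
                c, s = np.cos(phi), np.sin(phi)
                A[:, [j, l]] = A[:, [j, l]] @ np.array([[c, -s], [s, c]])
        var_old, var_now = var_now, np.var(A ** 2, axis=0, ddof=1).sum()
        if var_now == 0 or (var_now - var_old) / var_now <= eps:
            break
    return A
```

Because each step is an orthogonal rotation, the cross-product matrix A.Transpose[A] is preserved exactly, which makes a convenient unit test for either implementation.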
- Follow-Ups:
- Re: Compare matrices using Chi square for Exploratory
- From: Stuart Nettleton <Stuart.Nettleton@uts.edu.au>
- Re: Compare matrices using Chi square for Exploratory
- From: Stuart Nettleton <Stuart.Nettleton@uts.edu.au>
- Re: Compare matrices using Chi square for Exploratory
- From: Stuart Nettleton <Stuart.Nettleton@uts.edu.au>
- Re: Compare matrices using Chi square for Exploratory