Re: reverse engineering principal components...
- To: mathgroup at smc.vnet.net
- Subject: [mg128421] Re: reverse engineering principal components...
- From: Ray Koopman <koopman at sfu.ca>
- Date: Sun, 14 Oct 2012 23:41:21 -0400 (EDT)
On Oct 11, 9:10 pm, Richard Palmer <rhpal... at gmail.com> wrote:
> I would like to be able to take a large dataset, compute the principal
> components on a sufficient subset, and use the results to compute principal
> components on the remaining observations. So far, I haven't been able to
> figure out how it is done. Here is sample code (computed as a notebook
> expression). Can anyone tell me where I am going wrong?
>
> Reverse Engineering Principal Components
>
> Make a table of data and a table of the principal components using
> the Correlation method. Check to see that they have the requisite
> properties.
>
> t = Table[RandomReal[], {5}, {3}];
> princomponentst = PrincipalComponents[t, Method -> "Correlation"];
> Print["The mean of the set is ", Mean[princomponentst] // Chop]
> Print["The variance of the set is ", Variance[princomponentst]]
>
> The mean of the set is {0, 0, 0}
> The variance of the set is {1.4974734615741159, 0.9657686960146733,
>   0.5367578424112112}
>
> Standardize the observations and compute a correlation matrix.
> Compute the eigenvectors.
>
> standardizet = Standardize[t];
> corrt = Correlation[standardizet];
> eigenvectors = Eigenvectors[corrt];
>
> I think this is the multiplication. However, the variances are not
> correct since they do not decrease.
>
> mypc2 = standardizet . eigenvectors;
> Mean[mypc2]
> Variance[mypc2]
>
> {-2.4424906541753446*^-16, 3.108624468950438*^-16,
>   -3.7192471324942745*^-16}
> {1.1977733239835728, 0.7727628961600694, 1.0294637798563568}
>
> --
> Richard Palmer

Eigenvectors returns a matrix in which each row is an eigenvector, so you
need to transpose the matrix that Eigenvectors returns before multiplying:
mypc2 = standardizet . Transpose[eigenvectors]. The columns of mypc2 then
have variances equal to the eigenvalues of corrt, which decrease.