Re: Regression with missing values

*To*: mathgroup at smc.vnet.net
*Subject*: [mg51738] Re: Regression with missing values
*From*: Bill Rowe <readnewsciv at earthlink.net>
*Date*: Sun, 31 Oct 2004 01:16:47 -0500 (EST)
*Sender*: owner-wri-mathgroup at wolfram.com

On 10/29/04 at 3:40 AM, ludenscheid1 at hotmail.com (Nikolas Kiefer) wrote:

>Actually, I didn't want to delete those entries. To be a little
>bit more explicit about my data, I have circa 5 independent
>variables which place the 6th dependent variable (the analysis
>result) into a category. Unfortunately, sometimes there exist data
>for which 1 of the 5 independent variables cannot be determined,
>and thus the data really can't be placed into a category. It's at
>this point that I was hoping it would be possible to still make use
>of the 6th data variable (which depends on the first 5 vars) by
>somehow "estimating" the category to which it should belong to.

>I know that certain statistical packages deal with such
>circumstances without simply deleting the data, but can
>Mathematica?

The statistics packages I know of that automatically deal with missing data do so by omitting that data point from the computations to be done. That is equivalent to deleting the data point as I suggested.

None of the statistical packages that come with the standard Mathematica distribution have procedures to automatically delete or estimate missing data points. It is possible to do either within Mathematica, but you will have to write the code needed.

The problem you want to solve is quite difficult. Since it is the value of one of the independent variables that is missing, you have no real basis for estimating the value. That is, the values of the other independent variables give you no information about the missing value; otherwise they would not be independent.

The only thing you can do is look for some simple pattern in the values of the variable with the missing value. For example, if the variable had values {1,2,3,4,missing,6,7} you could reasonably assume the missing value was 5.
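Both options mentioned above (deleting incomplete data points, or filling in a value from a simple pattern) can be sketched in a few lines. This is only an illustration under stated assumptions: the symbol `missing` is a placeholder I am assuming marks the gaps, the sample numbers are made up, and a straight-line fit is just one possible way to express a "simple pattern".

```mathematica
(* Hypothetical data: five independent variables followed by the result. *)
(* `missing` is an assumed placeholder symbol, not a built-in. *)
data = {
   {1.0, 2.0, 0.5, 3.0, 1.0, 10.2},
   {1.5, missing, 0.7, 2.0, 1.0, 11.1},
   {2.0, 2.5, 0.9, 1.0, 0.0, 12.3}
   };

(* Option 1: keep only the rows that do not contain the placeholder. *)
complete = Select[data, FreeQ[#, missing] &];

(* Option 2: estimate a gap in a single variable from its own values. *)
values = {1, 2, 3, 4, missing, 6, 7};

(* Pair each value with its position and drop the pair with the gap. *)
known = Select[
   Transpose[{Range[Length[values]], values}],
   FreeQ[#, missing] &];

(* Fit a straight line to the known values and evaluate it at the gap. *)
pos = Position[values, missing][[1, 1]];
line = Fit[known, {1, x}, x];
estimate = line /. x -> pos;

filled = values /. missing -> estimate
```

For the sequence above, the estimate comes out at approximately 5, matching the value one would guess by eye. With data that do not follow such an obvious pattern, the fitted value carries no such assurance.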
But if a small change in this value changed the category for the 6th variable you mention above, I strongly suspect you would be much better off deleting the data point than trying to estimate the correct category.

--
To reply via email subtract one hundred and four