MathGroup Archive 2004

Re: Regression with missing values

On 10/29/04 at 3:40 AM, ludenscheid1 at (Nikolas Kiefer) wrote:

>Actually, I didn't want to delete those entries.  To be a little
>bit more explicit about my data, I have circa 5 independent
>variables which place the 6th dependent variable (the analysis
>result) into a category.  Unfortunately, sometimes there exist data
>for which 1 of the 5 independent variables cannot be determined,
>and thus the data really can't be placed into a category.  It's at
>this point that I was hoping it would be possible to still make use
>of the 6th data variable (which depends on the first 5 vars) by
>somehow "estimating" the category to which it should belong to.

>I know that certain statistical packages deal with such
>circumstances without simply deleting the data, but can

The statistics packages I know of that automatically deal with missing data do so by omitting that data point from the computations to be done. That is equivalent to deleting the data point as I suggested.

None of the statistical packages that come with the standard Mathematica distribution have procedures to automatically delete or estimate missing data points. It is possible to do either within Mathematica, but you will have to write the code needed.
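As a rough illustration of the deletion approach, here is a minimal sketch in Python rather than Mathematica; the logic is the same in any language, and in Mathematica the filtering step could be written with Select. The data, the use of None as the missing-value marker, and the function name drop_incomplete are all assumptions made for the example, not part of any package.

```python
# Hypothetical data: each row is (x1, x2, x3, x4, x5, y),
# with None standing in for a value that could not be determined.
rows = [
    (1.0, 2.0, 0.5, 3.0, 1.0, 10.0),
    (2.0, None, 0.7, 1.0, 0.0, 12.0),
    (3.0, 1.0, 0.9, 2.0, 1.0, 15.0),
]


def drop_incomplete(rows, missing=None):
    """Return only the rows in which no entry is the missing marker."""
    return [row for row in rows if all(value is not missing for value in row)]


# Only the complete rows would be passed on to the regression.
complete = drop_incomplete(rows)
print(len(complete), "of", len(rows), "rows kept")
```

This is what "omitting the data point" amounts to: the second row is dropped in its entirety, including its perfectly valid 6th value.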

The problem you want to solve is quite difficult. Since it is the value of one of the independent variables that is missing, you have no real basis for estimating the value. That is, the values of the other independent variables give you no information about the missing value, else they would not be independent.

The only thing you could do is look for some simple pattern in the values of the variable with the missing value. For example, if the variable had values {1,2,3,4,missing,6,7} you could reasonably assume the missing value was 5. But if a small change in this value would change the category for the 6th variable you mention above, I strongly suspect you would be much better off deleting the data point than trying to estimate the correct category.
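To make the pattern idea and the caution about it concrete, here is a small Python sketch. It assumes the values are ordered and roughly evenly spaced, so that the mean of the two neighbours is a sensible guess, which is a strong assumption. The function names, the category rule, and the boundary at 4.5 are all hypothetical and chosen only for illustration.

```python
def fill_by_neighbours(values, missing=None):
    """Replace each isolated interior gap with the mean of its two neighbours."""
    filled = list(values)
    for i in range(1, len(values) - 1):
        left, right = values[i - 1], values[i + 1]
        if values[i] is missing and left is not missing and right is not missing:
            filled[i] = (left + right) / 2
    return filled


def estimate_is_stable(estimate, categorize, tolerance):
    """True if the category does not change when the estimate moves by +/- tolerance."""
    category = categorize(estimate)
    return (categorize(estimate - tolerance) == category
            and categorize(estimate + tolerance) == category)


series = [1, 2, 3, 4, None, 6, 7]
filled = fill_by_neighbours(series)  # the gap becomes (4 + 6) / 2


def categorize(x):
    """Hypothetical rule: values below 4.5 are "low", the rest are "high"."""
    return "low" if x < 4.5 else "high"


stable_small = estimate_is_stable(filled[4], categorize, tolerance=0.25)
stable_large = estimate_is_stable(filled[4], categorize, tolerance=1.0)
print(filled[4], stable_small, stable_large)
```

With the hypothetical boundary at 4.5, the estimate survives an error of 0.25 but not an error of 1.0. If the plausible error in the estimate is large enough to cross a boundary like that, the estimated category is not trustworthy and the data point is better deleted.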