MathGroup Archive: July 2011 [00397]

Re: MultinormalDistribution Question

To: mathgroup at smc.vnet.net
Subject: [mg120316] Re: MultinormalDistribution Question
From: Ray Koopman <koopman at sfu.ca>
Date: Tue, 19 Jul 2011 06:53:44 -0400 (EDT)
References: <201107100901.FAA24634@smc.vnet.net> <ivel76$89d$1@smc.vnet.net> <ivuc25$grq$1@smc.vnet.net>

On Jul 17, 3:03 am, Steve <s... at epix.net> wrote:
> [...]
> What I really need to do is perform this analysis on test data for
> which I have only a few data points, hence the Student T distribution
> would be more appropriate than the Normal distribution. Secondly,
> values for the "independent" and "dependent" variables have no
> physical meaning below zero. So this implies that I need truncated
> distributions. I'm hoping that the solution Andrzej provided can be
> generalized for these added complications.
> Here are my 9 {F,t} data points where "F" is considered "independent"
> and t considered "dependent".
>
> {{1.01041, 0.3152}, {10.455, 0.3386}, {17.9032, 0.2534}, {24.9581,
>    0.5412}, {26.4688, 0.3251}, {27.4651, 0.4428}, {30.1682,
>    0.3402}, {36.6174, 0.2106}, {45.6129, 0.2154}}
>
> Would someone be so kind as to plop this data into their notebook to
> confirm a solution or two for me ? My results are below which are
> based on truncating the Student T distribution, 8 degrees of freedom
> and a calculated rho of -0.2327.
>
> [...]

I have several comments. First, the correlation of t with F is
so small that it is hard to justify treating it as nonzero. The
unbiased estimate of the conditional variance of t|F is bigger than
the unbiased estimate of the marginal variance of t. (This happens
whenever the F-statistic for testing the significance of the
correlation is < 1.)
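
The two claims above can be checked directly from the nine quoted data
points. The following is only a minimal sketch in Python (standard
library only); the variable names are illustrative and not from the
thread, and the conditional variance is taken to be the residual sum of
squares from a straight-line fit divided by n-2.

```python
import math

# Steve's nine {F, t} data points, copied from the quoted message.
data = [(1.01041, 0.3152), (10.455, 0.3386), (17.9032, 0.2534),
        (24.9581, 0.5412), (26.4688, 0.3251), (27.4651, 0.4428),
        (30.1682, 0.3402), (36.6174, 0.2106), (45.6129, 0.2154)]

n = len(data)
mean_F = sum(F for F, _ in data) / n
mean_t = sum(t for _, t in data) / n

# Centered sums of squares and cross-products.
sxx = sum((F - mean_F) ** 2 for F, _ in data)
syy = sum((t - mean_t) ** 2 for _, t in data)
sxy = sum((F - mean_F) * (t - mean_t) for F, t in data)

# Sample correlation of t with F.
r = sxy / math.sqrt(sxx * syy)

# F-statistic for testing rho = 0, on 1 and n-2 degrees of freedom.
f_stat = r ** 2 * (n - 2) / (1 - r ** 2)

# Unbiased variance estimates: marginal uses n-1, conditional uses n-2.
marginal_var = syy / (n - 1)
conditional_var = syy * (1 - r ** 2) / (n - 2)

print("r =", round(r, 4))
print("F-statistic =", round(f_stat, 4))
print("marginal variance of t =", marginal_var)
print("conditional variance of t|F =", conditional_var)
print("conditional > marginal:", conditional_var > marginal_var)
```

Both comparisons reduce to the same inequality, r^2 < 1/(n-1), which is
why the F-statistic falling below 1 and the conditional variance
exceeding the marginal variance always occur together.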

In bivariate normal correlation, and in linear regression with
homoscedastic normal error, df = n-2, not n-1.

Regression models require only the conditional distribution of the
dependent variable given the independent variable. The independent
variable need not be random.

The fact that t cannot be negative means that its conditional
distributions cannot be normal. Is ordinary least squares fitting
justified? Yes, but only if conditional normality is abandoned. One
solution is to treat the conditional distributions as Gamma[a,b]
variables, where a is the shape constant and b is the scale constant.
Let m[F] be the conditional mean of t given F and v the common
conditional variance, and take a = m[F]^2/v and b = v/m[F]. Then the
mean of each conditional distribution will be a*b = m[F], the variance
of each conditional distribution will be a*b^2 = v, and the
Gauss-Markov theorem justifies ordinary least-squares fitting.
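
As a rough illustration of that parametrization, here is a minimal
sketch in Python (standard library only). The helper name
gamma_shape_scale and the values m_F = 0.33 and v = 0.012 are
assumptions chosen only to be of the same order as the quoted data;
they are not estimates from the thread. It relies on
random.gammavariate taking a shape and a scale argument, which matches
the Gamma[a,b] convention used above.

```python
import random
import statistics

def gamma_shape_scale(m, v):
    """Return (a, b) such that Gamma[a, b] has mean m and variance v."""
    if m <= 0 or v <= 0:
        raise ValueError("mean and variance must both be positive")
    return m * m / v, v / m

# Hypothetical conditional mean and common conditional variance.
m_F = 0.33
v = 0.012
a, b = gamma_shape_scale(m_F, v)

# Algebraic check: mean = a*b, variance = a*b^2.
mean_check = a * b
var_check = a * b * b

# Simulation check with a fixed seed; every draw is positive.
rng = random.Random(0)
draws = [rng.gammavariate(a, b) for _ in range(20000)]
sample_mean = statistics.mean(draws)
sample_var = statistics.variance(draws)

print("shape a =", a, " scale b =", b)
print("a*b =", mean_check, " a*b^2 =", var_check)
print("sample mean =", sample_mean, " sample variance =", sample_var)
```

Because v is held fixed while m[F] changes with F, the shape and scale
both vary from one conditional distribution to the next, but the
variance does not, which is the homoscedasticity that Gauss-Markov
needs.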

Regardless of whether the conditional distributions are assumed
to be heteroscedastic truncated normal or homoscedastic gamma,
the sampling distribution of the estimates of the regression
coefficients and the conditional variance will not be the same
as in the usual homoscedastic normal case, and the usual Student-t
distributions cannot be used to estimate quantiles of the conditional
distributions.

This may be a situation where one of John Tukey's antihubrisines
applies:
 "The data may not contain the answer. The combination of some
  data and an aching desire for an answer does not ensure that a
  reasonable answer can be extracted from a given body of data."
