MathGroup Archive: May 2009 [00610]

Re: FinancialData and disjoint result sets from range and
To: mathgroup at smc.vnet.net
Subject: [mg99844] Re: FinancialData and disjoint result sets from range and
From: N <enoman.nr at gmail.com>
Date: Sat, 16 May 2009 05:19:25 -0400 (EDT)
References: <guj86m$hqb$1@smc.vnet.net>
Thanks, Bob. But that's not it.

First, list2 isn't a large enough sample to serve as a
counter-example to your hypothesis: the high for January
_did_ occur on the first trading day.

(Nor is it sufficient for discussing the specific mystery
you mention concerning Feb 1. See the end of this message.)

Second, your hypothesis

>  "Month" list1 appears to be the High for the first
>  trading day of each month ... rather than the high
>  during each month.

doesn't match the actual behavior. What one sees is that
for these non-day periods ("Month", "Year"), in queries on
this sort of data, the DATE reported with each result is
the earliest relevant day of the period (call it the
Mod(Period) day). Compare the DATES for these:

FinancialData["AAPL", "High", {{2005}, {2006}, "Month"}]
FinancialData["AAPL", "High", {{2005}, {2006}, "Year"}]

("Relevant" is important here: "Return" works close-to-close,
so the first relevant date is the second trading close in
the range, the first one that has a prior close to work
against. Now, I'm not saying this is reasonable behavior: it
exposes the mechanics of the operation through the semantics
of the operator, and in the process defeats consistent use
of the date range. In fact I think it unreasonable and
clunky.)
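A minimal sketch of the close-to-close dating described above, written in Python rather than Mathematica so it runs without a FinancialData connection. It assumes "Return" is a simple return c_t / c_{t-1} - 1 and uses (year, month, day) tuples like the {y, m, d} dates in this thread; close_to_close_returns is a hypothetical helper and the closes in the usage are made up, not GE data.

```python
def close_to_close_returns(closes):
    """Simple returns r_t = c_t / c_{t-1} - 1 from chronological (day, close) rows.

    The first close has no predecessor, so the earliest return in the
    output is dated at the second close of the range.
    """
    return [
        (day, close / prev_close - 1.0)
        for (_, prev_close), (day, close) in zip(closes, closes[1:])
    ]


# Hypothetical closes for three consecutive trading days
closes = [((2005, 1, 3), 100.0), ((2005, 1, 4), 110.0), ((2005, 1, 5), 99.0)]
returns = close_to_close_returns(closes)
```

With three closes in the range only two returns come back, and the first one is dated (2005, 1, 4), the second close.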

As a counter-example to this Mod(Period) pattern, consider
"Dividend" with "Month" or "Year": it consistently reports
a date other than the Mod(Period) day. It has to, by the
way, or you couldn't use the results to reconstruct events.
Reporting the actual date isn't strictly necessary for
"High" or "Return".

In any case, this date issue is an ancillary confusion.
I agree it is a confusion, one that should be more clearly
documented. And fixed.

Actually, it's a bit more than just confusing, since it
limits the ways such data can be processed. That is, if
you want the ACTUAL (first) date of the month's high, you
cannot use the result, but must process the daily results.
That is partly why getting different VALUES is troublesome
(especially if you want to find every occurrence of that
HIGH in a month or year; for that you need values that are
consistent and comparable).
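To recover the actual date of each month's high from the daily results, one can group the daily rows by month and keep both the period-start label and the day of the high. A minimal Python sketch, assuming daily rows of the form ((year, month, day), high) like the pairs shown in this thread; monthly_highs is a hypothetical helper, not part of FinancialData, and the prices in the usage are made up.

```python
from collections import defaultdict


def monthly_highs(daily):
    """Summarize ((year, month, day), high) rows by calendar month.

    Returns {(year, month): (first_trading_day, day_of_high, high)}, so the
    period-start label can be compared with the day the high occurred.
    """
    buckets = defaultdict(list)
    for day, high in sorted(daily):  # chronological order
        buckets[day[:2]].append((day, high))
    result = {}
    for month, rows in buckets.items():
        first_day = rows[0][0]
        # max() returns the first maximal row, i.e. the earliest day of the high
        day_of_high, high = max(rows, key=lambda row: row[1])
        result[month] = (first_day, day_of_high, high)
    return result


# Hypothetical daily highs; January's high repeats on the 14th and 20th
daily = [((2005, 1, 3), 31.0), ((2005, 1, 14), 31.5), ((2005, 1, 20), 31.5),
         ((2005, 2, 1), 30.0), ((2005, 2, 15), 30.9)]
summary = monthly_highs(daily)
```

Keeping the earliest day on ties is what makes it possible to go on and look for later repeats of the same high.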

The real issue in my question is the (apparently) disjoint
sets of source data VALUEs. Compare the "Raw" counterpart,
"RawHigh", over the same two ranges:

  FinancialData["GE", "RawHigh", {{2005}, {2006}, "Month"}]
  FinancialData["GE", "RawHigh", {{2005, 1, 1}, {2005, 2, 1}}]

Now, fine, these produce a set of values different from
"High"; that's expected, since it is a different (unadjusted)
source set. The two "RawHigh" queries appear to work from
the same source set: the VALUEs of the highs match. So
maybe the problem with "High" lies in the adjustments.

But here's the thing:

1. One can't explain the difference I report simply as
   High vs RawHigh: "Dividend" (correctly) reports no
   dividend for that period, there was no split, and to me
   it's mysterious what "related changes" could account
   for it. From ref/FinancialData:
   >  "For historical data, properties such as "High",
   >  "Low", "Close" are adjusted for stock splits,
   >  dividends and related changes."

2. But even a cause found in (1.) would still be troublesome.
   Both queries are (apparently) for the same range of
   trading days; one is grouped by day, the other by month.
   Wouldn't we expect the same adjustment to apply?
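Point 2 can be made concrete: if one positive adjustment factor applies to every day in the period (no split or dividend inside it), adjusting and then taking the period high gives the same day and the same value as taking the high and then adjusting. A minimal Python sketch under that assumption; adjust and period_high are hypothetical helpers, the factor and prices are made up, and real split/dividend adjustment uses per-date factors.

```python
def adjust(daily, factor):
    """Scale every (day, price) row by one multiplicative adjustment factor.

    Assumption: a single factor models a period with no split or dividend
    inside it, so every day in the period gets the same adjustment.
    """
    if factor <= 0:
        raise ValueError("adjustment factor must be positive")
    return [(day, price * factor) for day, price in daily]


def period_high(daily):
    """(day, price) row with the largest price; the first such row on ties."""
    return max(daily, key=lambda row: row[1])


# Hypothetical February highs and a hypothetical adjustment factor
daily = [((2005, 2, 1), 30.5), ((2005, 2, 15), 31.0), ((2005, 2, 28), 30.2)]
factor = 0.97
adjusted_then_max = period_high(adjust(daily, factor))
max_then_adjusted = period_high(daily)
```

Because a positive factor preserves the ordering of prices, both routes pick the same day, and the values differ only by the factor itself.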

So does the difference come from different source sets
(why?), from normalization of some sort (what kind?), or
from some other cause, including operator error?

...

Your specific observation about the Feb 1, 2005, results
concerns the vagaries of the current design of the period
operators, not the difference in values. With "Month" or
"Year", the result for a period is computed from only as
many dates as the date range selects. Here, 1 day.

(The same effect occurs at the start of the range: try
starting at {2004,12,20} vs {2004,12,1}. The actual
December high is on the 14th.)
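A minimal Python sketch of the clipping behavior described above, assuming each month is summarized only from the daily rows inside the inclusive date range. clipped_monthly_highs is a hypothetical helper; the December prices are made up, and only the dates mirror the {2004,12,1} vs {2004,12,20} example.

```python
def clipped_monthly_highs(daily, start, end):
    """Monthly highs using only rows with start <= day <= end (inclusive).

    daily: ((year, month, day), high) rows; start/end: (year, month, day).
    Returns {(year, month): (day_of_high, high)} built from the clipped rows.
    """
    result = {}
    for day, high in sorted(daily):
        if not (start <= day <= end):
            continue  # outside the requested range
        month = day[:2]
        # strict > keeps the earliest day when the same high repeats
        if month not in result or high > result[month][1]:
            result[month] = (day, high)
    return result


# Hypothetical highs; the December high falls on the 14th
daily = [((2004, 12, 1), 30.0), ((2004, 12, 14), 32.0), ((2004, 12, 20), 31.0),
         ((2004, 12, 28), 30.5), ((2005, 1, 3), 31.2)]
from_dec_1 = clipped_monthly_highs(daily, (2004, 12, 1), (2005, 1, 3))
from_dec_20 = clipped_monthly_highs(daily, (2004, 12, 20), (2005, 1, 3))
```

Starting on the 20th drops the 14th, so the "December high" silently becomes the high of the clipped part of the month.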

Again, I think that's not very clear, and not very
sensible either: when one switches to periods, aggregates
or clusters, the semantics of time (ranges or otherwise)
also change. Or should change. But they don't in this
case.

But indeed you have found another case of the error I
asked about.

> FinancialData["GE", "High", {{2005}, {2006}, "Month"}]
> ... {{2005, 2, 1}, 31.160102272727272}, ...

Compare:

FinancialData["GE", "High", {{2005, 2, 1}, {2005, 3, 1}}]
  ... {{2005, 2, 15}, 30.9661}, ...

Compared with other data sources, after adjusting for
dividends and splits, this 30.9661 is a reasonable,
close-enough adjusted high, and it is on the right date
by any account. I cannot account for that 31.16.

From this view, then, I extend my question (why different
results for the (apparently) same query over the same time
frame?) with a related one: what difference in operation
would produce 31.16, which is a comparatively large
difference/error?
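One way to screen for such mismatches is to compare each reported monthly high with the maximum of the daily highs for the same month. A minimal Python sketch; check_monthly_against_daily is a hypothetical helper. The usage feeds it only values quoted in this thread (the January 3 and February 15 daily highs stand in for each month's daily maximum), so it is illustrative rather than a full audit.

```python
def check_monthly_against_daily(monthly, daily, tol=1e-6):
    """Flag months whose reported high differs from the max of the daily highs.

    monthly: {(year, month): reported_high}
    daily:   list of ((year, month, day), high) rows
    Returns {(year, month): reported - daily_max} for mismatches larger than tol.
    """
    daily_max = {}
    for (y, m, _), high in daily:
        key = (y, m)
        daily_max[key] = max(high, daily_max.get(key, float("-inf")))
    return {
        key: reported - daily_max[key]
        for key, reported in monthly.items()
        if key in daily_max and abs(reported - daily_max[key]) > tol
    }


# Values quoted in this thread ("Month" query vs daily queries)
monthly = {(2005, 1): 31.20283420979795, (2005, 2): 31.160102272727272}
daily = [((2005, 1, 3), 31.203757857338065), ((2005, 2, 15), 30.9661)]
mismatches = check_monthly_against_daily(monthly, daily)
# both months are flagged: January by about -0.0009, February by about +0.194
```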

Could one (weird) explanation for both be that the
starting basis differs between "period" queries? E.g.,
"Year" uses a different starting basis than "Month", and
both differ from day, the default?
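If the cause were a different starting basis, and the adjustment is multiplicative (an assumption), the two result sets would differ by one common ratio on every shared date. A minimal Python sketch of that check; ratios_by_date, looks_like_rescaling and the tolerance are hypothetical, the inputs must be two results reporting values for the same dates, and the values in the usage are made up.

```python
def ratios_by_date(series_a, series_b):
    """Ratio a/b for every date present in both dicts (date -> value)."""
    return {d: series_a[d] / series_b[d] for d in series_a if d in series_b}


def looks_like_rescaling(series_a, series_b, rel_tol=1e-4):
    """True if all shared dates have (nearly) the same ratio a/b.

    Assumes multiplicative adjustment: a different basis would scale every
    price by one common factor, so the ratio would be constant.
    """
    ratios = list(ratios_by_date(series_a, series_b).values())
    if not ratios:
        return False  # nothing to compare
    first = ratios[0]
    return all(abs(r - first) <= rel_tol * abs(first) for r in ratios)


# Hypothetical values for two queries that report the same dates
a = {(2005, 1, 3): 10.0, (2005, 1, 4): 20.0}
b = {(2005, 1, 3): 5.0, (2005, 1, 4): 10.0}
c = {(2005, 1, 3): 5.0, (2005, 1, 4): 11.0}
```

A constant ratio (a vs b) is consistent with a basis difference; scattered ratios (a vs c) point elsewhere.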

Thanks,
--N

On May 15, 1:13 am, Bob Hanlon <hanl... at cox.net> wrote:

> list1 = FinancialData["GE", "High", {{2005}, {2006}, "Month"}]
>
> {{{2005, 1, 3}, 31.20283420979795},
>  {{2005, 2, 1}, 31.160102272727272},
>  {{2005, 3, 1}, 31.05757071547421},
...
>
> list1 appears to be the High for the first trading day
> of each month in 2005 rather than the high during each month.
>
> list2 = FinancialData["GE", "High", {{2005, 1, 1}, {2005, 2, 1}}]
>
> {{{2005, 1, 3}, 31.203757857338065},
>  {{2005, 1, 4}, 31.172143845089906},
>  {{2005, 1, 5}, 30.74689674366825},
>  {{2005, 1, 6}, 30.80918829376036},
...
> Monday {2005, 1, 17} was a US national holiday (MLK Day)
>
> list2 appears to be the Highs for each trading day in Jan
> and the first trading day of Feb.
>
> The real oddity is that there are different values
> for {2005, 2, 1}
>
> list1[[2]]
> {{2005,2,1},31.1601}
>
> list2 // Last
> {{2005,2,1},30.7746}
>
> Bob Hanlon
>
> ---- N <enoman... at gmail.com> wrote:
>
> =============
>
> I have a puzzle, probably easily explained; apparently so
> easily that it hasn't even appeared here.
>
> From examples in the Mathematica ref/FinancialData:
>
> Give the monthly highs for a stock price over a range of dates:
>
>   FinancialData["GE", "High", {{2005}, {2006}, "Month"}]
>
> High is defined as:
>   High: Highest price during the trading day
>
> My guess, then, is that the source set is all intraday prices
> from start {2005,1,1} until end {2006,1,1}.
>
> Consequently I expect that there will be an intersection
> for January between that result and this result:
>
>   FinancialData["GE", "High", {{2005, 1, 1}, {2005, 2, 1}}]
>
> There is not. The maximum for the date range:
>
>   Map[#[[2]] &,
>     FinancialData["GE", "High", {{2005, 1, 1}, {2005, 2, 1}}]] // Max
>
>   = 31.2038
>
> While from the earlier "Month" query we have
>
>   = {{2005, 1, 3}, 31.2028}
>
> If one considers that the date-range version might have a superset
> of times, and the high occurred in that interregnum, then comparing
> March's result will disappoint: the opposite occurs (the high appears
> in the "Month" result).
>
> If one wishes to explain this by precision effects during calculation
> (although I find that dubious), consider that FoldList[] over "Return"
> reproduces what "CumulativeReturn" delivers.
>
> So, what's the simple explanation?
>
> Thanks,
> --N
>
> =============
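The FoldList point in the original message can be sketched in Python with itertools.accumulate, the analogue of the two-argument FoldList[f, list]. This assumes "Return" is a simple per-period return and that "CumulativeReturn" compounds those returns; cumulative_returns is a hypothetical helper, not FinancialData's documented formula, and the returns in the usage are made up.

```python
from itertools import accumulate


def cumulative_returns(returns):
    """Compound simple returns: C_1 = r_1, C_t = (1 + C_{t-1}) * (1 + r_t) - 1."""
    return list(accumulate(returns, lambda acc, r: (1.0 + acc) * (1.0 + r) - 1.0))


# Hypothetical daily returns: +10%, -10%, +5%
cumulative = cumulative_returns([0.1, -0.1, 0.05])
```

The last element equals the product of (1 + r) over all periods minus 1, which is why a fold over the per-period returns can reproduce a cumulative series exactly, up to floating-point rounding.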