Re: FinancialData and disjoint result sets from range and

*To*: mathgroup at smc.vnet.net
*Subject*: [mg99844] Re: FinancialData and disjoint result sets from range and
*From*: N <enoman.nr at gmail.com>
*Date*: Sat, 16 May 2009 05:19:25 -0400 (EDT)
*References*: <guj86m$hqb$1@smc.vnet.net>

Thanks, Bob. But that's not it.

First, list2 doesn't provide a sufficient set to offer a counter-example to your hypothesis: the high for the month _did_ occur on the first trading day. (Nor is it sufficient for discussing the specific mystery you mention concerning 1 Feb. See the end.)

Second, your hypothesis

> "Month" list1 appears to be the High for the first
> trading day of each month ... rather than the high
> during each month.

doesn't match the actual behavior. What one sees is that in each case of these non-day periods (Month, Year) for queries of this sort of data, the DATE reported with the result is the earliest Mod(Period) relevant day. Compare the DATES for these:

    FinancialData["AAPL", "High", {{2005}, {2006}, "Month"}]
    FinancialData["AAPL", "High", {{2005}, {2006}, "Year"}]

("Relevant" is important here ... for "Return", which works close-to-close, the first relevant date would be the second close of trading which the operator works against. Now, I'm not saying this is a reasonable behavior ... revealing the mechanics of the operation to the semantics of the operator, and in the process nullifying consistent use of the date range. In fact I think it unreasonable and clunky.)

As a counter-example to this Mod(Period) theory, consider "Dividend" with "Month" or "Year" ... which does provide a date other than the Mod(Period), and consistently. And necessarily, btw, otherwise you couldn't use the results to reconstruct events. This is not a necessary condition for "High" or "Return".

In any case, this date issue is an ancillary confusion. I agree, it is a confusion, one that should be more clearly documented. And fixed. Actually, it's a bit more than just confusing, since it limits the methods for processing such data. That is, if you want the ACTUAL (first) date for the month's high, you cannot use the result, but must process the day results.
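
A minimal sketch of that kind of day-level processing, assuming the daily query returns a plain list of {date, value} pairs as shown in this thread (other Mathematica versions may return a different structure). The name firstHighOfMonth is a hypothetical helper, and the sample values are only a few rounded daily highs quoted in this thread, not a full month of data:

```mathematica
(* Hypothetical helper: for each {year, month}, return the earliest
   {date, value} pair whose value is that month's maximum.
   Assumes input of the form {{{y, m, d}, value}, ...}. *)
firstHighOfMonth[daily_List] := Module[{byMonth},
  (* group the daily entries by their {year, month} prefix *)
  byMonth = GatherBy[daily, Take[First[#], 2] &];
  (* within each month: highest value first, ties broken by earliest date *)
  Map[Function[group, First[SortBy[group, {-Last[#] &, First}]]], byMonth]
]

(* A handful of GE daily highs quoted in this thread (rounded, incomplete) *)
daily = {
  {{2005, 1, 3}, 31.2038}, {{2005, 1, 4}, 31.1721},
  {{2005, 1, 5}, 30.7469}, {{2005, 1, 6}, 30.8092},
  {{2005, 2, 1}, 30.7746}, {{2005, 2, 15}, 30.9661}
};

firstHighOfMonth[daily]
(* {{{2005, 1, 3}, 31.2038}, {{2005, 2, 15}, 30.9661}} *)
```

With live data one would pass the result of the day-level query instead of the sample list, e.g. firstHighOfMonth[FinancialData["GE", "High", {{2005, 1, 1}, {2005, 12, 31}}]], provided it comes back in the same list form.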
That is in part why getting different VALUES is troublesome (esp., say, if you might look for more than one case of that HIGH in a month or year ... for that you need consistency and comparability in the value).

The real issue in my question is the (apparently) disjoint sets of source data VALUEs. Compare "Raw" to "RawHigh":

    FinancialData["GE", "RawHigh", {{2005}, {2006}, "Month"}]
    FinancialData["GE", "RawHigh", {{2005, 1, 1}, {2005, 2, 1}}]

Now, fine, these produce a set of values different than "High"; that's expected -- it is a different (unadjusted) source set. The two "Raw" operators appear to work from the same source set: the VALUE of the highs match. So maybe the challenge with "High" concerns the adjustments. But here's the thing:

1. One can't explain the difference I report simply from High vs RawHigh ... as "Dividend" (correctly) doesn't report any for that period, there was no split, and to me it's mysterious what "related changes" could effect that. From the ref/

> "For historical data, properties such as "High",
> "Low", "Close" are adjusted for stock splits,
> dividends and related changes."

2. But even a cause found in (1.) would still be troublesome. Both queries are (apparently) for the same range of trading days, one is grouped by day, the other month. Wouldn't we expect the same adjustment to apply?

So, does the difference come from a difference in source sets (why?), or in normalization of some sort (what?)? Or some other cause, including operator error?

...

Your specific observation about the Feb 1, 2005, results concerns the vagaries of the current design of period operators, not the difference of the results. When using "Month" or "Year" you'll end up with a result for the period using as many dates as selected by the date range. Here, 1 day. (The same effect occurs from the starting date too: try starting {2004,12,20} vs {2004,12,1}. The actual high is on the 14th.) Again, I think that's not very clear ...
and not very sensible, too: when one switches to periods, aggregates or clusters, the semantics of time (ranges or otherwise) also changes. Or should change. But they don't in this case.

But indeed you have found another case of the error I asked about.

> FinancialData["GE", "High", {{2005}, {2006}, "Month"}]
> ... {{2005, 2, 1}, 31.160102272727272}, ...

Compare:

    FinancialData["GE", "High", {{2005, 2, 1}, {2005, 3, 1}}]
    ... {{2005, 2, 15}, 30.9661}, ...

If one compares to other datasets, adjusting for dividends and splits, this 30.9661 is a reasonable, close-enough, adjusted high. It is on the right date by any account. I cannot account for that 31.16.

From this view, then, we extend my question (Why different results for the (apparently) same query from the same time frame?) with the related question: What difference in operation would produce 31.16, which is a comparatively large difference/error?

Could one (weird) explanation for both be that the starting basis is different for different "period" queries? E.g.: Year uses a different starting basis than Month, and both are different than day, the default?

Thanks,
--N

On May 15, 1:13 am, Bob Hanlon <hanl... at cox.net> wrote:
> list1 = FinancialData["GE", "High", {{2005}, {2006}, "Month"}]
>
> {{{2005, 1, 3}, 31.20283420979795},
> {{2005, 2, 1}, 31.160102272727272},
> {{2005, 3, 1}, 31.05757071547421}, ...
>
> list1 appears to be the High for the first trading day
> of each month in 2005 rather than the high during each month.
>
> list2 = FinancialData["GE", "High", {{2005, 1, 1}, {2005, 2, 1}}]
>
> {{{2005, 1, 3}, 31.203757857338065},
> {{2005, 1, 4}, 31.172143845089906},
> {{2005, 1, 5}, 30.74689674366825},
> {{2005, 1, 6}, 30.80918829376036}, ...
> Monday {2005, 1, 17} was a US National holiday (MLK Birthday)
>
> list2 appears to be the Highs for each trading day in Jan
> and the first trading day of Feb.
>
> The real oddity is that there are different values
> for {2005, 2, 1}
>
> list1[[2]]
> {{2005,2,1},31.1601}
>
> list2 // Last
> {{2005,2,1},30.7746}
>
> Bob Hanlon
>
> ---- N <enoman... at gmail.com> wrote:
>
> =============
>
> I have a puzzle, probably easily explained; apparently so
> easily that it hasn't even appeared here.
>
> From examples in the Mathematica ref/FinancialData:
>
> Give the monthly highs for a stock price over a range of dates:
>
> FinancialData["GE", "High", {{2005}, {2006}, "Month"}]
>
> High is defined as:
> High: Highest price during the trading day
>
> My guess, then, is that the source set is all intraday prices
> for from start {2005,1,1} until end {2006,1,1}.
>
> Consequentially I expect that there will be an intersection
> for January between that result and this result:
>
> FinancialData["GE", "High", {{2005, 1, 1}, {2005, 2, 1}}]
>
> There is not. The maximum for the date range:
>
> Map[#[[2]] &,
> FinancialData["GE", "High", {{2005, 1, 1}, {2005, 2, 1}}]] // Max
>
> = 31.2038
>
> While from the earlier "Month" query we have
>
> = {{2005, 1, 3}, 31.2028}
>
> If one considers that the date-range version might have a superset
> of times, and the high occured in that interregnum, then comparing
> March's result will disappoint: the opposite occurs (the high appears
> in the "Month" result).
>
> If one wishes to explain this by precision effect during calculation,
> although I find that dubious, consider that FoldList[] on "Return"
> will produce what "CumulativeReturn" delivers.
>
> So, what's the simple explanation?
>
> Thanks,
> --N
>
> =============