MathGroup Archive: June 2006 [00433]

Re: structure array equivalent in Mathematica
To: mathgroup at smc.vnet.net
Subject: [mg67239] Re: structure array equivalent in Mathematica
From: albert <awnl at arcor.de>
Date: Wed, 14 Jun 2006 06:28:46 -0400 (EDT)
References: <NDBBJGNHKLMPLILOIPPOAEHIFAAA.djmp@earthlink.net> <e6li4e$ni8$1@smc.vnet.net>
Sender: owner-wri-mathgroup at wolfram.com
Hi Kevin,

> Yes, this is definitely along the lines I was hoping for.  Of course,
> Part is the primary means of extracting elements of an array.
> However, I need a means of assigning names to elements of those
> lists.  I deal with reasonably large datasets where there may be many
> elements in a list and trying to remember that elements of list  78
> are quality flags is not very effective.
> 
> I guess the difference here is that the original data is still in one
> nested list with the pure functions there to extract the appropriate
> components whereas with a structure array the data itself is already
> organized in that fashion.
> 
> I will see where I can go with this approach.

If you are handling large datasets you should definitely use the unstructured
nested lists to store your data and work with other means (basically
defining 'access functions') so that you do not need to remember the indices
of certain data types. I have included an answer I formulated yesterday but
forgot to post. It is along the lines of the other answers, but maybe there
is something useful in it.

hth

albert

the following is from yesterday :-)

> Like many people I imagine, I'm transitioning to Mathematica from a
> background in another system.
> One of the common data types is the structure array.  Let's say I have
> an observational data set that includes pressure, temperature, and
> water vapor as a function of altitude.  So, in pseudo-code I might
> define a structure as
> 
> observation = {pressure: float(100), temperature: float(100),
> water_vapor: float(100)}

No declaration is needed in Mathematica: you can put everything into a list
at any place. For your example I start off with some random data; of course
you will get your data from somewhere:

pressure1=Table[Random[],{100}]
temperature1=Table[Random[],{100}]
watervapor1=Table[Random[],{100}]

the usual way to store these values is to just put them in a list:

observation1 = {pressure1,temperature1,watervapor1}

> Furthermore, I could aggregate these observations into a larger list, e.g.
> obs_day = {observation, observation, observation}
> to be accessed as
> obs_day[1].pressure for the first element (assuming 1-index).

having done the above for three different observations you can collect them
into a single list:

observationdays = {observation1,observation2,observation3}

Of course you will usually construct this list in a different way, by reading
a file or getting data from a database. Once the data is in this form you
can access the data like this:

observationdays[[1,1,55]] 
which is the shorthand notation for
Part[observationdays,1,1,55]

for the 55th pressure of day one. 
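
Part also accepts All in any position, so (as a small sketch, assuming the
observationdays list from above) you can pull out whole slices without
writing loops:

observationdays[[1,1]]        (* all pressures of day one *)
observationdays[[All,1,55]]   (* the 55th pressure of every day *)
observationdays[[All,2]]      (* the temperature lists of all days *)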

> I could then access the elements of this observation as
> 
> observation.pressure
> observation.temperature, etc.
> 
> Now, the list in Mathematica is quite powerful and I think can be
> set-up in a similar fashion.
> 
> So my question is how is the structure array commonly implemented in
> Mathematica or its equivalent?

The above of course gives you no obvious way to see which part of the list
has which meaning: there are no names for the list entries. There is nothing
like the "structs" you are used to in Mathematica, but there are plenty of
possibilities to make the data appear more structured, and you have to
choose what is appropriate for your problem. Here are some possibilities.
There are a lot more, and maybe something else is better for your purposes,
but that I can't say:

1) Define "names" for the pressure/temperature/watervapor parts of the data,
like:
pressure=1
temperature=2
watervapor=3
then you can use:
observationdays[[1,pressure,55]] which makes your code easier to read but
leaves the data as it was.

2) Define functions for accessing the data (again the data will stay in just
a big list of lists):
Clear[pressure,temperature,watervapor]
pressure[x_List]:=x[[1]]
temperature[x_List]:=x[[2]]
watervapor[x_List]:=x[[3]]

and use it like:
pressure[observationdays[[1]]]
or:
pressure@observationdays[[1]]
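
With these definitions you can also map an access function over all days, as
in this small sketch (again assuming the observationdays list from above):

pressure /@ observationdays            (* the pressure lists of all days *)
pressure[observationdays[[2]]][[55]]   (* 55th pressure of day two *)

Be aware that pressure[observationdays] also matches x_List and would
silently return all of day one rather than complain.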

3) Another widely used approach to organize data in Mathematica is the use of
rules, like:
Clear[pressure,temperature,watervapor]
observation1={pressure->Table[Random[],{100}],
temperature->Table[Random[],{100}],watervapor->Table[Random[],{100}]}

which is probably closest to the "structs" you are trying to imitate. Then
you can access the pressures by applying this rule to pressure:

pressure /. observation1

Of course you can define various versions of access functions for this
construct, too.
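
A minimal sketch of such an access function for the rule-based form (the
name getfield is made up for this example, it is not a built-in):

getfield[obs_List, name_Symbol] := name /. obs

getfield[observation1, pressure]            (* same as pressure /. observation1 *)
getfield[observation1, temperature][[55]]   (* 55th temperature *)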

4) use downvalues instead of lists, like:

Clear[observationday]
observationday[1]=observation1
observationday[2]=observation2
...

then you can access the data with single brackets:

observationday[1]

Combine this with whatever seems appropriate from 1 to 3. This is often more
useful than constructing a big list with many calls to Append or AppendTo.
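
As a sketch of how this might look when the data arrives one day at a time
(readday is a hypothetical stand-in for however you load one day's data; here
it just generates random numbers):

readday[i_] := Table[Random[], {3}, {100}]    (* placeholder loader *)
Do[observationday[i] = readday[i], {i, 1, 3}]

observationday[2][[1,55]]             (* 55th pressure of day two *)
Table[observationday[i], {i, 1, 3}]   (* collect into one nested list if needed *)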

Since I suppose you work with rather large datasets, you should note that
Mathematica stores and accesses arrays of plain numbers which are all of
the same type (Reals here) much more efficiently (look up
Developer`ToPackedArray if you are interested) than any constructs which
contain rules (or anything else). So if you have large datasets I would
recommend using the simple data structure as in examples 1 and 2 and making
sure it is converted to packed arrays (which Mathematica usually manages on
its own). Then write access functions which fit your needs as well as
possible, as explained above. When writing bigger programs it might be
helpful to wrap a special header around the monstrous list which makes
clear how the data is to be interpreted, e.g.:

odata = observationdata[{observation1,observation2,observation3}]

you can then use that header to check the arguments in your
access functions:

observationday[data_observationdata,day_,what_]:=data[[1,day,
what/.{pressure->1,temperature->2,watervapor->3}]]
observationday[data_observationdata,day_,what_,index_]:=data[[1,day,
what/.{pressure->1,temperature->2,watervapor->3},index]]

then access the data in odata with:
observationday[odata,1,pressure][[55]]
observationday[odata,1,pressure,55]

you can put in more information about the data and/or define special formats
etc. for observationdata if needed, even combining the big list with the
above mentioned rule-based approach:

odata = observationdata[{observation1,...},Name->"name of dataset",
StartDate->{2006,03,01},EndDate->{2006,04,01}]
Format[observationdata[data_,info___]]:=StringJoin["observationdata[<",
Name/.{info},">]"]

From there you can probably see how to improve further, according to your
needs. Depending on the structure of your data it might also be worth
looking into the documentation for SparseArray.
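
On the packed array remark above, a quick way to check (and, if necessary,
force) packing might look like the following sketch. It assumes
observationdays is the plain nested list of Reals from the beginning, with
all sublists of equal length:

Developer`PackedArrayQ[observationdays]   (* True if already packed *)
observationdays = Developer`ToPackedArray[observationdays];
ByteCount[observationdays]                (* memory used by the data *)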