MathGroup Archive 2010

Re: Database memory usage

  • To: mathgroup at smc.vnet.net
  • Subject: [mg109479] Re: Database memory usage
  • From: David Bailey <dave at removedbailey.co.uk>
  • Date: Thu, 29 Apr 2010 02:52:42 -0400 (EDT)
  • References: <hr3tj2$hdo$1@smc.vnet.net> <hr65pt$jnk$1@smc.vnet.net> <hr6ih3$2ef$1@smc.vnet.net>

Rui wrote:
> On Apr 27, 5:04 am, David Bailey <d... at removedbailey.co.uk> wrote:
>> Rui wrote:

> 
> My immediate need is to store a lot of data away and later be able to
> retrieve only the parts I want without loading everything. I know that
> that can be done with streams, but I figure I would end up doing a
> mini database program myself with files and streams, hehe.
> My main objective, however, is to keep on learning, and streams is
> also something I've yet to play around with, so I'll probably give it
> a try.
> 
> Right now I have lots of English literature that I've converted to
> txt, and I've created tables and statistics on the number of appearances
> of each word. More than 70 million words in total, and 5 million sentences
> that I wanna be able to fetch as example quotes for the words they contain. So,
> for example, I want to be able to query things like:
> * The 1000;;1040 most used words and the number of times they appeared
> * A random example of the list of examples of the word "rant"
> * Other stuff that I may add in the near future

Have you established that you can perform the queries that you need 
using the database? I think I'd be inclined to take a small portion of 
your data, put it in the database and devise the queries that you would 
need.

My inclination would be to start by trying to do some of those things 
inside Mathematica. For example, you could have a list of 
{{"word1",usage},{"word2",usage},......}

Sorted by usage, that list would answer your usage queries efficiently: 
a rank range such as 1000;;1040 is then just a Part extraction.
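
As a rough sketch of the idea (the word list below is made-up toy data, 
and the names words, counts and ranked are only for illustration), the 
usage table can be built with Tally and sorted once:

```mathematica
(* Hypothetical toy data; the real input would be the full list of words *)
words = {"the", "of", "the", "rant", "of", "the", "whale", "and", "and"};

(* Tally gives {{word, count}, ...}; sort once by descending count *)
counts = Tally[words];
ranked = SortBy[counts, -Last[#] &];

(* Ranks 2 through 3 here; on the full list this would be ranked[[1000 ;; 1040]] *)
ranked[[2 ;; 3]]
```

After the one-off sort, each rank query is only a Part extraction, so 
it should stay cheap even for a list with many thousands of distinct words.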

Mathematica is also rather good at doing things with very long strings:

In[1]:= sss = StringJoin @@ ConstantArray["x", 200000];

In[2]:= StringLength[sss]

Out[2]= 200000

In[3]:= sss1 = sss <> "fred";

In[5]:= StringPosition[sss1, "fred"] // Timing

Out[5]= {0.015, {{200001, 200004}}}

Thus an in-memory solution to your problem is probably feasible, 
particularly if you can run on 64-bit Mathematica.
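
For the "random example of a word" query, something along these lines 
might do. This is only a sketch: the sentences are invented, and the 
names sentences and examples are placeholders for your own data.

```mathematica
(* Hypothetical toy data standing in for the 5 million sentences *)
sentences = {"He began to rant about the weather.",
   "The grant was approved.",
   "Her rant lasted an hour.",
   "Nothing to see here."};

(* Keep sentences containing "rant" as a whole word;
   WordBoundary stops "grant" from matching *)
examples = Select[sentences,
   ! StringFreeQ[#, WordBoundary ~~ "rant" ~~ WordBoundary] &];

(* Pick one matching sentence at random *)
RandomChoice[examples]
```

Scanning every sentence for each query may be too slow at your scale, 
in which case a precomputed index from each word to the positions of 
its sentences would be the next thing to try.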

David Bailey
http://www.dbaileyconsultancy.co.uk

