Re: Database memory usage
- To: mathgroup at smc.vnet.net
- Subject: [mg109446] Re: Database memory usage
- From: Rui <rui.rojo at gmail.com>
- Date: Tue, 27 Apr 2010 07:41:28 -0400 (EDT)
- References: <email@example.com> <firstname.lastname@example.org>
On Apr 27, 5:04 am, David Bailey <d... at removedbailey.co.uk> wrote: > Rui wrote: > > I'm new to databases. > > I want to put a big list in a database, to later work on it. > > It all worked fine, using the default hsqldb that comes in > > Mathematica, but I couldn't insert more than few thousand elements at > > a time, and then Link problems or java heap problems. > > Finally, I couldn't even open the connection. Java runs out of memory. > > Memory usage of javaw process is less than 300MB when it runs out. > > In any case, I thought that the databases were, or could be, stored in > > the disk, so they can really be big and be a real big advantage over > > the list. How can that be done? > > If the question is too newbie and you feel I need to read a tutorial > > on something, hehe, feel free to send me a link to a good one, hehe > > > Rui Rojo > > If you just want to write a lot of data away in chunks and re-read it > later, there is no need to use a database at all: > > str=OpenWrite["Myfile.dat"]; > > Write[str,list1] > .. > .. > .. > Write[str,listN] > Close[str] > > You could also write the data in binary form if you prefer. > > However, from what you say, I imagine there is some sort of memory leak > in the database handling process - you should report this as a bug to > Wolfram. In the meantime, if you really need to write a database, you > will need to use J/Link and possibly some actual Java code to do the job > correctly. > > David Baileyhttp://www.dbaileyconsultancy.co.uk My immediate need is to store a lot of data away and later be able to retreive only the parts I want without loading everything. I know that that can be done with streams, but I figure I would end up doing a mini database program myself with files and streams, hehe. My main objective, however, is to keep on learning, and streams is also something I've yet to play around with, so I'll probably give it a try. Right now I have lots of English literature that I've converted in txt, created tables and statistics on number of appearances of each word. More than 70 million words total, 5 million sentences that I wanna be able to fetch as example quotes of all words it contains. So, for example, I want to be able to query things like: * The 1000;;1040 most used words and the number of times they appeared * A random example of the list of examples of the word "rant" * Other stuff that I may add in the near future David Reiss! Very useful. I had seen a code around to increase the java heap size but it didn't work well. Yours, like a charm. That still left me with the second problem, that the db was stored in memory. Like Albert said, I found in the documenation of the hsqldb that if I created the table in raw sql with "CREATE CACHED TABLE....." then it stores it in disk. And it worked :D So, adding the java heap thing and the cached thing solved the issued. 2 other minor issues came up because of that: * it takes aaaages to connect to the database... It isn't using up more than 200MB, but it still takes minutes to connect * can barely do anything in batches, its very prone to errors even in small batches. Anyway, both of these I can live with. However, I downloaded MySQL to see if it works better. I know nooothing about settings, but I was able to do the minimum to have it apparently working It is taking loooooonger than hsqldb to insert the stuff in, but it's fast to connect, it's not using loots of memory (and I didn't have to tell it if its in memory or in disk), and the batch thing is working (for small batches). I'll be using both until my feelings chose between the two Thanks a lot.