Re: Obtaining Random Line from a File
*To*: mathgroup at smc.vnet.net
*Subject*: [mg129860] Re: Obtaining Random Line from a File
*From*: David Bailey <dave at removedbailey.co.uk>
*Date*: Tue, 19 Feb 2013 00:59:46 -0500 (EST)
*References*: <kfn7nt$qaj$1@smc.vnet.net> <kfq6mb$4us$1@smc.vnet.net> <kft1ij$cov$1@smc.vnet.net>
On 18/02/2013 10:59, awnl wrote:
> On 17.02.2013 10:08, David Bailey wrote:
>> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>>> Hello,
>>>
>>> I would like to get a random line from a file. I know this can be done
>>> with Mathematica, but I am experimenting with sed to see if it is
>>> faster. Say I want to get line 1000.
>>>
>>> In Mathematica it would be:
>>>
>>> <<"! sed -n 1000p filename.txt"
>>>
>>> However, I am trying to put the filename as a variable, say
>>>
>>> filename="hugefile.txt"
>>>
>>> cmd="! sed -n 1000p "<>filename
>>> <<cmd
>>>
>>> does not work.
>>>
>>> How can I do this?
>>>
>>> Lastly, I am currently getting a random line in Mathematica with:
>>>
>>> getRandomLine[file_, n_] :=
>>> Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>>> Skip[str, "String", i];
>>> res = Read[str, Expression];
>>> Close[str];
>>> res[[2]]
>>> ]
>>>
>>> However, it is very slow, so I was going to try sed. Any suggestions?
>>>
>>> Thanks in advance,
>>> Ramiro
>>>
>>>
>> I would stick with Mathematica to do this job! How big is the file
>> (number of lines and number of bytes)? If it will fit inside Mathematica
>> comfortably, I'd see how well it works to read it all in as a list of
>> strings and pick the one you want:
>>
>> xx=ReadList["C:\\some file",String];//Timing
>>
>> Then you have an array of strings, and you can select what you want
>> directly.
>>
>> Remember, the basic problem with reading at an arbitrary position in a
>> text file is that, if the line lengths are not the same, any algorithm
>> has to read every line before the one you want!
>
> If he just wants an arbitrary line, that's not true: choosing a
> position in the file at random and searching for the previous and
> next line breaks would also pick a random line. Of course, the
> probability of choosing a longer line would be larger than that of a
> shorter one, but it isn't clear from the question whether that would
> be a problem for what the OP is trying to do...
>
>> If you create this file,
>> you should consider packing the lines to make them all the same length -
>> then you could access what you want very efficiently (but with a little
>> more coding!)
>
> ... and slightly (?) higher memory requirements...
>
> hth,
>
> albert
>
Well yes - but I assume that by 'arbitrary' he means a specific record
somewhere in the file!
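
If the file can be rewritten so that every line is padded to the same
length, a specific record can be reached by seeking instead of scanning.
What follows is only a minimal sketch under stated assumptions: each
record occupies exactly recLen bytes, counting a single-byte "\n"
terminator, and is padded with trailing spaces. The names getRecord,
randomRecord and recLen are made up for illustration.

```mathematica
(* Sketch only: assumes fixed-length records of recLen bytes each,
   counting the single "\n" terminator, padded with trailing spaces,
   and 1 <= k <= number of records. *)
getRecord[file_String, k_Integer, recLen_Integer] :=
 Module[{str = OpenRead[file], line},
  (* Record k starts at byte offset (k - 1)*recLen. *)
  SetStreamPosition[str, (k - 1) recLen];
  line = Read[str, String];
  Close[str];
  (* Remove the padding; note StringTrim also strips leading whitespace. *)
  StringTrim[line]
 ]

(* Pick a record uniformly at random; the count is derived from the file size. *)
randomRecord[file_String, recLen_Integer] :=
 getRecord[file,
  RandomInteger[{1, Quotient[FileByteCount[file], recLen]}], recLen]
```

For example, getRecord["hugefile.txt", 1000, 81] would fetch line 1000 of
a file whose lines are all padded to 80 characters. The cost no longer
depends on where the record sits in the file, at the price of the extra
padding bytes.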
David Bailey
http://www.dbaileyconsultancy.co.uk
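
On the original question of putting the file name in a variable: << takes
the text that follows it literally as the file name, so <<cmd looks for a
file called cmd instead of using the value of the variable. One possible
way round this, assuming a Unix-like system with sed on the path, is to
build the command string and pass it to ReadList, which accepts a
"!command" pipe and returns the output lines as strings. The name
lineViaSed is made up for illustration, and the sed call is written as
sed -n 1000p (address first, then the p command).

```mathematica
(* Sketch only: assumes sed is available and the file name needs no
   shell quoting (no spaces or special characters). *)
lineViaSed[file_String, n_Integer] :=
 ReadList["!sed -n " <> ToString[n] <> "p " <> file, String]

filename = "hugefile.txt";
lineViaSed[filename, 1000]
```

The result is a list holding the requested line as a single string, or an
empty list if the file has fewer than n lines. sed still has to read
every line before the one requested, so this is not expected to avoid the
scanning cost discussed above.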