Re:Re: metacharacters as record separators
- To: mathgroup at smc.vnet.net
- Subject: [mg26968] Re:[mg26935]Re: [mg26881] metacharacters as record separators
- From: Tomas Garza <tgarza01 at prodigy.net.mx>
- Date: Tue, 30 Jan 2001 03:38:19 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Another approach to your problem, with a more Mathematica-like flavor:
In[1]:=
ReadList["testwords.txt", Word]
Out[1]=
{"Those", "friends", "thou", "hast", "\",\"", "and", "their",
 "adoption", "tried", "\",\"", "grapple", "them", "unto", "thy", "soul",
 "with", "hoops", "of", "steel"}
In[2]:=
a1 = Characters[StringJoin @@ %]
Out[2]=
{"T", "h", "o", "s", "e", "f", "r", "i", "e", "n", "d", "s", "t", "h",
 "o", "u", "h", "a", "s", "t", "\"", ",", "\"", "a", "n", "d", "t", "h",
 "e", "i", "r", "a", "d", "o", "p", "t", "i", "o", "n", "t", "r", "i",
 "e", "d", "\"", ",", "\"", "g", "r", "a", "p", "p", "l", "e", "t", "h",
 "e", "m", "u", "n", "t", "o", "t", "h", "y", "s", "o", "u", "l", "w",
 "i", "t", "h", "h", "o", "o", "p", "s", "o", "f", "s", "t", "e", "e",
 "l"}
Now locate the positions of the character you have chosen as separator
(in this case, "h"):
In[3]:=
a2 = Position[a1, "h"] // Flatten
Out[3]=
{2, 14, 17, 28, 56, 64, 73, 74}
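The same step extends from one literal character to a whole class of characters, which is closer to what the original question asks. The lines below are only a sketch: they assume the built-in predicates DigitQ and UpperCaseQ, and the names digitPos and nonUpperPos are made up for illustration.

```mathematica
(* positions of every digit character; _String keeps the head List from being tested *)
digitPos = Flatten[Position[a1, _String?DigitQ]]

(* positions of every character that is not an uppercase letter *)
nonUpperPos = Flatten[Position[a1, _String?(! UpperCaseQ[#] &)]]
```

Either list could be used in place of a2 in the steps that follow. The sample text contains no digits, so digitPos should come out empty here.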
Then find the starting and ending positions for each of the records
separated by "h"s:
In[4]:=
begsAndEnds = {Prepend[# + 1 & /@ a2, 1],
   Append[# - 1 & /@ a2, Length[a1]]} // Transpose
Out[4]=
{{1, 1}, {3, 13}, {15, 16}, {18, 27}, {29, 55}, {57, 63}, {65, 72},
 {74, 73}, {75, 85}}
Now, select the characters which make up the separated records:
In[5]:=
Take[a1, #] & /@ begsAndEnds
Out[5]=
{{"T"},
 {"o", "s", "e", "f", "r", "i", "e", "n", "d", "s", "t"},
 {"o", "u"},
 {"a", "s", "t", "\"", ",", "\"", "a", "n", "d", "t"},
 {"e", "i", "r", "a", "d", "o", "p", "t", "i", "o", "n", "t", "r", "i",
  "e", "d", "\"", ",", "\"", "g", "r", "a", "p", "p", "l", "e", "t"},
 {"e", "m", "u", "n", "t", "o", "t"},
 {"y", "s", "o", "u", "l", "w", "i", "t"},
 {},
 {"o", "o", "p", "s", "o", "f", "s", "t", "e", "e", "l"}}
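Finally, reconstruct the records. The empty string in the result comes
from the two adjacent "h"s in "with hoops", which give the range {74, 73}: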
In[6]:=
StringJoin /@ %
Out[6]=
{"T", "osefriendst", "ou", "ast\",\"andt",
 "eiradoptiontried\",\"grapplet", "emuntot", "ysoulwit", "",
 "oopsofsteel"}
Tomas Garza
Mexico City
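The steps above can also be collected into a single function that takes a test instead of a fixed separator. This is a rough sketch rather than a tested solution: splitAt is a hypothetical name, not a built-in, and sepQ can be any function that returns True for a separator character.

```mathematica
(* splitAt: hypothetical helper; chars is a list of one-character strings *)
splitAt[chars_List, sepQ_] :=
  Module[{pos, ranges},
    (* positions of all characters for which sepQ gives True *)
    pos = Flatten[Position[chars, _String?sepQ]];
    (* start and end of each record; Plus is Listable, so pos + 1 acts on every element *)
    ranges = Transpose[{Prepend[pos + 1, 1], Append[pos - 1, Length[chars]]}];
    (* extract each record and join its characters into one string *)
    StringJoin /@ (Take[chars, #] & /@ ranges)]

splitAt[a1, # === "h" &]  (* same separator as in the session above *)
splitAt[a1, DigitQ]       (* separate at each digit *)
```

With the "h" test this should reproduce Out[6]. With DigitQ the sample comes back as a single record, since it contains no digits.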
Aaron Hirsh wrote:
> Is it possible to use metacharacters as part of record separators in
> ReadList? For example, can one separate data at each occurrence of a
> number, or at each occurrence of a character other than an uppercase
> letter?
>