Re:Re: metacharacters as record separators
- To: mathgroup at smc.vnet.net
- Subject: [mg26968] Re:[mg26935]Re: [mg26881] metacharacters as record separators
- From: Tomas Garza <tgarza01 at prodigy.net.mx>
- Date: Tue, 30 Jan 2001 03:38:19 -0500 (EST)
- Sender: owner-wri-mathgroup at wolfram.com
Another approach to your problem, with a more Mathematica-like flavor:

In[1]:= ReadList["testwords.txt", Word]
Out[1]=
{"Those", "friends", "thou", "hast", "\",\"", "and", "their", "adoption", \
"tried", "\",\"", "grapple", "them", "unto", "thy", "soul", "with", "hoops", \
"of", "steel"}

In[2]:= a1 = Characters[StringJoin @@ %]
Out[2]=
{"T", "h", "o", "s", "e", "f", "r", "i", "e", "n", "d", "s", "t", "h", "o", \
"u", "h", "a", "s", "t", "\"", ",", "\"", "a", "n", "d", "t", "h", "e", "i", \
"r", "a", "d", "o", "p", "t", "i", "o", "n", "t", "r", "i", "e", "d", "\"", \
",", "\"", "g", "r", "a", "p", "p", "l", "e", "t", "h", "e", "m", "u", "n", \
"t", "o", "t", "h", "y", "s", "o", "u", "l", "w", "i", "t", "h", "h", "o", \
"o", "p", "s", "o", "f", "s", "t", "e", "e", "l"}

Now locate the positions of the character you have chosen as separator (in
this case, "h"):

In[3]:= a2 = Position[a1, "h"] // Flatten
Out[3]=
{2, 14, 17, 28, 56, 64, 73, 74}

Then find the starting and ending positions for each of the records
separated by "h"s:

In[4]:= begsAndEnds = {Prepend[# + 1 & /@ a2, 1], Append[# - 1 & /@ a2, Length[a1]]} // Transpose
Out[4]=
{{1, 1}, {3, 13}, {15, 16}, {18, 27}, {29, 55}, {57, 63}, {65, 72},
{74, 73}, {75, 85}}

Now, select the characters which make up the separated records:

In[5]:= Take[a1, #] & /@ begsAndEnds
Out[5]=
{{"T"},
{"o", "s", "e", "f", "r", "i", "e", "n", "d", "s", "t"},
{"o", "u"},
{"a", "s", "t", "\"", ",", "\"", "a", "n", "d", "t"},
{"e", "i", "r", "a", "d", "o", "p", "t", "i", "o", "n", "t", "r", "i", "e",
"d", "\"", ",", "\"", "g", "r", "a", "p", "p", "l", "e", "t"},
{"e", "m", "u", "n", "t", "o", "t"},
{"y", "s", "o", "u", "l", "w", "i", "t"},
{},
{"o", "o", "p", "s", "o", "f", "s", "t", "e", "e", "l"}}

and, finally, reconstruct the records:

In[6]:= StringJoin /@ %
Out[6]=
{"T", "osefriendst", "ou", "ast\",\"andt", "eiradoptiontried\",\"grapplet", \
"emuntot", "ysoulwit", "", "oopsofsteel"}

Tomas Garza
Mexico City

Aaron Hirsh wrote:

> Is it possible to use metacharacters as part of record separators in
> ReadList? For example, can one separate data at each occurrence of a
> number, or at each occurrence of a character other than an uppercase
> letter?
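
The position-based steps above use a literal separator ("h"), but the same idea extends to the character classes asked about in the question. What follows is only a sketch, and it assumes a later version of Mathematica in which StringSplit accepts string patterns and RegularExpression; it is not part of the method posted above. The sample strings are hypothetical and chosen only for illustration.

```mathematica
(* Sketch only: assumes StringSplit with string patterns is available. *)
(* All sample strings below are hypothetical illustrations.            *)

(* Literal separator, matching the "h" example worked through above *)
StringSplit["Thosefriendsthouhast", "h"]

(* Separate at each occurrence of a digit *)
StringSplit["alpha1beta2gamma", DigitCharacter]

(* Separate at each character other than an uppercase letter *)
StringSplit["ABCdEFGhIJ", RegularExpression["[^A-Z]"]]
```

To apply this to a file, the contents can first be joined into one string as in In[2] (StringJoin @@ ReadList["testwords.txt", Word]) and then passed to StringSplit in place of the sample strings.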