MathGroup Archive: August 2009 [00237]

Re: Re: Finding the Position of Elements in a
To: mathgroup at smc.vnet.net
Subject: [mg102406] Re: [mg102195] Re: [mg102157] Finding the Position of Elements in a
From: Gregory Lypny <gregory.lypny at videotron.ca>
Date: Sat, 8 Aug 2009 04:38:20 -0400 (EDT)
References: <200907310953.FAA19267@smc.vnet.net>
	Hi Leonid,

	I tried your function and it works well.  Thanks.  I tried it on the
following list of 12 strings, searching for the word "can".  If I
change your LetterCharacter condition to WordCharacter, the function
skips instances of the search string that are attached to digits, as
in strings 8 and 9, which is what I'd want most of the time.

TableForm[
  listOfStrings =
   {"can is the first word in this sentence, and it is not capitalized on purpose.",
    "But it is not in this one.",
    "Here, the word can is in the middle.",
    "Don't want cannot because it is a substring of another word (its opposite).",
    "Attached to junk as in, She can+#@ he won't.",
    "Concatenated as in, can't.",
    "Preceded by punctuation without a space: pot;bottle;can;jar.",
    "Preceded by a digit: 1035can.",
    "Followed by a digit: can98.6.",
    "At the end but without a stop as in, It's in the can",
    "At the end with a stop as in, The money is in the can.",
    "And finally, CAN in caps."},
  TableHeadings -> Automatic]

	I also took a shot at putting together the parts of my own function  
and I came up with this to identify the cases.  It gives the same  
result as yours (with WordCharacter).

	StringCases[#, ___ ~~ WordBoundary ~~ "can" ~~ WordBoundary ~~ ___,  
IgnoreCase -> False] & /@ listOfStrings

And wrapping Position[] around it correctly identifies the lists with  
the hits.

	Position[StringCases[#, ___ ~~ WordBoundary ~~ "can" ~~ WordBoundary  
~~ ___,  IgnoreCase -> False] & /@ listOfStrings, {_}]
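
	As a hedged aside, the same hits can probably be read off as a flat
list of indices without building the nested StringCases output.  This is
only a sketch: it assumes listOfStrings is defined as above, and it uses
StringFreeQ, which does not appear elsewhere in this thread.

	(* StringFreeQ gives True where "can" does NOT occur as a whole word,
	   so the hits are the positions of the False entries *)
	Flatten[Position[
	  StringFreeQ[listOfStrings, WordBoundary ~~ "can" ~~ WordBoundary],
	  False]]

Because the test is applied to the whole list at once, there is no need
to map a pure function over the individual strings.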

	I'm not really sure I understand WordBoundary, but it seems to allow
search strings at the beginning or end of the target string to be
caught as well.  There are only two cases that I'm catching that I'd
like to exclude.  The first is junk attached to the word, as in string 5
(can+#@), which is caught by the function.  The other is
concatenations like the one in string 6 (can't), which would have to be
distinguished from possessives (can's or cans').  But for those, maybe
it would be easiest to run a check after the fact.
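
	One possible shape for such a stricter test is sketched below, under
stated assumptions.  The name strictWordPositions is hypothetical, the
search word is assumed to contain letters only (so it needs no regex
escaping), and the set of characters allowed to follow the word (white
space, end of string, or one of . , ; : ! ?) is my own guess at what
counts as acceptable punctuation.

	strictWordPositions[strings : {__String}, word_String] :=
	  Module[{re},
	   (* the word must not be preceded by a word character or an apostrophe,
	      and must be followed by white space, end of string, or . , ; : ! ? *)
	   re = RegularExpression[
	     "(?<![\\w'])" <> word <> "(?=$|\\s|[.,;:!?])"];
	   (* indices of the strings that contain at least one such match *)
	   Flatten[Position[StringFreeQ[strings, re], False]]]

	strictWordPositions[listOfStrings, "can"]

The intent is to keep cases like ;can; and a sentence-final can. while
rejecting can+#@ and can't.  It rejects can's as well, so possessives
would still need the separate check mentioned above.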

Regards,

	Gregory







On Tue, Aug 4, 2009, at 5:42 AM, Leonid Shifrin wrote:

> Hi Gregory,
>
> taking the (perhaps overly conservative) definition of a word as a  
> string matching the
> pattern (LetterCharacter ..),  the function below will hopefully do  
> what you need
>
> wordMatchPositions[listOfStrings : {__String}, searchWord_String] :=
>   Position[
>    StringCases[listOfStrings, LetterCharacter ..],
>    _?(MemberQ[#, searchWord] &)];
>
>
> Regards,
> Leonid
>
> On Sat, Aug 1, 2009 at 11:58 AM, Gregory Lypny <gregory.lypny at videotron.ca> wrote:
>
> Thanks again Leonid,
>
> I'll give this one a whirl.
>
>        Gregory
>
>
> On 31-Jul-09, at 10:16 AM, Leonid Shifrin wrote:
>
> > Hi Gregory,
> >
> > just an amendment to my previous post:
> >
> > memberPositions[listOfStrings,
> > Flatten@StringCases[listOfStrings, __ ~~ searchString ~~ ___]]
> >
> > (I was missing Flatten).
> >
> > Also, I have realized that there are easier ways, like this, for
> > example:
> >
> > Select[Transpose[{Range[Length[listOfStrings]],
> >     StringCases[
> >      listOfStrings, __ ~~
> >       searchString ~~ ___]}], #[[2]] =!= {} &][[All, 1]]
> >
> >
> > Regarding your request to test for individual words: quite doable,
> > I just happen to have zero time at the moment to do it carefully.
> > If no other solution is suggested, I will post one on Monday.
> >
> > Regards,
> > Leonid
> >
> >
> >
> > On Fri, Jul 31, 2009 at 4:13 PM, Gregory Lypny <gregory.lypny at videotron.ca> wrote:
> >       Just what I needed.  Thank you, Leonid, and I will thumb through
> > your book.  One of the things I want to work on is a function that
> > finds a word (string that is space delimited on both sides or one
> > side if occurring at the beginning or end of a string).  It would
> > work something like
> >
> >       If theSearchString is among the words of theTargetString then
> > return True
> >
> > This is a function that Runtime Revolution (a.k.a. MetaCard and
> > originally HyperCard) has, and it is indispensable.
> >
> >       Regards,
> >
> >               Gregory
> >
> >
> >
> > On Fri, Jul 31, 2009, at 7:12 AM, Leonid Shifrin wrote:
> >
> >> Hi Gregory.
> >>
> >> Position[listOfStrings, x_String /; StringMatchQ[x, __ ~~ searchString ~~ ___]],
> >>
> >> if your list is not too large. If it is, and you want to speed it
> >> up, one quick
> >> solution is to use StringCases (as you did) to produce the list of
> >> matches,
> >> and then the <memberPositions> function that I developed in my book:
> >>
> >> http://www.mathprogramming-intro.org/book/node596.html
> >>
> >> (see the bottom of the page). You feed both the original list
> >> and the list of results to it, like this:
> >>
> >> memberPositions[listOfStrings,
> >> StringCases[listOfStrings, __ ~~ searchString ~~ ___]]
> >>
> >> I expect this to be fast even for large lists of strings:
> >> StringCases is much faster when used on a whole list of strings
> >> rather than used separately on each string, and my function is also
> >> rather fast. I did not benchmark your problem though, so it is just
> >> my guess that this way will be faster than the first one above.
> >>
> >>
> >> Hope this helps.
> >>
> >> Regards,
> >> Leonid
> >>
> >>
> >>
> >>
> >> On Fri, Jul 31, 2009 at 1:53 PM, Gregory Lypny <gregory.lypny at videotron.ca> wrote:
> >> Hello everyone,
> >>
> >> Suppose I have a list of strings, say, sentences such as
> >>
> >>        listOfStrings = {"The cat is here.", "It's not here.",
> >>          "Not in the catalogue,", "Where is the cat?"}
> >>
> >> and a string I want to search for
> >>
> >>        searchString = "cat"
> >>
> >> I can use StringCases to pick off the elements that contain the
> >> search string
> >>
> >>        StringCases[listOfStrings, __ ~~ searchString ~~ ___]
> >>
> >>        {{"The cat is here."}, {}, {"Not in the catalogue,"},
> >>         {"Where is the cat?"}}
> >>
> >> But what if I just want to know the positions of the elements that
> >> are hits?  In this case, it's
> >>
> >>        {1, 3, 4}
> >>
> >> If I use
> >>
> >>        Position[listOfStrings, ___ ~~ searchString ~~ ___]
> >>
> >> I get
> >>
> >>        {}
> >>
> >> which is not what I expect.  Also how can I have my searchString
> >> treated like it is a word so that 3 is not one of the hits?
> >>
> >> Any hints would be much appreciated.
> >>
> >>        Gregory
> >>
> >>
> >
> >
>