Re: StringCases and Shortest
- To: mathgroup at smc.vnet.net
- Subject: [mg97361] Re: StringCases and Shortest
- From: "Sjoerd C. de Vries" <sjoerd.c.devries at gmail.com>
- Date: Thu, 12 Mar 2009 02:14:52 -0500 (EST)
- References: <gp8047$1k9$1@smc.vnet.net>
Hi grischika,

As I understand it, Mathematica works its way through the string from left to right. It starts at the first character and normally tries to find the longest substring that matches the string pattern. It then moves on to look for the next match. Where that next attempt starts is controlled by the Overlap option:

- Overlap -> False (default): the next attempt starts at the character following the last matched substring.
- Overlap -> True: the next attempt starts at the character following the start of the last match.
- Overlap -> All: Mathematica also looks for other (for example, shorter) matching substrings starting at the same character.

If the pattern is wrapped in Shortest, the match is the shortest of the possible matches *starting from the INITIAL character*. So, with your example "(-(a)--(bb)--(c)-", starting at the first character we have the matching substrings "(-(a)--(bb)--(c)", "(-(a)--(bb)", and "(-(a)". The shortest of these is "(-(a)". With the default Overlap -> False, StringCases then moves on to "--(bb)--(c)-" to look for further matches, so the possible "(a)" match is skipped. If you set Overlap -> All or Overlap -> True, this match would also be found.

If you don't want the matched substrings to contain opening parentheses themselves, you have to say so:

StringCases["(-(a)--(bb)--(c)-", Shortest["(" ~~ x__ ~~ ")"] /; StringFreeQ[x, "("]]

If you want to remove the parentheses around the matched substrings, you can use:

StringCases["(-(a)--(bb)--(c)-", Shortest["(" ~~ x__ ~~ ")"] /; StringFreeQ[x, "("] -> x]

Cheers -- Sjoerd

On Mar 11, 11:26 am, Grisch... at mail.ru wrote:
> Hello!
> I want to select shortest substring between brackets from the string.
> For example:
>
> Func["f(a+b) some text (comments)" ]
>
> should give:
>
> {"a+b","comments"},
>
> and
>
> Func["(f(a+b) some text (comments)" ]
>
> should give:
>
> {"(a+b)","(comments)"} too.
>
> In the help I found this line:
>
> in[]: StringCases["-(a)--(bb)--(c)-", Shortest["(" ~~ __ ~~ ")"]]
> out: {"(a)","(bb)","(c)"}
>
> which, at first sight, works as I desire.
>
> But when I add bracket at start of line then answer is incorrect
> in[]: StringCases["(-(a)--(bb)--(c)-", Shortest["(" ~~ __ ~~ ")"]]
> out: {"(-(a)","(bb)","(c)"}
>
> What is wrong? And how to solve this problem?
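Below is a small sketch of the Overlap behaviour explained in the reply, for anyone who wants to try it. It runs the question's example with the default setting and with Overlap -> True. It then shows one alternative to the /; StringFreeQ condition. This alternative is my own suggestion, not the method from the reply: it forbids both parentheses inside the match with Except, so Shortest is no longer needed. The helper name innerParens is hypothetical. Please check the results in your own Mathematica version.

```mathematica
s = "(-(a)--(bb)--(c)-";

(* Default Overlap -> False: after the match "(-(a)" the scan resumes
   after its closing ")", so the inner "(a)" is never tried. *)
StringCases[s, Shortest["(" ~~ __ ~~ ")"]]
(* {"(-(a)", "(bb)", "(c)"} *)

(* Overlap -> True: the next attempt starts one character after the
   start of the previous match, so the result now also contains "(a)". *)
StringCases[s, Shortest["(" ~~ __ ~~ ")"], Overlap -> True]

(* Alternative (my suggestion): allow neither "(" nor ")" between the
   brackets; x captures only the contents. *)
StringCases[s, "(" ~~ x : Except["(" | ")"] .. ~~ ")" :> x]
(* {"a", "bb", "c"} *)

(* Hypothetical helper, applied to the two examples from the question *)
innerParens[str_String] :=
  StringCases[str, "(" ~~ x : Except["(" | ")"] .. ~~ ")" :> x]

innerParens["f(a+b) some text (comments)"]
(* {"a+b", "comments"} *)
innerParens["(f(a+b) some text (comments)"]
(* {"a+b", "comments"} *)
```

Drop the ":> x" rule if you want to keep the parentheses in the result, as in the second example of the question.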