Re: Re: String patterns
- To: mathgroup@smc.vnet.net
- Subject: [mg10976] Re: [mg10943] Re: String patterns
- From: Mark Evans <evans.nospam@gte.net>
- Date: Sun, 15 Feb 1998 02:10:41 -0500
- Organization: None
- References: <6brf6t$e78@smc.vnet.net> <199802140553.AAA01174@smc.vnet.net.>
A handful of people, including Daniel Lichtblau, have been very helpful in answering my question, and I appreciate their input. Strangely, though, no one has suggested a line of thought other than string functions. There is at least one programming language that is much better suited to string processing than Mathematica (Icon: http://www.cs.arizona.edu/icon/index.htm). Icon has some unique features (generators, success/failure propagation) that surpass even Mathematica in ease of expression and programming flexibility.

I had a different approach in mind for Mathematica, one that I did not make explicit in my post. I was expecting someone to suggest turning the characters into numbers using their ASCII codes, or perhaps two-byte "wide character" codes. This conversion leaves you with a list of integers, and I would expect Mathematica to handle that data structure much more competently than it handles strings.

Mark

Daniel Lichtblau wrote:
>
> MJE wrote:
> >
> > Programming challenge:
> >
> > Is there an elegant means of doing cryptanalysis in Mathematica, as
> > opposed to any other language? I am mainly thinking of
> > pattern-matching functions. In this case, the pattern would be
> > dynamic, not predefined. I am not certain how to create and test
> > patterns on the fly.
> >
> > The primary task is to count letter, digraph, trigraph, and higher-order
> > frequencies.
> >
> > Output for the trigraph case might look like this:
> >
> > THE 0.01350000
> > AND 0.00709421
> > ION 0.00559429
> > ING 0.00510783
> > TIO 0.00466191
> > ENT 0.00458083
> > RES 0.00417545
> > <...etc....>
> > BEP 0.00004054
> >
> > The real number represents the fractional occurrence of the trigraph
> > among all trigraphs in the sample. These were computed by a DOS
> > utility on a particular sample text. The trigraph "THE" occurred 333
> > times out of 24668 total trigraph sequences, giving an estimated
> > probability for this trigraph of 333/24668, or about 0.0135.
> >
> > Trigraphs overlap.
> > If I parse the following phrase,
> >
> > "I love Mathematica"
> >
> > then the first trigraph is "I l" (spaces count), the second is " lo",
> > and the third is "lov".
> >
> > One must define an "alphabet" with a sorting order. A good way to do
> > this is with a string variable like this:
> >
> > "abcdefghijklmno..."
> >
> > How good is Mathematica at this kind of string manipulation and
> > searching?
> >
> > Mark Evans
> > evans@gte.net
>
> To find frequencies of a small set of given trigraphs you might use
> StringPosition.
>
> In[23]:= str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";
>
> In[24]:= strL = ToLowerCase[str];
> General::spell1:
> Possible spelling error: new symbol name "strL"
> is similar to existing symbol "str".
>
> In[26]:= Length[StringPosition[strL, "mat"]]
> Out[26]= 5
>
> To check the frequencies of all triads that occur in your string, you
> might first form the triads explicitly, as below.
>
> triads = Union[Table[StringTake[strL, {j,j+2}], {j,StringLength[strL]-2}]];
>
> Then you could do
>
> In[51]:= Map[Length[StringPosition[strL,#]]&, triads]
> Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
>   1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
>   1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
>   1, 1, 1, 1, 1, 1, 1}
>
> If you are working with large strings and many triads, a more efficient
> method might be to initialize a set of function values, one entry per
> triad, to zero. For example,
>
> In[54]:= Do[freq[triads[[j]]] = 0, {j,Length[triads]}]
>
> In[56]:= ?freq
> Global`freq
> freq[", a"] = 0
> freq["a b"] = 0
> ...
>
> Then iterate over the string, and for each triad you find, increment the
> appropriate function value. It takes a bit of coding (not too much) but
> should be reasonably fast.
>
> Daniel Lichtblau
> Wolfram Research
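A minimal sketch of the character-code idea from the top of this message, assuming Mathematica 6 or later (for Tally and SortBy). The function name ngraphFrequencies is hypothetical, and folding the text to lower case first (as Daniel does with ToLowerCase) is an assumption, not part of the original question. Each character becomes an integer, Partition with offset 1 yields the overlapping n-graphs MJE describes (spaces included), and Tally counts each distinct one without building any string patterns.

```mathematica
(* Sketch: n-graph frequencies computed on character codes instead of
   strings. ngraphFrequencies is a hypothetical name; lower-casing the
   input is an assumption. Needs Tally and SortBy (Mathematica 6+). *)
ngraphFrequencies[text_String, n_Integer?Positive] :=
  Module[{codes, grams, total},
    (* every character, including spaces, becomes an integer code *)
    codes = ToCharacterCode[ToLowerCase[text]];
    (* offset 1 gives overlapping windows; for "I love ..." after
       lower-casing these are "i l", " lo", "lov", ... *)
    grams = Partition[codes, n, 1];
    total = Length[grams];
    (* Tally gives {window, count} pairs: turn each window back into a
       string, each count into a fraction of all windows, and sort from
       most to least frequent *)
    SortBy[
      {FromCharacterCode[First[#]], N[Last[#]/total]} & /@ Tally[grams],
      -Last[#] &]
  ]
```

For example, ngraphFrequencies["I love Mathematica", 3] should list {"mat", 0.125} first, because "mat" is the only trigraph occurring twice among the 16 in that phrase. The same function covers letters (n = 1), digraphs, and higher orders, and wrapping the result in TableForm gives a two-column layout like the one in the original question.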
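Daniel's closing suggestion (one zero-initialized counter per triad, then a single pass over the string) leaves the loop to the reader. One possible way to write it is sketched below; tallyTriads is a hypothetical name, and making freq local with Module, so that each call starts from fresh counters, is a choice made here rather than something in the original.

```mathematica
(* Sketch of the counter approach: one function value per distinct
   triad, incremented during one pass. tallyTriads is hypothetical. *)
tallyTriads[s_String] :=
  Module[{freq, triads},
    (* distinct triads, sorted, as in Daniel's example *)
    triads = Union[Table[StringTake[s, {j, j + 2}], {j, StringLength[s] - 2}]];
    (* initialize every counter to zero *)
    Do[freq[triads[[j]]] = 0, {j, Length[triads]}];
    (* single pass: bump the counter for the triad starting at j *)
    Do[freq[StringTake[s, {j, j + 2}]]++, {j, StringLength[s] - 2}];
    (* {triad, count} pairs in the sorted order of triads *)
    {#, freq[#]} & /@ triads
  ]
```

Applied to Daniel's lower-cased example sentence, the entry for "mat" should agree with the 5 that StringPosition reports, while the string is scanned only once rather than once per triad.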
- References:
- Re: String patterns
- From: Daniel Lichtblau <danl@wolfram.com>
- Re: String patterns