# Re: Re: String patterns
A handful of people, including Daniel Lichtblau, have been very helpful
in answering my question, and I appreciate their input.
Strangely, no one has suggested a line of thought other than string
functions. There is at least one programming language that is much
better suited to string processing than Mathematica (Icon:
http://www.cs.arizona.edu/icon/index.htm). Icon has some unique
features (generators, success/failure propagation) that surpass even
Mathematica for ease of expression and programming flexibility. I had
a different approach in mind for Mathematica, one that I did not make
explicit in my post.
I was expecting someone to suggest turning the characters into numbers
using their ASCII codes, or perhaps two-byte "wide character" codes.
This conversion leaves you with a list of integers, and I would expect
Mathematica to handle that data structure much more efficiently than
strings.
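A minimal sketch of the integer-code idea, assuming a recent Mathematica version (it relies on Tally and SortBy) and assuming that letter case should be ignored. ngramFrequencies is a hypothetical helper name, not a built-in. ToCharacterCode turns the text into a list of integers, Partition[codes, k, 1] forms the overlapping k-graphs (each window starts one character after the previous one, as in the "I l", " lo", "lov" example), and Tally counts each distinct code list:

```mathematica
(* Hypothetical helper: relative frequencies of overlapping k-graphs,
   computed on integer character codes rather than on substrings. *)
ngramFrequencies[text_String, k_Integer?Positive] :=
  Module[{codes, grams, counts, total},
    codes = ToCharacterCode[ToLowerCase[text]]; (* list of integers *)
    grams = Partition[codes, k, 1];    (* overlapping windows, offset 1 *)
    total = Length[grams];    (* Length[codes] - k + 1 if text has >= k chars *)
    counts = Tally[grams];             (* {{codeList, count}, ...} *)
    (* turn each code list back into a string, attach its fraction of
       all k-graphs, and put the most frequent first *)
    SortBy[
      {FromCharacterCode[#[[1]]], N[#[[2]]/total]} & /@ counts,
      -Last[#] &]
  ]
```

For example, ngramFrequencies["I love Mathematica", 3] should list {"mat", 0.125} first, since "mat" occurs twice among the 16 overlapping trigraphs. Setting k to 1 or 2 gives letter and digraph frequencies with the same code, and non-ASCII characters simply map to larger integer codes.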
Mark
Daniel Lichtblau wrote:
>
> MJE wrote:
> >
> > Programming challenge:
> >
> > Is there an elegant means of doing cryptanalysis in Mathematica, as
> > opposed to any other language? I am mainly thinking of
> > pattern-matching functions. In this case, the pattern would be
> > dynamic, not predefined. I am not certain how to create and test
> > patterns on the fly.
> >
> > The primary task is to count letter, digraph, trigraph, and higher-order
> > frequencies.
> >
> > Output for the trigraph case might look like this:
> >
> > THE 0.01350000
> > AND 0.00709421
> > ION 0.00559429
> > ING 0.00510783
> > TIO 0.00466191
> > ENT 0.00458083
> > RES 0.00417545
> > <...etc....>
> > BEP 0.00004054
> >
> > The real number represents the fractional occurrence of the trigraph
> > among all trigraphs in the sample. These were computed by a DOS
> > utility on a particular sample text. The trigraph "the" occurred 333 times
> > out of 24668 total trigraph sequences, giving an estimated probability
> > for this trigraph of 333/24668 ≈ 0.0135.
> >
> > Trigraphs overlap. If I parse the following phrase,
> >
> > "I love Mathematica"
> >
> > then the first trigraph is "I l" (spaces count), the second is " lo",
> > and the third is "lov".
> >
> > One must define an "alphabet" with a sorting order. A good way to do
> > this is with a string variable like this:
> >
> > "abcdefghijklmno..."
> >
> > How good is Mathematica at this kind of string manipulation and
> > searching?
> >
> > Mark Evans
> > evans@gte.net
>
> To find frequencies of a small set of given trigraphs you might use
> StringPosition.
>
> In[23]:= str = "I love Mathematica because it has Mathieu functions, matrix operations, and pattern matching.";
>
> In[24]:= strL = ToLowerCase[str];
> General::spell1:
> Possible spelling error: new symbol name "strL"
> is similar to existing symbol "str".
>
> In[26]:= Length[StringPosition[strL, "mat"]]
> Out[26]= 5
>
> To check frequencies of all triads that occur in your string you first
> might form the triads explicitly, as below.
>
> triads = Union[Table[StringTake[strL, {j, j+2}], {j, StringLength[strL]-2}]];
>
> Then you could do
>
> In[51]:= Map[Length[StringPosition[strL,#]]&, triads]
> Out[51]= {1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1,
> > 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 4, 5, 1,
> > 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2,
> > 1, 1, 1, 1, 1, 1, 1}
>
> If you are working with large strings and many triads, a more efficient
> method might be to initialize a set of function values, one entry per
> triad, to zeroes. For example,
>
> In[54]:= Do[freq[triads[[j]]] = 0, {j,Length[triads]}]
>
> In[56]:= ?freq
> Global`freq
> freq[", a"] = 0
> freq["a b"] = 0
> ...
>
> Then iterate over the string, and for each triad you find increment the
> appropriate function value. Takes a bit of coding (not too much) but
> should be reasonably fast.
>
> Daniel Lichtblau
> Wolfram Research
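The counter-based method described in the quoted reply (one freq entry per triad, initialised to zero, then incremented during a single pass over the text) might be written roughly as follows. This is a sketch, not Daniel Lichtblau's code: triadFrequencies is a hypothetical name, the text is lowercased as in his session, and SortBy assumes a recent Mathematica version. Each count is divided by the total number of overlapping trigraphs, the same calculation as 333/24668 in the original question:

```mathematica
(* Hypothetical sketch of the "one counter per triad" approach. *)
triadFrequencies[text_String] :=
  Module[{s = ToLowerCase[text], grams, triads, freq, total},
    (* all overlapping trigraphs: characters j..j+2 for each start j *)
    grams = Table[StringTake[s, {j, j + 2}], {j, StringLength[s] - 2}];
    total = Length[grams];
    triads = Union[grams];              (* distinct trigraphs, sorted *)
    Scan[(freq[#] = 0) &, triads];      (* initialise one counter per triad *)
    Scan[freq[#]++ &, grams];           (* single pass: bump matching counter *)
    (* {trigraph, fraction of all trigraphs}, most frequent first *)
    SortBy[{#, N[freq[#]/total]} & /@ triads, -Last[#] &]
  ]
```

Applied to the string from In[23], the first entry should be "mat", whose count of 5 agrees with Out[26]. The parentheses in (freq[#] = 0) & matter: without them, & would bind to 0 alone and the assignment would store a pure function.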