MathGroup Archive: January 2008 [00531]

Re: Wavelet "filter"?

To: mathgroup at smc.vnet.net
Subject: [mg85041] Re: Wavelet "filter"?
From: edsferr <edsferr at uol.com.br>
Date: Fri, 25 Jan 2008 05:02:03 -0500 (EST)
References: <fmv1sh$314$1@smc.vnet.net> <fn1oa0$9v1$1@smc.vnet.net>

On Jan 21, 7:23 am, "Steve Luttrell"
<steve at _removemefirst_luttrell.org.uk> wrote:
> "Bill Rowe" <readnews... at sbcglobal.net> wrote in message
>
> news:fmv1sh$314$1 at smc.vnet.net...
>
> > On 1/19/08 at 6:06 AM, edsf... at uol.com.br (edsferr) wrote:
>
> >>Replying to the questions:
>
> >>The 14-character strings are all the possible strings of length
> >>14, i.e. 2^14=16384 strings.
>
> >>What I need is to select the ones that have the highest probability of
> >>occurring according to the wavelet power spectrum. Is it possible? It's
> >>like asking, if I'm not saying anything absurd, what the most
> >>probable length-14 DNA sequences are, given a sample of length 4096.
> >>What happens in my case is that this sample is not randomly
> >>generated.
>
> > There really isn't much need to use wavelets to find the most
> > frequent 14 character long strings in your data set.
>
> > A simulated data set can be generated using:
>
> > data = RandomInteger[{0,1},4096];
>
> > All possible 14 character long strings in this longer set can be
> > computed using
>
> > Partition[data,14,1]
>
> > Since your data set is simply 1's and 0's, it is convenient to
> > encode them as an integer using FromDigits. So,
>
> > h = Split[Sort[FromDigits[#, 2] & /@ Partition[data, 14, 1]]];
>
> > will group all similar 14 character strings. And the most
> > frequent 5 will be
>
> > In[22]:= First /@ (h[[Ordering[h]]][[-5 ;;]])
>
> > Out[22]= {15466,16141,1859,6230,11307}
>
> > which can be converted back to 14 character strings if you like
> > using IntegerDigits[n,2,14] where n is the encoded string value.
> > --
> > To reply via email subtract one hundred and four
>
> And here is a way of doing that using Tally:
>
> data=RandomInteger[{0,1},4096];
> tally=Tally[Partition[data,14,1]];
> sortedtally=Sort[tally,#2[[2]]<=#1[[2]]&];
> Take[sortedtally,5]
>
> {{{0,0,1,1,0,1,0,1,1,0,1,0,1,0},4},
>  {{0,1,1,0,0,1,0,1,0,1,1,0,0,0},4},
>  {{1,1,0,0,1,0,1,0,1,1,0,0,0,1},4},
>  {{0,0,0,1,1,0,1,0,1,1,0,1,0,1},3},
>  {{1,0,0,0,1,0,1,1,1,1,0,1,0,0},3}}
>
> Stephen Luttrell
> West Malvern, UK

That's a very clever way to do this task, although it is not what I
wanted. I don't want to find which strings are the most common IN my
data. Based on wavelet analysis I think, if I'm not wrong, one should be
able to select the strings that are most likely to occur among all
the possible strings.

I ran the code above, and the most common string occurred only 3 times
because my data is so short. This is where wavelet analysis could help
me. There are 16384 possible strings of 14 characters, and my data
does not provide enough samples to deduce anything.

I guess I could do the proposed analysis for strings of 7
characters...
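
To put rough numbers on the sparsity: a sequence of length 4096 contains
4096 - n + 1 overlapping windows of length n, against 2^n possible binary
strings. The sketch below only illustrates that arithmetic and reuses the
Tally approach quoted above for 7-character strings. The names windows,
possible, coverage and tally7 are made up for illustration, and the data
is a random stand-in, not my real sample.

```mathematica
(* Number of overlapping windows of length n in a sequence of length len *)
windows[len_, n_] := len - n + 1

(* Number of distinct binary strings of length n *)
possible[n_] := 2^n

(* Average number of observed windows per possible string *)
coverage[len_, n_] := N[windows[len, n]/possible[n]]

coverage[4096, 14]  (* 4083/16384, roughly 0.25 windows per string *)
coverage[4096, 7]   (* 4090/128, roughly 32 windows per string *)

(* Same Tally approach as in the quoted reply, with 7-character strings.
   Assumption: simulated random data stands in for the real sample. *)
data = RandomInteger[{0, 1}, 4096];
tally7 = Tally[Partition[data, 7, 1]];

(* Five most frequent 7-character strings with their counts *)
Take[SortBy[tally7, -Last[#] &], 5]
```

With 14 characters most possible strings cannot appear even once, while
with 7 characters each one is expected about 32 times, so the counts
start to carry some information.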

Anyway, I would like to thank you all for the attention!
