Re: Wavelet "filter"?
- To: mathgroup at smc.vnet.net
- Subject: [mg85041] Re: Wavelet "filter"?
- From: edsferr <edsferr at uol.com.br>
- Date: Fri, 25 Jan 2008 05:02:03 -0500 (EST)
- References: <fmv1sh$314$1@smc.vnet.net> <fn1oa0$9v1$1@smc.vnet.net>
On Jan 21, 7:23 am, "Steve Luttrell" <steve at _removemefirst_luttrell.org.uk> wrote:
> "Bill Rowe" <readnews... at sbcglobal.net> wrote in message
> news:fmv1sh$314$1 at smc.vnet.net...
>
> > On 1/19/08 at 6:06 AM, edsf... at uol.com.br (edsferr) wrote:
> >
> >>Replying the questions:
> >
> >>The 14 characters strings are all the possible strings with length
> >>14, i.e. 2^14=16384 strings.
> >
> >>What I need is to select the ones which have higher probability to
> >>occur according to the wavelet power spectrum. Is it possible? I'ts
> >>like to see, if I'm not saying anything absurd, what are the most
> >>probable 14 length DNA sequence given a 4096 length sample. What
> >>happens in my case is that this sample is not randomically
> >>generated.
> >
> > There really isn't much need to use wavelets to find the most
> > frequent 14 character long strings in your data set.
> >
> > A simulated data set can be generated using:
> >
> > data = RandomInteger[{0,1},4096];
> >
> > All possible 14 character long strings in this longer set can be
> > computed using
> >
> > Partition[data,14,1]
> >
> > Since your data set is simply 1's and 0's, it is convenient to
> > encode them as a integer using FromDigits. So,
> >
> > h = Split[Sort[FromDigits[#, 2] & /@ Partition[data, 14, 1]]];
> >
> > will group all similar 14 character strings. And the most
> > frequent 5 will be
> >
> > In[22]:= First /@ (h[[Ordering[h]]][[-5 ;;]])
> >
> > Out[22]= {15466,16141,1859,6230,11307}
> >
> > which can be converted back to 14 character strings if you like
> > using IntegerDigits[n,2,14] where n is the encoded string value.
> > --
> > To reply via email subtract one hundred and four
>
> And here is a way of doing that using Tally:
>
> data=RandomInteger[{0,1},4096];
> tally=Tally[Partition[data,14,1]];
> sortedtally=Sort[tally,#2[[2]]<=#1[[2]]&];
> Take[sortedtally,5]
>
> {{{0,0,1,1,0,1,0,1,1,0,1,0,1,0},4},
>  {{0,1,1,0,0,1,0,1,0,1,1,0,0,0},4},
>  {{1,1,0,0,1,0,1,0,1,1,0,0,0,1},4},
>  {{0,0,0,1,1,0,1,0,1,1,0,1,0,1},3},
>  {{1,0,0,0,1,0,1,1,1,1,0,1,0,0},3}}
>
> Stephen Luttrell
> West Malvern, UK

That's a very clever way to do this task, although it is not what I
wanted. I don't want to find which strings are the most common IN my
data. Based on wavelet analysis I think, if I'm not wrong, one is able
to select the strings that are the most likely to occur among all the
possible strings.

I ran the code above, and the most common string occurred only 3 times
because my data set is so short. This is where wavelet analysis could
help me. There are 16384 possible strings with 14 characters, and my
data does not provide the necessary samples to deduce anything. I could
apply the proposed analysis to strings with 7 characters, I guess...

Anyway, I would like to thank you all for the attention!
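
For reference, a minimal sketch of the 7-character variant mentioned above. It assumes the same simulated `data` used in the quoted replies (the real sample is not shown here), and the name `tally7` is only illustrative. A 4096-element sample has 4096 - 14 + 1 = 4083 overlapping windows of length 14 against 2^14 = 16384 possible strings (about 0.25 windows per string), but 4096 - 7 + 1 = 4090 windows of length 7 against only 2^7 = 128 possible strings (about 32 windows per string), so the counts should be far less sparse.

```mathematica
(* Simulated data, as in the quoted replies; substitute the real sample here *)
data = RandomInteger[{0, 1}, 4096];

(* For each length n: {n, number of windows, number of possible strings, ratio} *)
Table[{n, Length[data] - n + 1, 2^n, N[(Length[data] - n + 1)/2^n]}, {n, {14, 7}}]

(* Count every overlapping 7-character window *)
tally7 = Tally[Partition[data, 7, 1]];

(* Five most frequent 7-character strings, highest count first *)
Take[SortBy[tally7, -Last[#] &], 5]
```

This only counts what occurs in the sample, as in the replies above; it does not by itself use the wavelet power spectrum.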