MathGroup Archive 2008


Re: Wavelet "filter"?

  • To: mathgroup at smc.vnet.net
  • Subject: [mg84994] Re: Wavelet "filter"?
  • From: henoliv <henoliv at yahoo.com>
  • Date: Tue, 22 Jan 2008 01:54:32 -0500 (EST)
  • References: <fmv1sh$314$1@smc.vnet.net> <fn1oa0$9v1$1@smc.vnet.net>

On Jan 21, 7:23 am, "Steve Luttrell"
<steve at _removemefirst_luttrell.org.uk> wrote:
> "Bill Rowe" <readnews... at sbcglobal.net> wrote in message
> news:fmv1sh$314$1 at smc.vnet.net...
>
> > On 1/19/08 at 6:06 AM, edsf... at uol.com.br (edsferr) wrote:
>
> >>Replying to the questions:
>
> >>The 14-character strings are all the possible strings of length
> >>14, i.e. 2^14 = 16384 strings.
>
> >>What I need is to select the ones that have a higher probability of
> >>occurring according to the wavelet power spectrum. Is that possible? It's
> >>like asking, if I'm not saying anything absurd, which 14-character DNA
> >>sequences are most probable given a sample of length 4096. In my
> >>case the sample is not randomly generated.
>
> > There really isn't much need to use wavelets to find the most
> > frequent 14 character long strings in your data set.
>
> > A simulated data set can be generated using:
>
> > data = RandomInteger[{0,1},4096];
>
> > All possible 14 character long strings in this longer set can be
> > computed using
>
> > Partition[data,14,1]
>
> > Since your data set is simply 1's and 0's, it is convenient to
> > encode each string as an integer using FromDigits. So,
>
> > h = Split[Sort[FromDigits[#, 2] & /@ Partition[data, 14, 1]]];
>
> > will group identical 14 character strings together. And the 5 most
> > frequent will be
>
> > In[22]:= First /@ (h[[Ordering[h]]][[-5 ;;]])
>
> > Out[22]= {15466,16141,1859,6230,11307}
>
> > which can be converted back to 14 character strings if you like
> > using IntegerDigits[n,2,14] where n is the encoded string value.
>
> And here is a way of doing that using Tally:
>
> data=RandomInteger[{0,1},4096];
> tally=Tally[Partition[data,14,1]];
> sortedtally=Sort[tally,#2[[2]]<=#1[[2]]&];
> Take[sortedtally,5]
>
> {{{0,0,1,1,0,1,0,1,1,0,1,0,1,0},4},{{0,1,1,0,0,1,0,1,0,1,1,0,0,0},4},{{1,1,0,0,1,0,1,0,1,1,0,0,0,1},4},{{0,0,0,1,1,0,1,0,1,1,0,1,0,1},3},{{1,0,0,0,1,0,1,1,1,1,0,1,0,0},3}}
>
> Stephen Luttrell
> West Malvern, UK

Although it's a very clever way to analyse this problem, my sample
seems too small for the resulting counts to be reliable. That's where
I think wavelet analysis would do better.

In the results I got, the number of distinct strings observed was
around 3000, far fewer than the 16384 possible strings. The maximum
repetition of any single string was only 3...
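
For comparison, a rough baseline can be computed. Assuming the 4096 digits were independent fair coin flips, and treating the 4083 overlapping 14-character windows as independent draws (a simplification, since overlapping windows share digits), the expected number of distinct strings is n(1 - (1 - 1/n)^m) with n = 16384 and m = 4083, which comes to roughly 3600. The sketch below computes that and also runs a quick simulation on random data; the variable names are illustrative only.

```mathematica
(* Rough baseline, assuming uniform random bits and treating the
   overlapping windows as independent (an approximation) *)
n = 2^14;              (* possible 14-character strings: 16384 *)
m = 4096 - 14 + 1;     (* overlapping 14-character windows: 4083 *)
expectedDistinct = N[n (1 - (1 - 1/n)^m)]   (* roughly 3614 *)

(* Simulation on random data: {number of distinct strings, largest repeat count} *)
sim = Tally[Partition[RandomInteger[{0, 1}, 4096], 14, 1]];
{Length[sim], Max[sim[[All, 2]]]}
```

Under these assumptions, an observed count noticeably below the baseline would be consistent with the sample not being randomly generated, though it does not by itself say which strings are most likely.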

I guess that by using wavelet analysis we could examine all possible
strings and discard those that are very unlikely to occur. For example,
a string containing fewer than 4 ones should be rare, since
approximately 50% of the digits in the sample are ones.
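
The "fewer than 4 ones" example can be made concrete with the binomial distribution. Assuming each of the 14 digits is independently 1 with probability 1/2, the probability of at most 3 ones is (1 + 14 + 91 + 364)/2^14 = 470/16384, about 2.9%. A minimal check, with hypothetical variable names:

```mathematica
(* Assumes each of the 14 digits is independently 1 with probability 1/2 *)
pFewOnes = Sum[Binomial[14, k], {k, 0, 3}]/2^14   (* 470/16384 = 235/8192 *)
N[pFewOnes]                                       (* about 0.0287 *)

(* Cross-check by brute force over all 16384 strings *)
nFewOnes = Count[Tuples[{0, 1}, 14], s_ /; Total[s] < 4]   (* 470 *)

(* Expected number of such windows among the 4083 in a random sample *)
N[(4096 - 14 + 1) pFewOnes]                       (* about 117 *)
```

So under this assumption each such string is individually rare, but taken together they would still be expected to appear on the order of a hundred times in a sample of this size.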

Perhaps, given my sample's length, I should analyse all possible
9-character strings and "extend" the results to 14-character strings.
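
One possible reading of "extend", offered here only as an assumption and not as a method from this thread, is an order-8 Markov model: estimate the probability of each digit given the 8 digits before it from the 9-character window counts, then chain those estimates to score every 14-character string. The sketch below uses random data as a stand-in for the real sample, the names count9, count8 and markovProb are hypothetical, and it relies on Association, which needs Mathematica 10 or later.

```mathematica
(* Hypothetical sketch, assuming an order-8 Markov model: each digit
   depends only on the 8 digits before it *)
data = RandomInteger[{0, 1}, 4096];   (* stand-in for the real 4096-digit sample *)

count9 = Association[Rule @@@ Tally[Partition[data, 9, 1]]];
(* 8-digit contexts that are followed by at least one more digit *)
count8 = Association[Rule @@@ Tally[Partition[Most[data], 8, 1]]];
total9 = Length[data] - 8;            (* number of 9-digit windows: 4088 *)

(* Counts with defaults for windows never seen in the sample *)
c9[w_List] := Replace[count9[w], _Missing -> 0];
c8[w_List] := Replace[count8[w], _Missing -> 1];

(* P(s) = P(first 9 digits) * Product of P(digit k | previous 8 digits), k = 10..14 *)
markovProb[s_List] := Module[{p},
  p = c9[s[[1 ;; 9]]]/total9;
  Do[p *= c9[s[[k - 8 ;; k]]]/c8[s[[k - 8 ;; k - 1]]], {k, 10, 14}];
  p]

(* Rank all 2^14 candidate strings by estimated probability *)
ranked = SortBy[Tuples[{0, 1}, 14], -markovProb[#] &];
Take[ranked, 5]
```

Because each conditional estimate uses only 9-digit counts, every estimate rests on roughly 16 observations per context in a 4096-digit sample, rather than on the 0 to 4 repeats seen for full 14-character strings.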

Thank you very much for your attention!

Edson Ferreira

