MathGroup Archive: November 2013 [00066]

Re: defining averages over unknown PDF

To: mathgroup at smc.vnet.net
Subject: [mg131998] Re: defining averages over unknown PDF
From: Itai Seggev <itais at wolfram.com>
Date: Tue, 12 Nov 2013 23:37:49 -0500 (EST)
References: <20131105041659.58DD96A05@smc.vnet.net>

I've done some digging into this.  The normal way to define custom
differentiation rules is to define UpValues for your function under
Derivative, like

f /: Derivative[n_][f] := ((-1)^n f[#] &)  (* f is secretly Exp[-#] & *)

This way, all differentiation functions (D, Dt, Derivative, and several more,
both internal and user-visible) share equally in the rule.  When you write
your rule in terms of D and differentiate av[...] directly, D starts
evaluating, the rule fires, and everything is fine.  When av is inside another
function, the chain rule code fires and says, schematically, that

Derivative[1][f[av[#]] &] := Derivative[1][f][av[#]] * Derivative[1][av][#] &

Then this code looks up any rules for Derivative[1][av], finds none, and
proceeds to use automatic differentiation, getting you back to your initial
state.
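
One way to look at this step yourself is to ask for Derivative[1][av]
directly.  The following is only a sketch: it assumes the one-argument
definitions from the quoted message below, entered in a fresh kernel, and it
deliberately leaves the outputs for you to inspect.

```wolfram
(* Sune's one-argument definitions, copied from the quoted message below *)
ClearAll[av]
av /: D[av[f___], x_] := av[D[f, x]]
av[y_ + z_] := av[y] + av[z]
av[c_ y_] := c av[y] /; FreeQ[c, x]
av[c_] := c /; FreeQ[c, x]

(* This is what the chain rule asks for.  The D rule above is stored under
   D[av[...], ...], so it is not a rule for Derivative[1][av]; automatic
   differentiation works on av[#] &, where the FreeQ definition applies
   because # contains no x. *)
Derivative[1][av]

(* The chain-rule case from the quoted message, for comparison *)
D[Log[av[Exp[-b x]]], b]
```

If the first result does not mention av at all, that is the point where the
average is lost, before your D rule ever gets a chance to fire.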

I'm afraid I don't have a really great solution for you.  D is really intended
to differentiate functions, but av is an operator in disguise, which makes it
quite difficult to work with.  Getting a good operator calculus into the system
is a problem I and others have thought about, but it is a long-term project
and will not be solved any time soon.

Sorry!

On Wed, Nov 06, 2013 at 12:33:43AM -0500, Sune Jespersen wrote:
> Hi Itai
> 	
> Thanks for your reply. Yes, you are of course right and I realized the same thing shortly after my post. In fact I implemented a solution quite similar to yours,
> In[1]:= av /: D[av[f___], x_] := av[D[f, x]]
> In[2]:= av[y_ + z_] := av[y] + av[z]
> In[3]:= av[c_ y_] := c av[y] /; FreeQ[c, x]
> In[4]:= av[c_] := c /; FreeQ[c, x]
> In[5]:= D[av[x y], x]
> Out[5]= y
> In[6]:= D[av[Exp[-x y]], x]
> Out[6]= -y av[E^(-x y)]
> But it seems that it still has the problem when it needs to apply the chain rule, i.e.
> In[9]:= D[Log[av[Exp[-b x]]], b]
> Out[9]= -((E^(-b x) x)/av[E^(-b x)])
> instead of
> -(av[(E^(-b x) x)]/av[E^(-b x)])
> This seems a bit strange to me, because somehow it must reach a point where it needs to evaluate a derivative, where my rule applies. Perhaps you can offer some insight on this?
> 
> On 5 Nov, 2013, at 18:19 , Itai Seggev <itais at wolfram.com> wrote:
> 
> > On Mon, Nov 04, 2013 at 11:16:59PM -0500, Sune wrote:
> >> Dear all.
> >>
> >> I want to do some symbolic manipulations of an expression involving averages over a stochastic variable with an unknown density. Therefore, I figured I could define a function av, which would correspond to the average over this unknown parameter density function.
> >> I did as follows:
> >> av[y_ + z_, x_] := av[y, x] + av[z, x]
> >> av[c_ y_, x_] := c av[y, x] /; FreeQ[c, x]
> >> av[c_, x_] := c /; FreeQ[c, x]
> >>
> >> So these are basic properties of the average over the distribution of X. Some things work okay, for example
> >> In[52]:= av[Exp[-x y], x]
> >> Out[52]= av[E^(-x y), x]
> >> and
> >> In[79]:= D[av[-x y, x], x]
> >> Out[79]= -y
> >> and
> >> In[80]:= D[av[-x y, x], y]
> >> Out[80]= -av[x, x].
> >>
> >> However, the most vital part for my calculations does not work:
> >> In[81]:= D[av[Exp[-x y], x], y]
> >> Out[81]= -E^(-x y) x
> >>
> >> It should have been av[-Exp[-x y] x,x].
> >>
> >> Any clues to what I'm doing wrong? I'm thinking that I need to specify some rules for differentiation, but I don't know how. But then I'm wondering how come it got the other expressions for differentiation right.
> >
> > Ahh, the subtle treacheries of partial differentiation.  Note that by your
> > definition,
> >
> > In[71]:= av[Exp[-x y] + h, x] - av[Exp[-x y], x]
> >
> > Out[71]= h
> >
> > So that
> >
> > In[72]:= Limit[(av[Exp[-x y] + h, x] - av[Exp[-x y], x])/h, h -> 0]
> >
> > Out[72]= 1
> >
> > So both your "correct" and "incorrect" answers are consistent with the chain
> > rule and the above computation of partial derivatives.  So why is D
> > computing the partial derivative in such a stupid way?  Well, it isn't, at
> > least not directly.  D correctly computes the partial derivative as
> >
> > f'[x] * Derivative[1, 0][av][f[x], x] + Derivative[0, 1][av][f[x],x]
> >
> > But now Derivative helpfully tries to compute these partials using pure
> > functions, and then your definitions kick in, giving 1 and 0 for the
> > partials.  In particular, your third definition means that av[#1,#2]&
> > evaluates to #1&, and you're doomed.
> >
> > So you want to abort the automatic differentiation rules with your own custom
> > rule, which you can do with the following syntax:
> >
> > av /: D[av[f_, x_], y_] /; x =!= y := av[D[f, y], x]
> >
> > In[65]:= D[av[Exp[-x y], x], y]
> >
> > Out[65]= -av[E^(-x y) x, x]
> >
> > --
> > Itai Seggev
> > Mathematica Algorithms R&D
> > 217-398-0700
> 
> 
--
Itai Seggev
Mathematica Algorithms R&D
217-398-0700

References:
- defining averages over unknown PDF
  - From: Sune <sunenj@gmail.com>
