Author: luc
Date: Tue May 6 14:15:38 2008
New Revision: 653926
URL: http://svn.apache.org/viewvc?rev=653926&view=rev
Log:
tried to improve documentation,
added various explanations
slightly reorganized the sections
(this documentation is still work in progress)
Modified:
commons/sandbox/nabla/trunk/src/site/xdoc/index.xml
commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml
commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml
Modified: commons/sandbox/nabla/trunk/src/site/xdoc/index.xml
URL: http://svn.apache.org/viewvc/commons/sandbox/nabla/trunk/src/site/xdoc/index.xml?rev=653926&r1=653925&r2=653926&view=diff
==============================================================================
 commons/sandbox/nabla/trunk/src/site/xdoc/index.xml (original)
+++ commons/sandbox/nabla/trunk/src/site/xdoc/index.xml Tue May 6 14:15:38 2008
@@ 26,48 +26,70 @@
<section name="Introduction">
<p>
 Nabla is an automatic differentiator for mathematical functions.
 Just like the mathematical Nabla operator transforms a function
 into its differential, the Nabla library transforms an existing
 java object implementing a function
 <code>double f(double) { ... }</code>
 into another java object that in addition to computing the value
 of
 <code>f</code>
 like the original one also computes its differential. The
 created object implements differentiation using the classical
 exact rules. There are no approximations and no step sizes.
+ Nabla is an automatic differentiator for mathematical
+ functions.
+ </p>
+ <p>
+ Just like the mathematical Nabla operator transforms a
+ function into its differential, the Nabla library transforms
+ an existing java object implementing a function
+ <pre><code>double f(double) { ... }</code></pre>
+ into another java object that in addition to computing the
+ value of <code>f</code> like the original one also computes
+ its derivative. The created object is built by applying the
+ classical exact differentiation rules to the function
+ underlying expressions. There are <em>no</em> approximations
+ and <em>no</em> step sizes.
</p>
 <p>This approach has the following benefits:</p>
+ <p>
+ This approach has the following benefits:
<ul>
<li>differentiation is exact</li>
 <li>there are no problemdependent step size to handle</li>
+ <li>
+ there are no problemdependent step size to handle:
+ <ul>
+ <li>no problems of physical units</li>
+ <li>no problems of order of magnitude</li>
+ <li>no a priori knowledge of the behavior required</li>
+ </ul>
+ </li>
+ <li>there are no configuration parameters</li>
<li>
differentiation can be computed even at domains boundaries
</li>
<li>
 there is no special handling of source (no symbolic package
 with its own language, no source code generation, no
 integration with the rest of application)
+ there is no special handling of source:
+ <ul>
+ <li>no symbolic package with its own language</li>
+ <li>no limitation to single selfcontained expressions</li>
+ <li>no source code generation</li>
+ <li>no manual integration with the rest of application</li>
+ </ul>
</li>
<li>
one writes and maintains only the basic equation and get the
differential for free
</li>
<li>
+ it can be applied to already existing code without refactoring
+ </li>
+ <li>
it is effective even when source code is not available (but
there may be licensing issues in this case since what Nabla
does automatically is really ... derived work)
</li>
</ul>
+ </p>
<p>
 The derivative instance is tightly bound to the base instance.
 So if after Nabla has performed the transformation the base
 instance is mutated outside of the scope of Nabla, the already
 produced derivative instance will used this mutated state for
 its later computation automatically.
+ The derivative instance remains tightly bound to the base
+ instance (which is refered to as its primitive) throughout its
+ lifetime. If the internal state of the primitive instance is
+ mutated after Nabla has performed the transformation, the
+ already created derivative instance which is bound to its
+ primitive will use this mutated state for its computation
+ automatically.
</p>
</section>
@@ 75,111 +97,145 @@
<section name="Example">
<p>
 The following example should explain better what Nabla can do
 for you.
+ The following example should explain better what Nabla can do.
</p>
 <p>
 Let's consider a simple problem: we want to find the maximal
 value of a function:
 <pre><code>f(t) = (6t)/3 + cos(4t4) e<sup>0.9t</sup></code></pre>
 </p>
 <img src="images/maximum.png"
 alt="a curve with maximal value 2.109747 at t = 0.898775" />

 <p>
 The maximal value is reached when the first derivative of the
 function is equal to zero. So we need to compute the first
 derivative <code>f'(t)</code> and find its roots. We start by
 implementing the function <code>f(t)</code>:
 </p>

 <source>
 UnivariateDifferentiable function = new UnivariateDifferentiable() {
 public double f(double t) {
 return (6  t) / 3 + Math.cos(4 * t  4) * Math.exp(0.9 * t);
 }
 };
 </source>

 <p>
 We use the Nabla automatic differentiator to differentiate our
 function and returning an object implementing the derivative:
 </p>

 <source>
 UnivariateDifferentiator differentiator = new AutomaticDifferentiator();
 final UnivariateDerivative derivative = differentiator.differentiate(function);
 </source>

 <p>
 We get the maximal value by calling a solver on the derivative.
 In this example, we will use the
 <a href="http://commons.apache.org/math/apidocs/org/apache/commons/math/analysis/BrentSolver.html">
 Brent solver</a> from the <a href="http://commons.apache.org/math/">commons
 math</a> library. In order to do this, we wrap the derivative into an object
 implementing the <a
 href="http://commons.apache.org/math/apidocs/org/apache/commons/math/analysis/UnivariateRealFunction.html">
 UnivariateRealFunction</a> interface required by all commons math solvers:
 </p>

 <source>
 UnivariateRealFunction wrappedDerivative = new UnivariateRealFunction() {
 public double value(double x) {
 return derivative.f(DifferentialPair.newVariable(x)).getFirstDerivative();
 }
 };
 UnivariateRealSolver solver = new BrentSolver(wrappedDerivative);
 double tMax = solver.solve(0.5, 1.5);
 double yMax = derivative.f(DifferentialPair.newVariable(tMax)).getValue();
 System.out.println("max value = " + yMax + ", at t = " + tMax + " " + solver.getIterationCount());
 </source>

 <p>
 We get the following result:
 </p>

 <source>
 max value = 2.1097470218140537, at t = 0.8987751653383649 (7 iterations)
 </source>

 <p>
 We could also use the derivative object to compute both the
 value and the first derivative of our function at any point:
 </p>

 <source>
 for (double t = 0.0; t < 1.0; t += 0.01) {
 DifferentialPair y = derivative.f(DifferentialPair.newVariable(t));
 System.out.println(t + " " + y.getValue() + " " + y.getFirstDerivative());
 }
 </source>
+ <subsection name="Problem statement">
+ <p>
+ Let's consider a simple problem: we want to find the maximal
+ value of a function:
+ <pre><code>f(t) = (6t)/3 + cos(4t4) e<sup>0.9t</sup></code></pre>
+ </p>
+ <img src="images/maximum.png"
+ alt="a curve with maximal value 2.109747 at t = 0.898775" />
+
+ <p>
+ The maximal value is reached when the first derivative of
+ the function is equal to zero. So we need to compute the
+ first derivative <code>f'(t)</code> and find its
+ roots. Nabla will help in the first part:
+ computing <code>f'(t)</code>.
+ </p>
+ </subsection>
+
+ <subsection name="Solution implementation">
+
+ <p>
+ The maximal value is reached when the first derivative of
+ the function is equal to zero. So we need to compute the
+ first derivative <code>f'(t)</code> and find its
+ roots. Nabla will help in the first part:
+ computing <code>f'(t)</code>. We start by implementing the
+ function <code>f(t)</code>:
+ </p>
+ <source>
+ UnivariateDifferentiable function = new UnivariateDifferentiable() {
+ public double f(double t) {
+ return (6  t) / 3 + Math.cos(4 * t  4) * Math.exp(0.9 * t);
+ }
+ };
+ </source>
+
+ <p>
+ We use the Nabla automatic differentiator to differentiate our
+ function and obtain an object implementing the derivative:
+ </p>
+
+ <source>
+ UnivariateDifferentiator differentiator = new AutomaticDifferentiator();
+ final UnivariateDerivative derivative = differentiator.differentiate(function);
+ </source>
+
+ <p>
+ We get the maximal value by calling a solver on the
+ derivative. In this example, we will use
+ the <a href="http://commons.apache.org/math/apidocs/org/apache/commons/math/analysis/BrentSolver.html">
+ Brent solver</a> from
+ the <a href="http://commons.apache.org/math/">commonsmath</a>
+ library. Functions passed to any commonsmath solver must
+ implement a specific
+ interface: <a href="http://commons.apache.org/math/apidocs/org/apache/commons/math/analysis/UnivariateRealFunction.html">
+ UnivariateRealFunction</a>. In order to comply with this
+ requirement, we wrap the derivative object into another
+ object adapting the interface and pass this wrapped
+ derivative to the solver:
+ </p>
+
+ <source>
+ UnivariateRealFunction wrappedDerivative = new UnivariateRealFunction() {
+ public double value(double x) {
+ DifferentialPair t = DifferentialPair.newVariable(x);
+ return derivative.f(t).getFirstDerivative();
+ }
+ };
+ UnivariateRealSolver solver = new BrentSolver(wrappedDerivative);
+ double tMax = solver.solve(0.5, 1.5);
+ double yMax = derivative.f(DifferentialPair.newVariable(tMax)).getValue();
+ System.out.println("max value = " + yMax + ", at t = " + tMax +
+ " (" + solver.getIterationCount() + " iterations)");
+ </source>
+
+ <p>
+ We get the following result:
+ </p>
+
+ <source>
+ max value = 2.1097470218140537, at t = 0.8987751653383649 (7 iterations)
+ </source>
+
+ </subsection>
+
+ <subsection name="Discussion">
+ <p>
+ This example shows that Nabla creates an object that
+ computes both the value and the derivative of a function,
+ given only an instance of a class that computes the
+ primitive function. Despite we had the source code available
+ in this case, it was not used: transformation is done at
+ runtime using only an instance of the primitive function. We
+ can also observe that there is no configuration at all: the
+ automatic differentiator is built using a noargument
+ constructor and the differentiation method has only the
+ primitive object as a parameter.
+ </p>
+
+ <p>
+ We could also use the derivative object to compute both the
+ value and the first derivative of our function at any point:
+ </p>
+
+ <source>
+ for (double t = 0.0; t < 1.0; t += 0.01) {
+ DifferentialPair y = derivative.f(DifferentialPair.newVariable(t));
+ System.out.println(t + " " + y.getValue() + " " + y.getFirstDerivative());
+ }
+ </source>
+ </subsection>
</section>
 <section name="How ?">

 <p>
 The previous example shows that Nabla creates an object that
 computes both the value and the derivative of a function, given
 only an instance of a class that computes the base function. How
 does it do that?
 </p>
+ <section name="Limitations">
<p>
 Basically, Nabla works by analyzing the bytecode of the original
 object, transforming the arithmetic operations along all
 possible execution passes using the differentiation, generating
 a new class on the fly with the transformed bytecode, and
 instantiating it to generate the derivative object. This works
 for any pure Java function, or more generally for any program
 using the Java platform as its execution environment.
+ Basically, Nabla works by analyzing the bytecode of the
+ original object, transforming the arithmetic operations along
+ all possible execution passes using the differentiation,
+ generating a new class on the fly with the transformed
+ bytecode, and instantiating it to generate the derivative
+ object. This works for any pure Java function, or more
+ generally for any program using the Java platform as its
+ execution environment.
</p>
<p>
 The main drawback is that functions that call native code cannot
 be handled. For these functions, a work around is provided that
 uses finite differences (with 2, 4, 6 or 8 points schemes).
+ The main drawback is that functions that call native code
+ cannot be handled. For these functions, a fallback method is
+ provided that uses finite differences (with 2, 4, 6 or 8
+ points schemes). This fallback method does not have the same
+ advantages as the previous one: it needs configuration (number
+ of points and step size), it is not exact, it is more
+ computing intensive and it cannot be used too close to domain
+ boundaries.
</p>
</section>
Modified: commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml
URL: http://svn.apache.org/viewvc/commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml?rev=653926&r1=653925&r2=653926&view=diff
==============================================================================
 commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml (original)
+++ commons/sandbox/nabla/trunk/src/site/xdoc/internals.xml Tue May 6 14:15:38 2008
@@ 25,132 +25,224 @@
<body>
<section name="Principles">
 <subsection name="Symbolic differentiation">
+ <subsection name="Differentiation at bytecode instructions level">
<p>
 Nabla computes the derivatives using symbolic differentiation. Considering
 a typical call to a univariate function:
 <pre><code>double r = f(t);</code></pre>
 Nabla tracks the data flow that leads from the <code>t</code> parameter to the
 <code>r</code> result to understand how this result is computed. All the
 operations leading from <code>t</code> to <code>r</code> belong to a small set
 of instructions corresponding to the capabilities of the virtual machine. This
 set contains basic arithmetic operations (addition, subtraction ...), conversion
 operations (double to int, long to double ...), storage instructions (local
 variables, functions parameters, instance or class fields ...) and calls to
 elementary functions defined in the <code>Math</code> and <code>StrictMath</code>
 classes. There is really nothing more!
 </p>
 <p>
 For each one of these basic computer instruction, we know how to map it to a
 mathematical equation and we can combine this equation with its derivative to
 form a pair of equations we will use later. For example, a <code>DADD</code>
 bytecode instruction corresponds to the addition of two real numbers and
 produces a third number which is their sum. So we map the instruction to the
 equation:
+ Nabla computes the derivatives applying the classical
+ differentiation rules at bytecode level. When an instance
+ of a class implementing <code>UnivariateDifferentiable</code>
+ is passed to its <code>differentiate</code> method, Nabla
+ tracks the mathematical operations flow that leads from the
+ <code>t</code> parameter to the return value of the
+ function. At the bytecode instructions level, the operations
+ are elementary ones. Each elementary operation is then
+ changed to compute both a value and a derivative. Nothing is
+ changed to the control flow instructions (loops, branches,
+ operations scheduling).
+ </p>
+
+ <p>
+ All the changed operations belong to a small subset of the
+ virtual machine instructions. This set contains basic arithmetic
+ operations (addition, subtraction ...), conversion operations
+ (double to int, long to double ...), storage instructions (local
+ variables, functions parameters, instance or class fields ...)
+ and calls to elementary functions defined in the <code>Math</code>
+ and <code>StrictMath</code> classes. There is really nothing more!
+ For each one of these basic bytecode instructions, we know how to
+ map it to a mathematical equation and we can combine this equation
+ with its derivative to form a pair of equations we will use later.
+ For example, a <code>DADD</code> bytecode instruction corresponds
+ to the addition of two real numbers and produces a third number
+ which is their sum. So we map the instruction to the equation:
<pre><code>c=a+b</code></pre>
and we combine this with its derivative to form the pair:
<pre><code>(c=a+b, c'=a'+b')</code></pre>
 In this example, we have simply used the linearity property of differentiation
 which implies that the derivative of a sum is the sum of the derivatives.
 Similar rules exist for all arithmetic instructions, and the derivative of all
 basic functions in the <code>Math</code> and <code>StrictMath</code> is known.
 The complete rules set is described in the <a
 href="#Differentiation rules">Differentiation rules</a> section below.
 </p>
 <p>
 So the original computation path from <code>t</code> to <code>r</code> can be
 expanded by conceptually replacing all single equations that constitute the
 code by pairs of equations. The first element of each pair is a simple copy
 of the original equation. The original computation path was fed by a
 single double value (the function parameter <code>t</code>), but the expanded
 computation path needs a pair of values, one for each element in the equations
 pair. The first element of the pair of input values will be the value of the
 parameter, and the second will be the <em>derivative of the parameter with respect
 to the free variable</em>. This means that in our case, if we want to compute the
 derivative with respect to <code>t</code>, the second element will be the
 derivative of <code>t</code> with respect to itself, which is simply the constant 1.
 In this case, we need to feed the computation path with the pair <code>(t, 1)</code>.
 </p>
 <p>
 Without changing anything to the analysis or to the expansion of the computation path,
 we can also handle the case where <code>t</code> is not an independent variable but is
 itself a function of another free variable, i.e. a case where:
 <pre><code>t = g(x)</code>, with <code>dt/dx = g'(x)</code></pre>
 In this case, we would feed the computation path with the pair <code>(g(x), g'(x))</code>
 instead of the pair <code>(t, 1)</code>. This allows to handle functions composition,
 including resursive calls.
+ In this example, we have simply used the linearity property of
+ differentiation which implies that the derivative of a sum is the
+ sum of the derivatives. Similar rules exist for all arithmetic
+ instructions, and the derivative of all basic functions in the
+ <code>Math</code> and <code>StrictMath</code> is known. The
+ complete rules set is described in the <a
+ href="#Differentiation_rules">Differentiation rules</a> section
+ below.
+ </p>
+
+ <p>
+ Lets consider a more extensive example:
</p>
+
+ <source>
+ public class Linear implements UnivariateDifferentiable {
+ public double f(double t) {
+ double result = 1;
+ for (int i = 0; i < 3; ++i) {
+ result = result * (2 * t + 1);
+ }
+ return result;
+ }
+ }
+ </source>
+
+ <p>
+ In this example, the only things that need to be changed for
+ differentiating the <code>f</code> method are
+ the <code>t</code> parameter and the <code>result</code>
+ local variable which must be adapted to hold the value and
+ the derivative, the two multiplications and the addition. In
+ order to generate the new function, Nabla will convert these
+ elements one at a time, starting from the parameter change
+ and propagating the change using a simple data flow
+ analysis. There is no need to analyze the global structure
+ of the code, and no need to change anything in the loop
+ handling instructions above. The method generated for the
+ example above will be roughly similar to the one that would
+ result from compiling the code below (except that Nabla
+ generates bytecode directly, it does not use source at all):
+ </p>
+
+ <source>
+ public DifferentialPair f(DifferentialPair t) {
+
+ // source roughly equivalent to code generated at method entry
+ double t0 = t.getValue();
+ double t1 = t.getFirstDerivative();
+
+ // source roughly equivalent to conversion of the result affectation
+ double result0 = 1;
+ double result1 = 0;
+
+ // this loop handling code is not changed at all
+ for (int i = 0; i < 3; ++i) {
+
+ // source roughly equivalent to conversion of "2 * ..."
+ double tmpA0 = 2 * t0;
+ double tmpA1 = 2 * t1;
+
+ // source roughly equivalent to conversion of "... + 1"
+ tmpA0 += 1;
+
+ // source roughly equivalent to conversion of "result * ..."
+ double tmpB0 = result0 * tmpA0;
+ double tmpB1 = result0 * tmpA1 + result1 * tmpA0;
+
+ // source roughly equivalent to conversion of "result = ..."
+ result0 = tmpB0;
+ result1 = tmpB1;
+
+ }
+
+ // source equivalent to code generated at method exit
+ return new DifferentialPair(result0, result1);
+
+ }
+ </source>
+
+ <p>
+ This example shows that the instructions conversions have local
+ scope. No global tree representation of the method is needed
+ at all.
+ </p>
+
+ </subsection>
+
+ <subsection name="No intermediate DifferentialPair instances">
+
+ <p>
+ Another thing that is shown in the previous example is that
+ <code>DifferentialPair</code> instances appear only at method
+ entry and exit. They are not used internally for elementary
+ conversions, only pairs of local variables (or operand stack
+ cells as we will see later) are used.
+ </p>
+
+ <p>
+ The only methods from the <code>DifferentialPair</code> class that
+ are used by the automatic differentiator are the two arguments
+ constructor, the <code>getValue</code> method and the
+ <code>getFirstDerivative</code> method.
+ </p>
+
+ <p>
+ For differentiable functions that call other differentiable functions,
+ additional instances of <code>DifferentialPair</code> will be used.
+ Some will be built in the caller to pass parameters to the callee, and
+ others will be build in the callee to return results to the caller.
+ This is <em>not</em> the case for calls to the elementary mathematical
+ functions defined in the <code>Math</code> and <code>StrictMath</code>
+ classes. For the known functions the derivative computation are inlined.
+ For example a call to the <code>Math.cos</code> function will be inlined
+ as a call to <code>Math.cos</code>, a call to <code>Math.sin</code>
+ and some intermediate arithmetic operations.
+ </p>
+
+ </subsection>
+
+ <subsection name="Data flow analysis">
+
+ <p>
+ As shown in the example above, the original method bytecode
+ contains both immutable parts that must be preserved and
+ parts belonging to what we will call the <em>computation path</em>
+ from the <code>t</code> parameter to the result that must
+ be differentiated. The instructions that must be converted are
+ identified by a data flow analysis seeded with the <code>t</code>
+ parameter, which is changed from a primitive <code>double</code>
+ in the original method to a <code>DifferentialPair</code> in the
+ generated derivative, as explained in the <a
+ href="usage.html#Differential_pairs">Differential pairs</a> section
+ of the usage documentation.
+ </p>
+
+ <p><strong>
+ TODO: explain analysis by "tainted" data and types propagation.
+ </strong></p>
+
</subsection>
<subsection name="Validity">
+
<p>
What is the validity of this approach?
</p>
+
<p>
 For straightforward smooth functions, the expanded code really computes both the
 value of the equation and its exact derivative. This is a simple application of
 the differentiation rules. So the accuracy of the derivative will be in par with
 the accuracy of the initial function. If the initial function is a good model
 of a physical process, the derivative will be a good evaluation of its evolution.
 If the initial function is only an approximation, the derivative will be an
 approximation too, but <em>an approximation that is consistent with the initial
 function up to computer accuracy</em>.
 </p>
 <p>
 As soon as the initial function is not smooth, then some design choices are
 involved which have an impact on validity. These choices have been made in
 such a way that in some sense, the result is still as valid, as accurate and as
 consistent with the initial function as for smooth functions.
 </p>
 <p>
 First of all, what are nonsmooth functions in our model, which is based on the
 operations and functions available in the Java virtual machine? These are either
 functions that involve calls to nonsmooth functions of the <code>Math</code> and
 <code>StrictMath</code> classes near their singularity points (for example
 <code>Math.abs</code>, <code>Math.sqrt</code> or <code>Math.log</code> near zero),
 or functions for which the computation path includes conditional branches
 involving computed double parameters (for example convergence loops or <code>if</code>,
 <code>then</code>, <code>else</code> statements). This does <em>neither</em> include
 functions that use only unconditional jumps <em>nor</em> loops with a number of iterations
 not related to a computed double parameter. Such code could theoretically be
 reorganized and loops unrolled to produce a (perhaps huge) straightforward smooth
 computation path.
 </p>
 <p>
 For singularities corresponding to domain definition boundaries (like
 <code>Math.sqrt</code> and <code>Math.log</code> which cannot be computed for
 negative parameters), the theoretical derivative is defined only on the side of the
 singularity where the function itself is defined. The value of this halfderivative
 is the limit value of the derivative when approaching the singularity. Since for these
 functions we use the expression of the derivative that is valid where the function
 is valid, our computation is consistent with the theoretical mathematical definition.
 For example in both the <code>Math.sqrt</code> and <code>Math.log</code>, the
 derivative is infinite at zero, with the proper sign according to the sign of the
 input derivative.
 </p>
 <p>
 For singularities not related to domain definition boundaries (like
 <code>Math.abs</code> and conditional branches), the theoretical derivative is not
 defined as a single value, but as a pair of left and a right halfderivatives, one for
 each side of the singularity. Since there is little support in the IEEE754 standard
 to distinguish the left and right hand side of a single value (except for zero, since
 0 and +0 both exist), we have decided to adopt a simplified approach. These cases are
 implemented by simple conditional branches (we added explicitly such a conditional in the
 <code>Math.abs</code> case). Nabla then simply computes the value of the smooth
 derivative on the branch of the computation path that is selected at run time, depending
 on the values of the input parameters. This choice allows to preserve the property of
 having a derivative that is always consistent with the associated value, and it is a simple
 arbitrary choice of one of the two possibilities that correspond to the mathematical result,
 which by itself does not choose between them.
+ For straightforward smooth functions, the expanded code
+ really computes both the value of the equation and its exact
+ derivative. This is a simple application of the
+ differentiation rules. So the accuracy of the derivative
+ will be in par with the accuracy of the initial function. If
+ the initial function is a good model of a physical process,
+ the derivative will be a good evaluation of its evolution.
+ If the initial function is only an approximation of a real
+ physical model, then the derivative will be an approximation
+ too, but <em>an approximation that is consistent with the
+ initial function up to computer accuracy</em>.
</p>
+
+ <p>
+ If the initial function is not smooth, the singular points
+ must be analyzed specially. Some design choices are involved
+ which have an impact on validity. These choices have been
+ made in such a way that in some sense, the result is still
+ as valid, as accurate and as consistent with the initial
+ function as for smooth functions. This point is explained in
+ details in the section
+ about <a href="singularities.html">singularities</a>.
+ </p>
+
</subsection>
<subsection name="Virtual machine execution model">
 <subsubsection name="Frame">
 </subsubsection>

 <subsubsection name="Bytecode instructions">
 </subsubsection>
+ <p><strong>
+ TODO: explain local variables and stack cells.
+ </strong></p>
+
+ <p><strong>
+ TODO: explain the various subsets using the same categories
+ as in the first section above (basic arithmetic operations,
+ conversion operations, storage instructions and calls to elementary
+ functions).
+ </strong></p>
</subsection>
</section>
@@ 158,27 +250,34 @@
<section name="Implementation">
<subsection name="Differential pairs">
+ <p><strong>
+ TODO: explain that the various methods provided by DifferentialPair
+ are convenience methods for end users. They can be used to perform
+ some additional transforms on already derived code. BTW, is this feature
+ really useful ?
+ </strong></p>
</subsection>
<subsection name="Bytecode transforms">
+ <p><strong>
+ TODO
+ </strong></p>
</subsection>
<subsection name="Differentiation rules">
+ <p><strong>
+ TODO: put <em>all</em> rules in a table for reference.
+ </strong></p>
+ </subsubsection>
</subsection>
<subsection name="Complete differentiation example">
+ <p><strong>
+ TODO: step by step analysis of the initial example.
+ </strong></p>
</subsection>
</section>
 <section name="Issues">

 <subsection name="Singularities handling">
 </subsection>

 <subsection name="Data flow and control flow analysis">
 </subsection>

 </section>
</body>
</document>
Modified: commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml
URL: http://svn.apache.org/viewvc/commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml?rev=653926&r1=653925&r2=653926&view=diff
==============================================================================
 commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml (original)
+++ commons/sandbox/nabla/trunk/src/site/xdoc/usage.xml Tue May 6 14:15:38 2008
@@ 23,93 +23,272 @@
</properties>
<body>
 <section name="Public API">
 <p>
 Nabla public interface is very small, it is composed of only three
 interfaces and two classes.
 <dl>
 <dt>UnivariateDifferentiable</dt>
 <dd>is the interface representing the mathematical function that should
 be differentiated. The userdefined class are provided to Nabla as
 classes implementing this interface</dd>
 <dt>UnivariateDerived</dt>
 <dd>is the interface representing the derived function created by
 Nabla. It is the result of the differentiation.</dd>
 <dt>UnivariateDifferentiator</dt>
 <dd>is the interface implemented by the various differentiators provided
 by Nabla.</dd>
 <dt>DifferentialPair</dt>
 <dd>is a simple container holding both a value and a derivative. It can be
 considered simply as an <em>enhanced double</em> that is used both as
 the type of input parameters and return value of derived functions.</dd>
 <dt>AutomaticDifferentiator</dt>
 <dd>is the main implementation of the UnivariateDifferentiator interface.
 It performs differentiation by bytecode analysis and generation, using
 the exact differentiation rules in order to create functions that
 compute exact differentials.</dd>
+ <section name="Basic usage">
+ <subsection name="Public API">
+ <p>
+ Nabla public interface is very small, it is composed of only
+ three interfaces and two classes.
+ <dl>
+ <dt><a href="apidocs/org/apache/commons/nabla/core/UnivariateDifferentiable.html">UnivariateDifferentiable</a></dt>
+ <dd>is the interface representing the mathematical
+ function that should be differentiated. The userdefined
+ class are provided to Nabla as classes implementing this
+ interface</dd>
+ <dt><a href="apidocs/org/apache/commons/nabla/core/UnivariateDerivative.html">UnivariateDerivative</a></dt>
+ <dd>is the interface representing the differentiated
+ function created by Nabla.</dd>
+ <dt><a href="apidocs/org/apache/commons/nabla/core/UnivariateDifferentiator.html">UnivariateDifferentiator</a></dt>
+ <dd>is the interface implemented by the various
+ differentiators provided by Nabla.</dd>
+ <dt><a href="apidocs/org/apache/commons/nabla/core/DifferentialPair.html">DifferentialPair</a></dt>
+ <dd>is a simple container holding both a value and a
+ derivative. It can be considered simply as an <em>enhanced
+ double</em> that is used both as the type of input
+ parameters and return value of differentiated
+ functions.</dd>
+ <dt><a href="apidocs/org/apache/commons/nabla/automatic/AutomaticDifferentiator.html">AutomaticDifferentiator</a></dt>
+ <dd>is the main implementation of the
+ UnivariateDifferentiator interface. It performs
+ differentiation by bytecode analysis and generation, using
+ the exact differentiation rules in order to create functions
+ that compute exact differentials.</dd>
</dl>
</p>
<img src="images/publicAPI.png" alt="UML class diagram of Nabla public API" />
+ </subsection>
 <p>
 In order to use differentiate a function using Nabla, the function must be
 provided as an implementation of the UnivariateDifferentiable interface and
 passed as the single parameter of the <code>differentiate</code> method of an
 AutomaticDifferentiator instance. If the existing class does not already
 implements the UnivariateDifferentiable interface, it has to be wrapped when
 provided to the differentiator.
 </p>

 <p>As an example, consider the following problem. We have a <code>model</code>
 variable which is an instance of a class with a method <code>evaluate</code>:
 <pre><code>double evaluate(double first, double second)</code></pre>
 and we want to compute its partial derivative with respect to the second
 parameter, when the first parameter value is 2.5 and the second parameter
 ranges from 1 to +1. Here is a way to do that:
 </p>
+ <subsection name="Differentiating userdefined functions">
+ <p>
+ In order to differentiate a function using Nabla, a
+ userdefined function must be provided as an implementation
+ of the UnivariateDifferentiable interface. It is passed as
+ the single parameter to the <code>differentiate</code>
+ method of an AutomaticDifferentiator instance. If the
+ existing class does not already implements the
+ UnivariateDifferentiable interface, it has to be wrapped
+ when provided to the differentiator.
+ </p>
+
+ <p>
+ As an example, consider the following problem. We have
+ a <code>model</code> variable which is an instance of a
+ class with a method <code>evaluate</code>:
+ <pre><code>double evaluate(double first, double second)</code></pre>
+ We want to compute its partial derivative with respect to
+ the second parameter, when the first parameter value is 2.5
+ and the second parameter ranges from 1 to +1. Here is a way
+ to do that:
+ </p>
+
+ <source>
+ UnivariateDerivative derivative =
+ new AutomaticDifferentiator().differentiate(new UnivariateDifferentiable() {
+ public double f(double t) {
+ return model.evaluate(2.5, t);
+ }
+ });
+ </source>
+ </subsection>
+
+ <subsection name="Differential pairs">
+
+ <p>
+ The derivative instance created by Nabla provides a
+ method <code>f</code> which computes both the value of the
+ primitive function (as the initial instance does) and the
+ value of its first derivative with respect to its input
+ parameter. This method has the following signature:
+ <pre><code>DifferentialPair f(DifferentialPair t)</code></pre>
+ </p>
+
+ <p>
+ It is important to note this method does not
+ use <code>double</code> as its input parameter
+ but <code>DifferentialPair</code>. This allows to handle both
+ the basic usage when one wants to compute the derivative of
+ the <code>f</code> function when its parameter is the
+ independent variable, but also more advanced cases when the
+ parameter <code>t</code> is not an independent variable.
+ </p>
+
+ <p>Let's say <code>t</code> is computed by another
+ function <code>g</code> from a variable <code>x</code>
+ (<code>t = g(x)</code>). The signature of the derivative
+ function produced by Nabla allows to chain calls and compute
+ the composite derivative. We compute
+ first <code>dt/dx</code> from the derivative
+ function <code>g'(x)</code>, and then pass the resulting
+ differential pair to the derivative
+ function <code>f'(t)</code> in order to get the
+ result <code>dy/dx</code>.
+ </p>
+
+ <source>
+ DifferentialPair dTdX = gDerivative(DifferentialPair.newVariable(2.0));
+ DifferentialPair dYdX = fDerivative(dTdX);
+ </source>
+
+ <p>
+ Simple derivatives are computed by passing independent
+ variables to the derivative functions. An independent
+ variable is represented by a <code>DifferentialPair</code>
+ with a first derivative set to 1. Composite derivatives are
+ computed by passing dependent variables to the derivative
+ functions. A dependent variable is represented by
+ a <code>DifferentialPair</code> which has been computed by a
+ previous derivative function and which may have any value as
+ its first derivative. A constant may be represented by
+ a <code>DifferentialPair</code> with a first derivative set
+ to 0.
+ </p>
+
+ <p>
+ The <code>DifferentialPair</code> class provides a general
+ constructor to build an instance from its value and
+ derivative. For convenience, it also provides factory methods
+ to build independent variables and constants:
+ </p>
+
+ <source>
+ // value = 1.0, first derivative = 2.0
+ DifferentialPair aPair = new DifferentialPair(1.0, 2.0);
+
+ // this is equivalent to new DifferentialPair(3.5, 1.0);
+ DifferentialPair variable = DifferentialPair.newVariable(3.5);
+
+ // this is equivalent to new DifferentialPair(Math.PI, 0.0)
+ DifferentialPair constant = DifferentialPair.newConstant(Math.PI);
+ </source>
 <source>
 final double constantFirst = 2.5;
 UnivariateDerived derived =
 new AutomaticDifferentiator().differentiate(new UnivariateDifferentiable() {
 public double f(double t) {
 return model.evaluate(constantFirst, t);
 }
 });
 for (double second = 1; second <= +1; second += 0.01) {
 DifferentialPair t = DifferentialPair.newVariable(second);
 System.out.println(second + " " + derived.f(t).getFirstDerivative());
 }
 </source>
+ </subsection>
</section>
 <section name="Updating the base and derived objects">
+ <section name="Advanced use">
+ <subsection name="Updating the base and differentiated objects">
 <p>
 One important thing to note is a consequence of the fact that the
 <code>differentiate</code> method returns a new object when called.
 This implies that we end up with two different instances of two
 different classes that compute roughly similar things: the original
 instance and the newly created object. If the implementation of the
 <code>f</code> method does use some attribute of the original class,
 then the class of the newly created object should also provide a way
 to get this value.
 </p>
+ <p>
+ One important thing to note is a consequence of the fact that the
+ <code>differentiate</code> method returns a new object when
+ called. This implies that we end up with two different
+ instances of two different classes that compute roughly
+ similar things: the original instance and the newly created
+ object. If the implementation of the <code>f</code> method
+ does use some attribute of the original class, then the class
+ of the newly created object should also provide a way to get
+ this value.
+ </p>
+
+ <p>
+ An important design choice in Nabla is that the newly
+ created instance does <em>not</em> copy the state of the
+ original object at derivation time, but instead is
+ permanently and tightly linked to this original instance and
+ uses it to get the values it needs when it needs them (even
+ if they are stored in private attributes). A direct
+ implication is that if the state of the original object is
+ changed <em>after</em> differentiation, all subsequent calls
+ to the <code>f</code> method of the already created
+ differentiated instance will reflect these changes in their
+ behavior. There is no need to bother about updating the
+ differentiated instance, it is already uptodate.
+ </p>
+
+ <p>
+ As an example, consider again the problem above, where we
+ wanted the derivative of a model with respect to its second
+ parameter. Now we want to compute the same derivative as
+ previously but we also want to be able to change the value
+ of the first parameter, instead of sticking to the value
+ 2.5. Here is a way to do this:
+ </p>
+
+ <source>
+ public class SetableFirstParameterModel implements UnivariateDifferentiable {
+
+ private Model model;
+ private double firstParameter;
+
+ public SetableFirstParameterModel(Model model, double firstParameter) {
+ this.model = model;
+ this.firstParameter = firstParameter;
+ }
+
+ public void setFirstParameter(double firstParameter) {
+ this.firstParameter = firstParameter;
+ }
+
+ public double f(double t) {
+ return model.evaluate(firstParameter, t);
+ }
+
+ }
+ </source>
+
+ <p>
+ When we build the derivative of an instance of this class,
+ this derivative will keep a reference to its primitive in
+ order to access the <code>firstParameter</code> private
+ field. If this field is changed on the primitive instance by
+ calling the <code>setFirstParameter</code> method, the
+ derivative will see the change immediately.
+ </p>
+
+ <source>
+ SetableFirstParameterModel setable = new SetableFirstParameterModel(model, 2.5);
+ UnivariateDerivative derivative = new AutomaticDifferentiator().differentiate(setable);
+ DifferentialPair t = DifferentialPair.newVariable(2.0);
+
+ // derivative with respect to second parameter when first parameter equals 2.5
+ double der25 = derivative.f(t).getFirstDerivative();
+
+ // derivative with respect to second parameter when first parameter equals 3.0
+ setable.setFirstParameter(3.0);
+ double der30 = derivative.f(t).getFirstDerivative();
+ </source>
+
+ </subsection>
+
+ <subsection name="Functions calling native code">
+
+ <p>
+ Since the automatic differentiator can analyze only bytecode,
+ functions calling native code cannot be handled this way by
+ Nabla. There is a fallback procedure using finite differences.
+ The <code>differences</code> package provides several classes
+ that implement the <code>UnivariateDifferentiator</code> interface:
+ <code><a href="apidocs/org/apache/commons/nabla/differences/TwoPointsScheme.html">TwoPointsScheme</a></code>,
+ <code><a href="apidocs/org/apache/commons/nabla/differences/FourPointsScheme.html">FourPointsScheme</a></code>,
+ <code><a href="apidocs/org/apache/commons/nabla/differences/SixPointsScheme.html">SixPointsScheme</a></code> and
+ <code><a href="apidocs/org/apache/commons/nabla/differences/EightPointsScheme.html">EightPointsScheme</a></code>.
+ </p>
+
+ <p>
+ These classes need a step size at construction time. For each call
+ to the derivative instance <code>f</code> method, they will call the
+ <code>f</code> of the primitive instance <code>n+1</code> times where
+ <code>n</code> is the number of points of the method. One evaluation
+ is at the location defined by the parameter and the <code>n</code>
+ other evaluations are regularly distributed around it. So the two points
+ method will evaluate the primitive function 3 times at <code>th</code>,
+ <code>t</code> and <code>t+h</code> while the eight points method will
+ evaluate it 9 times at <code>t4h</code>, <code>t3h</code> ...
+ <code>t+4h</code> where <code>h</code> is the step size.
+ </p>
+
+ <p>
+ The step size and the number of points must be chosen with care as they
+ influence both the accuracy of the result (which is only an approximation)
+ and the computational cost. Small step size improve theoretical accuracy
+ up to the point where numerical cancellations due to the finite precision
+ of double numbers exceed the theoretical error due to finite differences
+ modeling. Large number of points improve the accuracy but imply a large
+ number of functions evaluation which can become prohibitive. There is no
+ best choice that fits all needs, the right choice is problemdependent.
+ </p>
 <p>
 An important design choice in Nabla is that the newly created instance
 does <em>not</em> copy the state of the original object at derivation
 time, but instead is permanently and tightly linked to this original
 instance and uses it to get the values it needs when it needs them
 (even if it is stored in a private attribute). A direct implication is
 that if the state of the original object is changed <em>after</em>
 differentiation, all subsequent calls to the <code>f</code> method of
 the already created differentiated instance will reflect these changes
 in their behavior. There is no need to bother about updating the
 differentiated instance, it is already uptodate.
 </p>
+ </subsection>
</section>
