View on GitHub

(Really) Fast Function Generation

A C++-library for the generation of complex functions, including first three derivatives and different variable types.

Download this project as a .zip file Download this project as a tar.gz file

Build Status Coverage Status

RFFGen

RFFGen is a C++-library for the generation of complex functions. These can depend on an arbitrary number of variables of different types. Besides the function value the first three, possibly mixed, directional derivatives are provided. The evaluation of function values and derivatives is (almost) as efficient as optimized manual implementations.
Possibly modified, principal and mixed invariants, the basic blocks of complex physical models, are provided, as well as some isotropic and anisotropic models from biomechanics.

The implementation is similar to forward automatic differentiation (AD) with expression templates. However, here a different perspective is taken.


Overview


Installation

This library is header-only and no installation is required. However, make sure you use a recent compiler such as gcc-x, with x>=4.9.0, with -std=c++1y.

To run the available tests, you can use cmake:


Examples

Usage is illustrated using three different examples. Further examples can be found in RFFGen/Tests and RFFGen/Examples.

A simple example

We begin with the function

equation

with

equation

and

equation

#include "RFFGen.hh"

auto generateFunction()
{
  using namespace RFFGen::CMath;

  auto h = Pow<3>() + Sin();
  auto g = Sqrt();

  return RFFGen::finalize_scalar( h(g) );
}

Usage is as follows:

auto f = generateFunction();

f.update(1.) //  change function argument

auto value = f(); // or f.d0()
auto firstDerivative  = f.d1();
auto secondDerivative = f.d2();
auto thirdDerivative  = f.d3();

The call to finalize_scalar is necessary to simplify usage of the generated function. Without usage is slightly less convenient as illustrated in the other examples below.

An example from biomechanics

Now let us generate the function

equation

which constitutes a model for adipose tissue. In this context the invariants depend on the left Cauchy-Green strain tensor. The part (\iota_1-3) is represented by the ShiftedFirstPrincipalInvariant. The implementation of this model is

#include "RFFGen.hh"

template <class Mat>
auto generateAdiposeTissue(double cCells, double k1, double k2,
             double kappa, const Matrix& M, const Matrix& F)
{
  using namespace RFFGen::LinearAlgebra;
  using RFFGen::CMath::exp;
  using RFFGen::finalize;

  auto i1 = FirstPrincipalInvariant<Matrix>();
  auto si1 = ShiftedFirstPrincipalInvariant<Matrix>();
  auto i4 = FirstMixedInvariant<Matrix>(F,M);

  auto aniso = kappa*i1 + (1-3*kappa) * i4 - 1;
  auto materialLaw = cCells*si1 + (k1/k2) * ( exp(k2*(aniso^2)) - 1 );

 return finalize( materialLaw( LeftCauchyGreenStrainTensor<Matrix>(F) ) );
}

Usage is as follows (assuming that the material parameters, the structural tensor M, the deformation gradient F and perturbations dF0, dF1, dF2 of the latter are given):

auto f = generateAdiposeTissue(cCells,k1,k2,kappa,M,F);

f.update(F) //  change function argument

double value = f(); // or f.d0()
double firstDerivative  = f.d1(dF0);
double secondDerivative = f.d2(dF0,dF1);
double thirdDerivative  = f.d3(dF0,dF1,dF2);

Observe that the derivatives are DIRECTIONAL derivatives. This approach admits to work with any (reasonable) input types. In the first example, where the input variable was a scalar, the call to finalize_scalar did generate suitable default values for the directions. Here this is not admissible.

Take care to not use functions such as 'exp' with built-in arithmetic types, since in this case the corresponding functions from are called and the resulting value is treated as constant.

Also note that the operator^(int k) is only defined for k=2. For scalar valued functions you may use Pow<dividend,divisor> to represent rational powers.

An example with two variables

Eventually we consider a function with two variables, a scalar variable x and a matrix valued variable F.

equation

#include "RFFGen.hh"

template <class Mat>
auto generateFunction()
{
  using RFFGen::CMath::sqrt;
  using RFFGen::LinearAlgebra::trace;
  using RFFGen::Variable;

  auto x = Variable<double,0>();
  auto F = Variable<Mat,1>();

  return RFFGen::finalize( sqrt(x)*trace(F) );
}

When computing derivatives we now have to specify the variable ids for which the derivative will be computed. The same holds for setting the function values.

auto f = generateFunction<Mat>();

f.template update<0>(x);
f.template update<1>(F);

double value       = f(); // or f.d0()
double df_dx       = f.template d1<0>(dx1);
double df_dF       = f.template d1<1>(dF)
double ddf_dFdx    = f.template d2<1,0>(dF,dx1);
double dddf_dxdxdF = f.template d3<0,0,1>(dx1,dx2,dF);

Requirements on arithmetic types

One idea that was followed in the development of this library was to make it compatible with common matrix implementations. Currently you can directly use it with Armadillo, DUNE and Eigen matrices. If you use another matrix library, your own matrix or vector implementation or want to compute with something else, these must satisfy the requirements listed below (see also concepts.hh). These are statically checked with static_assert. Thus you will get meaningful feedback if your implementation lacks some required feature. However, for most reasonable matrix implementations nothing or almost nothing must be adjusted.

Requirements

  1. Copyable:
    T must be CopyConstructible and CopyAssignable.
  2. Multiplication with arithmetic types:
    Given:
    • An object t of type T.
    • An object a of an arithmetic type, such as double or int.
    At least one of the following expressions must be valid:
    • T s = a*t;
    • t *= a;
  3. Summation:
    Objects t and s of type T must be summable, i.e. at least one of the following expressions must be valid:
    • T r = s + t;
    • t += s;
  4. Multiplication (for matrix types only):
    Compatible matrices t and s must be multipliable, i.e. at least one of the following expressions must be valid:
    • auto r = t*s;
    • t*=s; (for square matrices)
    • auto r = t.rightmultiplany(s); (for DUNE matrices)
  5. Access to data:
    Access to the entries of a matrix or vector t via at least one of the following expressions:
    • auto a = t[1][2]; (for matrices) or auto a = t[1]; (for vectors).
    • auto a = t(1,2); (for matrices) or auto a = t(1); (for vectors).
  6. Access to number of rows and columns (matrices and vectors):
    • Fixed size:
      A specialization of the template classes
      template <class Matrix, class> struct NumberOfRows;
      and
      template <class Matrix, class> struct NumberOfColumns;
      must be provided. For the cases that, for some scalar type S and n,m of type unsigned or int, the employed matrix class is of the form
      • Matrix<n,m>
      • Matrix<S,n,m>
      suitable specializations are available and access to the number of rows and columns is supported without need to do anything. Similar specializations exist for vectors.
    • Dynamic size:
      Access to the number of rows and columns of an object t must be supported via at least one of the following expressions:
      • t.rows() resp. t.cols()
      • t.n_rows resp. t.n_cols

Requirements (optional)

  1. Default-constructible:
    If T is DefaultConstructible then functions that take arguments of type T also a default constructor, else the default constructor is deleted.
  2. Construction of the zero matrix:
    If you want to use the functions zero<Matrix>() and unitMatrix<Matrix>(), a specialization of
    template <class Matrix,class> struct Zero;
    must be provided. Suitable implementations exist for the cases that a zero matrix can be created by one of the following expressions:
    • auto zeroMatrix = Matrix(0.);
    • auto zeroMatrix;
      zeroMatrix.zeroes();

Comparison with automatic differentation libraries

The performance of RFFGen is demonstrated by several examples comparing it with forward mode automatic differentation of FADBAD++ and SACADO. Some other implementations were tested too. However, these were far slower and/or buggy. In particular backward mode automatic differentation is significantly slower for the considered functions.
All examples were compiled with -O2 compiler option with gcc-4.9.2. They were run on an ASUS UX32V. All examples were evaluated 10.000.000 times (function value + derivative(s)) with varying arguments.

A simple example

Consider f(x)=x^{3/2}+\sin(\sqrt{x}).

A more complex example

Consider f(x)= x ( \exp(\sqrt{x}) + 1 ) + \sin(2\exp(\sqrt{x})+1).

(here, SACADO fails)

An example with three variables

Consider f(x,y,z)= (y+z) \sqrt{x}+\sin(\sqrt{x}).


Optimization strategies

In order to get performance comparable to optimized manual implementations some optimization strategies are applied.

Inlining

Since this library is based on code generation with template meta-programming and inlining, compiler rules for inlining have a strong influence on performance. The default rules, which work quite well in general, can be improved by adjusting specific parameters. For the examples in "Examples/Biomechanics" the choices

for gcc-4.9.2 yields performance increases in the range of 2-4x. (see below)

Note that adequate choices of the above parameters may be different when using this library in a larger project. For optimal performance you may think about compiling the function definition in a separate object file with compiler parameters as proposed above.

Caching

Significant performance increases are due to caching. In contrast to lazy evaluation, here the strategy is to compute intermediate results as soon as possible. This has various impacts. First the evaluation of cached results is one of the simplest function calls and is likely to be inlined. Moreover, in many expressions, the computed intermediate results can be reused in the computation of higher derivatives. Eventually, and most important, in many cases we can significantly reduce the size of temporaries as well as the number of generated temporaries by directly performing simple computations.

Elimination of compile-time zeros

Since we do not only want efficient evaluation of function values, but also of its first three derivatives, we often encounter larger number of compile-time zeros that can not be eliminated by the compiler. Thus this task must be implemented seperately. In order to completely eliminate compile-time zeros and correctly reduce involved mathematical expressions we do not define the corresponding derivatives. In particular when working with matrices, where only few operations are of higher order, this significantly reduces the computation times of the second and third derivative.

A good example for the effects of this optimization strategy is the product f=M(F^TF), which occurs in the computation of the first mixed strain invariant. In this context M is treated as a constant and F denotes the unknown and both are matrices in \mathbb{R}^{3,3}. For simplicity we assume that we have a function C=F^T F. The following graph illustrates the mathematical operations that are performed when evaluating the second derivative of this product.

Mathematical operations required for the evaluation of the second derivative of the product rule.

Since all derivatives of M vanish only the part marked in orange will be computed. The remaining computations are discarded during compilation.


Performance for a model of muscle tissue

For functions that take matrices as arguments automatic differentation libraries can typically not be used directly. To get an idea of the performance, consider the function
\begin{align*}W(F)&=c(\exp(b(\bar\iota_1-3))-1) + A(\exp(a(\bar\iota_6-1)^2)-1) \\ &+d_0\Gamma_\mathrm{Inf}(\det(F))+d_1\Gamma_\mathrm{Com}(\det(F)) \end{align*}
where \Gamma_\mathrm{Inf}(t)=t^2 and \Gamma_\mathrm{Com}(t)=\mathrm{ln}(t), C=F^T F, structural tensor M and modified invariants \bar\iota_1(C)=\mathrm{tr}(C)\det(C)^{-1/3}, \bar\iota_6(C,M)=\mathrm{tr}(CM^2)\det(C)^{-1/3}. Again each function is called 10.000.000 times with varying arguments.

In the computations above compiler options as described in Inlining were used. Computation times only with -02 are given below.

Due to the extensive caching the call to update is more costly than the evaluation of d0 or d1. As a consequence the call to d0 is almost for free, since it only consists in copying the cached value. Large parts of the cached results can be reused, which is reflected in the small costs for the evaluation of d1 and d2. In the computation of the third derivative the computations involving the perturbations start to dominate the overall computational costs.


Documentation

Doxygen documentation can be found here.

Contact: lars.lubkoll@posteo.de