/*
 * Project: MoleCuilder
 * Description: creates and alters molecular systems
 * Copyright (C) 2010 University of Bonn. All rights reserved.
 * Please see the LICENSE file or "Copyright notice" in builder.cpp for details.
 */

/**
 * \file potentials.dox
 *
 * Created on: Nov 28, 2012
 *     Author: heber
 */

/** \page potentials Empirical Potentials and FunctionModels
 *
 * On this page we explain what is meant by the Potentials sub folder.
 *
 * First, our approach is based on fragmenting a molecular system, i.e. dissecting its
 * bond structure into connected subgraphs, calculating the energies of the
 * fragments (ab-initio) and summing them up to a good approximation of the total
 * energy of the whole system, \sa fragmentation.
 * Second, having calculated these energies, the thought quickly arises that one
 * actually calculates quite similar systems all the time and that one might
 * cache results in an intelligent (i.e. interpolating) fashion ...
 *
 * That's where so-called empirical potentials come into play. They are
 * functions depending on a number of "fitted" parameters and on the variable
 * distances within a molecular fragment (i.e. the bond lengths), in order to
 * give a value for the total energy without the need to solve a complex
 * ab-initio model (essentially, without solving the electronic Schrödinger equation
 * anymore).
 *
 * Empirical potentials have been devised by fellows such as Lennard-Jones,
 * Morse, Tersoff, Stillinger and Weber, etc. And in their honor, the
 * potential form is named after its inventor. Hence, we speak e.g. of a
 * Lennard-Jones potential.
 *
 * So, what we have to do in order to cache results is the following procedure:
 * -# gather similar fragments
 * -# perform a fit procedure to obtain the parameters for the empirical
 *    potential
 * -# evaluate the potential instead of performing an ab-initio calculation
 *
 * The terms we use model the classes that are implemented:
 * -# EmpiricalPotential: Contains the interface to a function that can be
 *    evaluated given a number of arguments_t, i.e. distances. Also, one might
 *    want to evaluate derivatives.
 * -# FunctionModel: Is a function that can be fitted, i.e. that has internal
 *    parameters to be set and got.
 * -# argument_t: The argument stores not only the distance but also the index
 *    pair of the associated atoms and their charges, to let the potential
 *    check on validity.
 * -# SerializablePotential: Eventually, one wants to store all potential
 *    parameters to a file or parse them from one. This functionality is
 *    encoded in this class.
 * -# HomologyGraph: "Similar" fragments in our case have to have the same bond
 *    graph. It is stored in the HomologyGraph that acts as representative.
 * -# HomologyContainer: This container combines, in multimap fashion, all
 *    similar fragments with their energies together, with the HomologyGraph
 *    as their "key".
 * -# TrainingData: Here, we combine the InputVector, which contains
 *    the set of distances required for the FunctionModel (e.g. only a single
 *    distance/argument for a pair potential, three for an angle potential,
 *    etc.), with the expected OutputVector. This in combination with the
 *    FunctionModel is the basis for the non-linear regression used for the
 *    fitting procedure.
 * -# Extractors: This set of functions yields the set of distances from a
 *    given fragment that is stored in the HomologyContainer.
 * -# FunctionApproximation: Contains the interface to the levmar package where
 *    the Levenberg-Marquardt (Newton + trust region) algorithm is used to
 *    perform the fit.
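 *
 * To give a concrete and well-known example of such a parametrized form (meant
 * purely as an illustration, not as a description of any particular class in
 * this sub folder): the Lennard-Jones pair potential depends only on the
 * interatomic distance \f$ r \f$ and on two fitted parameters, the well depth
 * \f$ \epsilon \f$ and the length scale \f$ \sigma \f$,
 * \f[
 *   V_{\mathrm{LJ}}(r) = 4 \epsilon \left[ \left(\frac{\sigma}{r}\right)^{12}
 *     - \left(\frac{\sigma}{r}\right)^{6} \right] .
 * \f]
 * Fitting such a potential to fragment energies means determining
 * \f$ \epsilon \f$ and \f$ \sigma \f$; evaluating it afterwards is just
 * plugging in distances.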
 *
 * \section potentials-fit-potential-action What happens in FitPotentialAction
 *
 * First, either a potential file is parsed via PotentialDeserializer, or charges
 * and a potential type are taken from the given options. These are used to
 * instantiate EmpiricalPotentials via the PotentialFactory, which are then stored
 * within the PotentialRegistry. This is the available set of potentials (without
 * requiring any knowledge as to the nature of the fragment employed in fitting).
 *
 * Second, the given fragment is used to get a suitable HomologyGraph from
 * the World's HomologyContainer. This is given to a CompoundPotential, which in
 * turn browses through the PotentialRegistry, picking out those
 * EmpiricalPotential's that match with a subset of the FragmentNode's of the
 * given graph. These are stored as a list of FunctionModel's within the
 * CompoundPotential instance. Here the specific fragment comes into play,
 * picking a subset from the available potentials.
 *
 * Third, we need to set up the training data. For this we need vectors of input
 * and output data that are obtained from the HomologyContainer with the
 * HomologyGraph as key. The output vector in our case is simply a number
 * (although the interface allows for more). The input vector is the set of
 * distances. In order to pre-process the input data for the specific model,
 * a filter is required in the TrainingData's constructor. The purpose of the
 * filter is to pick out the subset of distance arguments for each model, one
 * after the other, and concatenate them into a list. On evaluation of the model
 * this concatenated list of distances is given to the model, which may easily
 * dissect the list and hand each contained potential its subset of
 * arguments. See Extractors for more information.
 *
 * Afterwards, training may commence: The goal is to find a set of parameters
 * for the model such that it reproduces the output vector for a given input
 * vector as well as possible. This non-linear regression is contained in the
 * levmar package and its functionality is wrapped in the FunctionApproximation
 * class. An instance is initialized with both the gathered training data and
 * the model containing a list of potentials. See
 * [FunctionApproximation-details] for more details.
 *
 * During the fitting procedure, first the derivatives of the model are checked
 * for consistency, then the model is initialized with a sensible guess of
 * starting parameters, and afterwards the Levenberg-Marquardt algorithm
 * commences, making numerous calls to evaluate the model and its derivative
 * in order to find the minimum in the L2-norm.
 *
 * This is done more than once, as high-dimensional regression is sensitive to
 * the starting values and there may be numerous local minima. The lowest
 * of the found minima is taken, either via a given threshold or as the best out
 * of a given number of attempts.
 *
 * Eventually, the parameters of the best model are streamed via
 * PotentialSerializer back into a potential file. Each EmpiricalPotential in
 * the CompoundPotential making up the whole model is also a
 * SerializablePotential. Hence, each in turn writes a single line with its
 * respective subset of parameters and particle types, describing this
 * particular fit function.
 *
 * \section potentials-function-evaluation How does the model evaluation work
 *
 * We now come to the question of how the model and its derivative are actually
 * evaluated. We have an input vector from the training data and we have the
 * model with a specific set of parameters.
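 *
 * Schematically (the symbols here are chosen for illustration only and do not
 * correspond to class names): given training pairs \f$ (d_i, E_i) \f$ of
 * distance vectors and fragment energies, the fit looks for parameters
 * \f$ p \f$ that minimize the squared L2-norm of the residuals,
 * \f[
 *   \sum_i \left( E_{\mathrm{model}}(d_i; p) - E_i \right)^2 ,
 * \f]
 * and "evaluating the model" means computing \f$ E_{\mathrm{model}}(d; p) \f$
 * (and, during the fit, its derivative with respect to \f$ p \f$) for a given
 * distance vector \f$ d \f$.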
 *
 * FunctionModel is just an abstract interface that is implemented by the
 * potential functions, such as CompoundPotential, which combines multiple
 * potentials into a single function for fitting, or PairPotential_Harmonic,
 * which is a specific fit function for harmonic bonds.
 *
 * The main issue with the evaluation is picking the right set of distances from
 * the ones given in the input vector and feeding them to each potential contained
 * in the CompoundPotential. Note that the distances have already been prepared by
 * the TrainingData instantiation.
 *
 * Initially, the HomologyGraph only contains a list of configurations of a
 * specific fragment (i.e. the position of each atom in the fragment) and an
 * energy value. These first have to be converted into distances.
 *
 *
 * \section potentials-howto-use How to use the potentials
 *
 * We just give a brief run-down in terms of code on how to use the potentials.
 * Here, we only describe what to do in order to perform the fitting. This is
 * basically what is implemented in FragmentationFitPotentialAction.
 *
 * \code
 * // we need the homology container and the representative graph we want to
 * // fit to.
 * HomologyContainer homologies;
 * const HomologyGraph graph = getSomeGraph(homologies);
 * Fragment::charges_t h2o;
 * h2o += 8,1,1;
 * // TrainingData needs so-called Extractors to get the required distances
 * // from the stored fragment. These functions are bound via boost::bind.
 * TrainingData AngleData(
 *   boost::bind(&Extractors::gatherDistancesFromFragment,
 *     boost::bind(&Fragment::getPositions, _1),
 *     boost::bind(&Fragment::getCharges, _1),
 *     boost::cref(h2o),
 *     _2)
 * );
 * // now we extract the distances and energies and store them
 * AngleData(homologies.getHomologousGraphs(graph));
 * // give ParticleTypes of this potential to make it unique
 * PairPotential_Angle::ParticleTypes_t types =
 *   boost::assign::list_of
 *     (8)(1)(1)
 *   ;
 * PairPotential_Angle angle(types);
 * // give initial parameters
 * FunctionModel::parameters_t params(PairPotential_Angle::MAXPARAMS, 0.);
 * // ... set some potential-specific initial parameters in params
 * angle.setParameters(params);
 *
 * // use the potential as a FunctionModel along with the prepared TrainingData
 * FunctionModel &model = angle;
 * FunctionApproximation approximator(AngleData, model);
 * approximator(FunctionApproximation::ParameterDerivative);
 *
 * // obtain resulting parameters and check remaining L_2 and L_max error
 * const FunctionModel::parameters_t angleparams = model.getParameters();
 * LOG(1, "INFO: L2sum = " << AngleData(model)
 *     << ", LMax = " << AngleData(model) << ".");
 * \endcode
 *
 * The evaluation of the fitted potential is then trivial, e.g.
 * \code
 * // constructed someplace
 * PairPotential_Angle angle(...);
 *
 * // evaluate
 * FunctionModel::arguments_t args;
 * // .. initialise args to the desired distances
 * const double value = angle(args)[0]; // output is a vector!
 * \endcode
 *
 * \section potentials-stability-of-fit Note on stability of the fit
 *
 * As we always start from random initial parameters (within a certain sensible
 * range at least), the non-linear fit does not always converge. Note that the
 * random values are drawn from the defined distribution, while the random number
 * engine is obtained from the currently set one, see \ref randomnumbers. Hence, you
 * can manipulate both in order to get different results, or set the seed such that
 * the "randomly" drawn values always work well (e.g. for testing).
 *
 * In any case, the FragmentationFitPotentialAction has the option
 * "take-best-of" to allow for multiple fits, of which the best (in terms of L2
 * error) is taken eventually. Furthermore, you can use the "set-threshold" option
 * to keep restarting the fit procedure until the L2 error has dropped below the
 * given threshold.
 *
 * \section potentials-howto-add How to add new potentials
 *
 * Adding a new potential requires the following:
 * -# Add the new modules to Potentials/Specifics
 * -# Add a unit test on the potential in Potentials/Specifics/unittests
 * -# Give the potential a type name and add it to PotentialTypes.def. Note
 *    that the name must not contain white space.
 * -# Add the potential name as a case to PotentialFactory such that it knows
 *    how to instantiate your new potential when requested.
 * -# Remember to use the RandomNumberGenerator for getting random starting
 *    values!
 *
 * PotentialTypes.def contains a boost::preprocessor sequence of all
 * potential names. PotentialFactory uses this sequence to build its enum-to-type
 * map and its inverse, which the user sees when specifying the potential to
 * fit via PotentialTypeValidator.
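 *
 * To illustrate the mechanism, here is a minimal, self-contained sketch of how a
 * boost::preprocessor sequence of names can drive both an enum and a
 * name-to-enum map. The macro, type, and potential names below are purely
 * illustrative; they are not the actual contents of PotentialTypes.def or
 * PotentialFactory.
 * \code
 * #include <boost/preprocessor/seq/for_each.hpp>
 * #include <boost/preprocessor/stringize.hpp>
 * #include <iostream>
 * #include <map>
 * #include <string>
 *
 * // hypothetical sequence of potential type names (the real list lives in
 * // PotentialTypes.def)
 * #define POTENTIAL_TYPE_SEQUENCE (constant)(harmonic_bond)(harmonic_angle)(morse)
 *
 * // generate one enum entry per sequence element
 * #define MAKE_ENUM_ENTRY(r, data, elem) elem,
 * enum PotentialType {
 *   BOOST_PP_SEQ_FOR_EACH(MAKE_ENUM_ENTRY, ~, POTENTIAL_TYPE_SEQUENCE)
 *   unknownPotential
 * };
 *
 * // generate one map entry per sequence element (user-visible name -> enum value)
 * #define MAKE_MAP_ENTRY(r, data, elem) { BOOST_PP_STRINGIZE(elem), elem },
 * static const std::map<std::string, PotentialType> PotentialTypeMap = {
 *   BOOST_PP_SEQ_FOR_EACH(MAKE_MAP_ENTRY, ~, POTENTIAL_TYPE_SEQUENCE)
 * };
 *
 * int main() {
 *   // look up a potential by its name, as a validator might do
 *   std::cout << "harmonic_bond -> " << PotentialTypeMap.at("harmonic_bond") << std::endl;
 *   return 0;
 * }
 * \endcode
 * Adding a new name to such a sequence automatically extends both the enum and
 * the map; instantiating the new potential still requires the explicit case in
 * PotentialFactory mentioned above.
 *
 *
 * \date 2013-04-09
 */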