Context Navigation

potentials.dox@ cae614

Visit:

Action_Thermostats Add_AtomRandomPerturbation Add_FitFragmentPartialChargesAction Add_RotateAroundBondAction Add_SelectAtomByNameAction Added_ParseSaveFragmentResults AddingActions_SaveParseParticleParameters Adding_Graph_to_ChangeBondActions Adding_MD_integration_tests Adding_ParticleName_to_Atom Adding_StructOpt_integration_tests AtomFragments Automaking_mpqc_open AutomationFragmentation_failures Candidate_v1.5.4 Candidate_v1.6.0 Candidate_v1.6.1 ChangeBugEmailaddress ChangingTestPorts ChemicalSpaceEvaluator CombiningParticlePotentialParsing Combining_Subpackages Debian_Package_split Debian_package_split_molecuildergui_only Disabling_MemDebug Docu_Python_wait EmpiricalPotential_contain_HomologyGraph EmpiricalPotential_contain_HomologyGraph_documentation Enable_parallel_make_install Enhance_userguide Enhanced_StructuralOptimization Enhanced_StructuralOptimization_continued Example_ManyWaysToTranslateAtom Exclude_Hydrogens_annealWithBondGraph FitPartialCharges_GlobalError Fix_BoundInBox_CenterInBox_MoleculeActions Fix_ChargeSampling_PBC Fix_ChronosMutex Fix_FitPartialCharges Fix_FitPotential_needs_atomicnumbers Fix_ForceAnnealing Fix_IndependentFragmentGrids Fix_ParseParticles Fix_ParseParticles_split_forward_backward_Actions Fix_PopActions Fix_QtFragmentList_sorted_selection Fix_Restrictedkeyset_FragmentMolecule Fix_StatusMsg Fix_StepWorldTime_single_argument Fix_Verbose_Codepatterns Fix_fitting_potentials Fixes ForceAnnealing_goodresults ForceAnnealing_oldresults ForceAnnealing_tocheck ForceAnnealing_with_BondGraph ForceAnnealing_with_BondGraph_continued ForceAnnealing_with_BondGraph_continued_betteresults ForceAnnealing_with_BondGraph_contraction-expansion FragmentAction_writes_AtomFragments FragmentMolecule_checks_bonddegrees GeometryObjects Gui_Fixes Gui_displays_atomic_force_velocity ImplicitCharges IndependentFragmentGrids IndependentFragmentGrids_IndividualZeroInstances IndependentFragmentGrids_IntegrationTest IndependentFragmentGrids_Sole_NN_Calculation JobMarket_RobustOnKillsSegFaults JobMarket_StableWorkerPool JobMarket_unresolvable_hostname_fix MoreRobust_FragmentAutomation ODR_violation_mpqc_open PartialCharges_OrthogonalSummation PdbParser_setsAtomName PythonUI_with_named_parameters QtGui_reactivate_TimeChanged_changes Recreated_GuiChecks Rewrite_FitPartialCharges RotateToPrincipalAxisSystem_UndoRedo SaturateAtoms_findBestMatching SaturateAtoms_singleDegree StoppableMakroAction Subpackage_CodePatterns Subpackage_JobMarket Subpackage_LinearAlgebra Subpackage_levmar Subpackage_mpqc_open Subpackage_vmg Switchable_LogView ThirdParty_MPQC_rebuilt_buildsystem TrajectoryDependenant_MaxOrder TremoloParser_IncreasedPrecision TremoloParser_MultipleTimesteps TremoloParser_setsAtomName Ubuntu_1604_changes stable

Last change on this file since cae614 was 1ba8a1, checked in by Frederik Heber <heber@…>, 11 years ago
DOCU: Extended documentation on FunctionApproximation and potential fitting.
Property mode set to `100644`
File size: 11.9 KB

Line
1	/*
2	* Project: MoleCuilder
3	* Description: creates and alters molecular systems
4	* Copyright (C) 2010 University of Bonn. All rights reserved.
5	* Please see the LICENSE file or "Copyright notice" in builder.cpp for details.
6	*/
7
8	/**
9	* \file potentials.dox
10	*
11	* Created on: Nov 28, 2012
12	* Author: heber
13	*/
14
15	/** \page potentials Empirical Potentials and FunctionModels
16	*
17	* On this page we explain what is meant with the Potentials sub folder.
18	*
19	* First, we are based on fragmenting a molecular system, i.e. dissecting its
20	* bond structure into connected subgraphs, calculating the energies of the
21	* fragments (ab-initio) and summing up to a good approximation of the total
22	* energy of the whole system, \sa fragmentation.
23	* Second, having calculated these energies, there quickly comes up the thought
24	* that one actually calculates quite similar systems all time and if one could
25	* not cache results in an intelligent (i.e. interpolating) fashion ...
26	*
27	* That's where so-called empirical potentials come into play. They are
28	* functions depending on a number of "fitted" parameters and the variable
29	* distances within a molecular fragment (i.e. the bond lengths) in order to
30	* give a value for the total energy without the need to solve a complex
31	* ab-initio model (essentially, not solving the electronic SchrÃ¶dinger equation
32	* anymore).
33	*
34	* Empirical potentials have been thought of by fellows such as Lennard-Jones,
35	* Morse, Tersoff, Stillinger and Weber, etc. And in their honor, the
36	* potential form is named after its inventor. Hence, we speak e.g. of a
37	* Lennard-Jones potential.
38	*
39	* So, what we have to do in order to cache results is the following procedure:
40	* -# gather similar fragments
41	* -# perform a fit procedure to obtain the parameters for the empirical
42	* potential
43	* -# evaluate the potential instead of an ab-initio calculation
44	*
45	* The terms we use, model the classes that are implemented:
46	* -# EmpiricalPotential: Contains the interface to a function that can be
47	* evaluated given a number of arguments_t, i.e. distances. Also, one might
48	* want to evaluate derivatives.
49	* -# FunctionModel: Is a function that can be fitted, i.e. that has internal
50	* parameters to be set and got.
51	* -# argument_t: The Argument stores not only the distance but also the index
52	* pair of the associated atoms and also their charges, to let the potential
53	* check on validity.
54	* -# SerializablePotential: Eventually, one wants to store to or parse from
55	* a file all potential parameters. This functionality is encoded in this
56	* class.
57	* -# HomologyGraph: "Similar" fragments in our case have to have the same bond
58	* graph. It is stored in the HomologyGraph that acts as representative
59	* -# HomologyContainer: This container combines, in multimap fashion, all
60	* similar fragments with their energies together, with the HomologyGraph
61	* as their "key".
62	* -# TrainingData: Here, we combine InputVector and OutputVector that contain
63	* the set of distances required for the FunctionModel (e.g. only a single
64	* distance/argument for a pair potential, three for an angle potential,
65	* etc.) and also the expected OutputVector. This in combination with the
66	* FunctionModel is the basis for the non-linear regression used for the
67	* fitting procedure.
68	* -# Extractors: These set of functions yield the set of distances from a
69	* given fragment that is stored in the HomologyContainer.
70	* -# FunctionApproximation: Contains the interface to the levmar package where
71	* the Levenberg-Marquardt (Newton + Trust region) algorithm is used to
72	* perform the fit.
73	*
74	* \section potentials-fit-potential-action What happens in FitPotentialAction.
75	*
76	* First, either a potential file is parsed via PotentialDeserializer or charges
77	* and a potential type from the given options. This is used to instantiate
78	* EmpiricalPotentials via the PotentialFactory, stored within the
79	* PotentialRegistry. This is the available set of potentials (without requiring
80	* any knowledge as to the nature of the fragment employed in fitting).
81	*
82	* Second, the given fragment is used to get a suitable HomologyGraph from
83	* the World's HomologyContainer. This is given to a CompoundPotential, that in
84	* turn browses through the PotentialRegistry, picking out those
85	* EmpiricalPotential's that match with a subset of the FragmentNode's of the
86	* given graph. These are stored as a list of FunctionModel's within the
87	* CompoundPotential instance. Here comes the specific fragment into play,
88	* picking a subset from the available potentials.
89	*
90	* Third, we need to setup the training data. For this we need vectors of input
91	* and output data that are obtained from the HomologyContainer with the
92	* HomologyGraph as key. The output vector in our case is simply a number
93	* (although the interface allows for more). The input vector is the set of
94	* distances. In order to pre-process the input data for the specific model
95	* a filter is required in the TrainingData's constructor. The purpose of the
96	* filter is to pick out the subset of distance arguments for each model one
97	* after the other and concatenate them to a list. On evaluation of the model
98	* this concatenated list of distances is given to the model and it may easily
99	* dissect the list and hand over each contained potential its subset of
100	* arguments. See Extractors for more information.
101	*
102	* Afterwards, training may commence: The goal is to find a set of parameters
103	* for the model such that it as good as possible reproduces the output vector
104	* for a given input vector. This non-linear regression is contained in the
105	* levmar package and its functionality is wrapped in the FunctionApproximation
106	* class. An instance is initialized with both the gathered training data and
107	* the model containing a list of potentials. See
108	* [FunctionApproximation-details] for more details.
109	*
110	* During the fitting procedure, first the derivatives of the model is checked
111	* for consistency, then the model is initialized with a sensible guess of
112	* starting parameters, and afterwards the Levenberg-Marquardt algorithm
113	* commences that makes numerous calls to evaluate the model and its derivative
114	* to find the minimum in the L2-norm.
115	*
116	* This is done more than once as high-dimensional regression is sensititive the
117	* the starting values as there are possible numerous local minima. The lowest
118	* of the found minima is taken, either via a given threshold or the best of a
119	* given number of attempts.
120	*
121	* Eventually, these parameters of the best model are streamed via
122	* PotentialSerializer back into a potential file. Each EmpiricalPotential in
123	* the CompoundPotential making up the whole model is also a
124	* SerializablePotential. Hence, each in turn writes a single line with its
125	* respective subset of parameters and particle types, describing this
126	* particular fit function.
127	*
128	* \section potentials-function-evaluation How does the model evaluation work
129	*
130	* We now come to the question of how the model and its derivative are actually
131	* evaluated. We have an input vector from the training data and we have the
132	* model with a specific set of parameters.
133	*
134	* FunctionModel is just an abstract interface that is implemented by the
135	* potential functions, such as CompoundPotential, that combines multiple
136	* potentials into a single function for fitting, or PairPotential_Harmonic,
137	* that is a specific fit function for harmonic bonds.
138	*
139	* The main issue with the evaluation is picking the right set of distances from
140	* ones given in the input vector and feed it to each potential contained in
141	* CompoundPotential. Note that the distances have already been prepared by
142	* the TrainingData instantiation.
143	*
144	* Initially, the HomologyGraph only contains a list of configurations of a
145	* specific fragments (i.e. the position of each atom in the fragment) and an
146	* energy value. These first have to be converted into distances.
147	*
148	*
149	* \section potentials-howto-use Howto use the potentials
150	*
151	* We just give a brief run-down in terms of code on how to use the potentials.
152	* Here, we just describe what to do in order to perform the fitting. This is
153	* basically what is implemented in FragmentationFitPotentialAction.
154	*
155	* \code
156	* // we need the homology container and the representative graph we want to
157	* // fit to.
158	* HomologyContainer homologies;
159	* const HomologyGraph graph = getSomeGraph(homologies);
160	* Fragment::charges_t h2o;
161	* h2o += 8,1,1;
162	* // TrainingData needs so called Extractors to get the required distances
163	* // from the stored fragment. These functions are bound via boost::bind.
164	* TrainingData AngleData(
165	* boost::bind(&Extractors::gatherDistancesFromFragment,
166	* boost::bind(&Fragment::getPositions, _1),
167	* boost::bind(&Fragment::getCharges, _1),
168	* boost::cref(h2o),
169	* _2)
170	* );
171	* // now we extract the distances and energies and store them
172	* AngleData(homologies.getHomologousGraphs(graph));
173	* // give ParticleTypes of this potential to make it unique
174	* PairPotential_Angle::ParticleTypes_t types =
175	* boost::assign::list_of<PairPotential_Angle::ParticleType_t>
176	* (8)(1)(1)
177	* ;
178	* PairPotential_Angle angle(types);
179	* // give initial parameter
180	* FunctionModel::parameters_t params(PairPotential_Angle::MAXPARAMS, 0.);
181	* // ... set some potential-specific initial parameters in params struct
182	* angle.setParameters(params);
183	*
184	* // use the potential as a FunctionModel along with prepared TrainingData
185	* FunctionModel &model = angle;
186	* FunctionApproximation approximator(AngleData, model);
187	* approximator(FunctionApproximation::ParameterDerivative);
188	*
189	* // obtain resulting parameters and check remaining L_2 and L_max error
190	* angleparams = model.getParameters();
191	* LOG(1, "INFO: L2sum = " << AngleData(model)
192	* << ", LMax = " << AngleData(model) << ".");
193	* \endcode
194	*
195	* The evaluation of the fitted potential is then trivial, e.g.
196	* \code
197	* // constructed someplace
198	* PairPotential_Angle angle(...);
199	*
200	* // evaluate
201	* FunctionModel::arguments_t args;
202	* // .. initialise args to the desired distances
203	* const double value = angle(args)[0]; // output is a vector!
204	* \endcode
205	*
206	* \section potentials-stability-of-fit note in stability of fit
207	*
208	* As we always start from random initial parameters (within a certain sensible
209	* range at least), the non-linear fit does not always converge. Note that the
210	* random values are drawn from the defined distribution and the uniform distributionm
211	* engine is obtained from the currently set, see \ref randomnumbers. Hence, you
212	* can manipulate both in order to get different results or to set the seed such that
213	* some "randomly" drawn value always work well (e.g. for testing).
214	*
215	* In any case, For this case the FragmentationFitPotentialAction has the option
216	* "take-best-of" to allow for multiple fits where the best (in terms of l2 error)
217	* is taken eventually. Furthermore, you can use the "set-threshold" option to
218	* stop restarting the fit procedure first when the L2 error has dropped below the
219	* given threshold.
220	*
221	* \section potentials-howto-add Howto add new potentials
222	*
223	* Adding a new potential requires the following:
224	* -# Add the new modules to Potentials/Specifics
225	* -# Add a unit test on the potential in Potentials/Specifics/unittests
226	* -# Give the potential a type name and add it to PotentialTypes.def. Note
227	* that the name must not contain white space.
228	* -# Add the potential name as case to PotentialFactory such that it knows
229	* how to instantiate your new potential when requested.
230	* -# Remember to use the the RandomNumberGenerator for getting random starting
231	* values!
232	*
233	* PotentialTypes.def contains a boost::preprocessor sequence of all
234	* potential names. PotentialFactory uses this sequence to build its enum to
235	* type map and inverse which the user sees when specifying the potential to
236	* fit via PotentialTypeValidator.
237	*
238	*
239	* \date 2013-04-09
240	*/

Note: See TracBrowser for help on using the repository browser.

Download in other formats:

Original Format