Search_Compare



2       Theory


Superimposing Molecules

A common use of molecular modeling is to find the conformational resemblance and the molecular field similarity of a group of molecules to a template molecule. The template molecule may be a lead compound, or a desired structure that is complementary to a receptor molecule.

Two of the most powerful techniques in conformational comparison are provided in Search_Compare. The first method is to superimpose the molecules by minimizing the distances between specified atomic nuclei. The second method is to align the molecules using their electrostatic potentials and steric shapes. These processes are accomplished by rotating and translating the molecules. Very often, you may also want to allow some bonds in these molecules to be rotated during the fitting process. This allows you to answer questions such as:

RMS Fitting

In the Overlap pulldown, the function to be minimized is the sum of squares of the distances between all atoms to be superimposed, as in Eq. 1:

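Eq. 1            $F = \sum_{p=1}^{M-1} \sum_{q=p+1}^{M} \sum_{i=1}^{N(p,q)} \left| \mathbf{r}'_{pi} - \mathbf{r}'_{qi} \right|^2$

where:

M = the number of molecules;

N(p,q) = the number of atoms between the pth and the qth molecules to be aligned;

and

$\mathbf{r}'_{pi}$ = the transformed atomic coordinate of the ith atom in the pth molecule. This is described as:

Eq. 2            $\mathbf{r}'_{pi} = f(\mathbf{r}_{pi}, \boldsymbol{\theta}_p, \mathbf{t}_p, \boldsymbol{\phi}_p)$

where:

$\mathbf{r}_{pi}$ = the original atomic coordinate;

$\boldsymbol{\theta}_p$ = the rotation angles of the molecule;

$\mathbf{t}_p$ = the translation vector of the molecule;

and

$\boldsymbol{\phi}_p$ = the angles of all torsions in the molecule to be optimized.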

Eq. 1 can be minimized by conventional nonlinear least-squares methods.

The RMS value reported by the Overlap/Define calculation is:

Eq. 3            $\mathrm{RMS} = \sqrt{F / N}$

where F is calculated using Eq. 1, and

Eq. 4            $N = \sum_{p=1}^{M-1} \sum_{q=p+1}^{M} N(p,q)$
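
As an illustration only, the following Python sketch evaluates the sum-of-squares function and an RMS value for a fixed set of coordinates. It assumes that the RMS is the square root of F divided by the total number of matched atom pairs. The function names and the `matches` data layout are hypothetical and are not part of Search_Compare, and no rotation, translation, or torsion optimization is performed.

```python
import math


def overlap_objective(molecules, matches):
    """Return (F, n_pairs) for a set of molecules.

    molecules: list of coordinate lists, one per molecule; each entry is (x, y, z).
    matches:   dict mapping (p, q), with p < q, to a list of (i, j) index pairs,
               meaning atom i of molecule p is superimposed on atom j of molecule q.
    """
    total = 0.0
    n_pairs = 0
    for (p, q), pairs in matches.items():
        for i, j in pairs:
            # Squared distance between the two matched atoms.
            total += sum((a - b) ** 2 for a, b in zip(molecules[p][i], molecules[q][j]))
            n_pairs += 1
    return total, n_pairs


def rms(molecules, matches):
    """Assumed RMS definition: sqrt(F / total number of matched atom pairs)."""
    f, n = overlap_objective(molecules, matches)
    return math.sqrt(f / n)


# Two two-atom molecules, offset from each other by 1 unit along z.
mol_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mol_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
matches = {(0, 1): [(0, 0), (1, 1)]}

f_value, n_pairs = overlap_objective([mol_a, mol_b], matches)
rms_value = rms([mol_a, mol_b], matches)
```

In an actual fit, a nonlinear least-squares optimizer would vary the rotation, translation, and torsion parameters of each molecule to reduce the value returned by `overlap_objective`.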

Electrostatic Potential Similarity

Electrostatic potential similarity calculation is becoming a well-established modeling technique, and a number of different formulas for determining similarity have been proposed (Good, 1992). Here, we use the Hodgkin index, as discussed by Good, for the electrostatic potential similarity calculation:

Eq. 5            $SF = \frac{2 \int P_a P_b \, dv}{\int P_a^2 \, dv + \int P_b^2 \, dv}$

In Eq. 5, Pa and Pb are the electrostatic potentials for molecules a and b, which are dependent on the atomic charges and distance according to Eq. 6:

Eq. 6            $P(\mathbf{r}) = \sum_{i=1}^{n} \frac{Q_i}{\left| \mathbf{r} - \mathbf{R}_i \right|}$

where:

n = the number of atoms in the molecule;

r = the coordinate where electrostatic potential is to be evaluated;

Ri = the coordinate position of atom i;

and

Qi = the charge assigned to atom i.

The value of the function, SF, ranges from -1, maximum dissimilarity, to 1, indicating identical potentials. A value of 0 corresponds to two molecules with zero electrostatic potential overlap, either because the molecules are far apart or because the value of the positive overlap equals the value of the negative overlap.
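
The Hodgkin index can be approximated numerically by replacing the integrals with sums over grid points. The following Python sketch assumes a simple Coulomb-type potential with no unit conversion or dielectric constant, and a grid that avoids the atomic positions. The function names, charges, and grid are illustrative assumptions, not the Search_Compare implementation.

```python
import math


def potential(point, atoms):
    """Coulomb-type potential at `point` from a list of (charge, (x, y, z)) atoms."""
    return sum(q / math.dist(point, pos) for q, pos in atoms)


def hodgkin_index(atoms_a, atoms_b, grid):
    """Hodgkin index 2*sum(Pa*Pb) / (sum(Pa^2) + sum(Pb^2)) evaluated on grid points."""
    pa = [potential(g, atoms_a) for g in grid]
    pb = [potential(g, atoms_b) for g in grid]
    numerator = 2.0 * sum(a * b for a, b in zip(pa, pb))
    denominator = sum(a * a for a in pa) + sum(b * b for b in pb)
    return numerator / denominator if denominator else 0.0


# Grid points at half-integer coordinates, so no point coincides with an atom.
grid = [
    (x + 0.5, y + 0.5, z + 0.5)
    for x in range(-3, 4)
    for y in range(-3, 4)
    for z in range(-3, 4)
]

# A hypothetical two-atom dipole and the same molecule with inverted charges.
dipole = [(0.4, (0.0, 0.0, 0.0)), (-0.4, (1.0, 0.0, 0.0))]
inverted = [(-q, pos) for q, pos in dipole]

same = hodgkin_index(dipole, dipole, grid)        # identical potentials
opposite = hodgkin_index(dipole, inverted, grid)  # sign-inverted potentials
```

Comparing the dipole with itself gives the upper limit of the index, and comparing it with its charge-inverted copy gives the lower limit, which matches the range described above.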

For multiple molecules, Eq. 7 is used for the similarity calculation and optimization:

Eq. 7            

where M = the number of molecules.

Here, the SF function again ranges from -1 (inverse) to 1 (identical).

Steric Shape Similarity

The molecular steric similarity of two molecules is calculated with Eq. 5, where Pa and Pb are the steric functions for molecules a and b. These are Lennard-Jones potentials, as in Eq. 8 ("9-6" potential) or Eq. 9 ("12-6" potential), depending on the forcefield chosen:

Eq. 8            

Eq. 9            

where:

n = the number of atoms in the molecule;

r = the coordinate where steric potential is to be evaluated;

k = a constant;

e = epsilon for the atom type;

s = sigma for the atom type.

The function SF ranges from 0, meaning zero steric overlap (molecules are too far apart), to 1 indicating identity.

Field_Fit (Electrostatic & Steric Similarity)

The combined similarity is calculated using Eq. 10:

Eq. 10            

where w = the user-specified weighting factor, ranging from 0 to 1.

The combined similarity function SF ranges from -1, meaning maximum dissimilarity, to 1, indicating identity.

Steric Clash-checking

To avoid van der Waals clashes in flexible fitting, a penalty function is added to the similarity function or the RMS function during the optimization process, as in Eq. 11:

Eq. 11            

where:

SF = the similarity function or RMS function.

The following penalty function is used:

Eq. 12            

Eq. 13            

where:

n = the number of atom pairs between rotatable segments in the molecule;

ro2 = the sum of the VDW radii;

r2 = the distance between the two atoms.

To provide a degree of "softness" in the penalty function, scaling factors are applied to the VDW radii of atoms. The scaling factors are:

Interaction type     Scaling factor
1-4 interactions     0.85
H-bond candidates    0.65
Others               0.95
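
The effect of the scaling factors can be illustrated with a simple clash test. The Python sketch below is not the penalty function of Eq. 12 and Eq. 13. It only assumes that two atoms are treated as clashing when their distance is less than the scaled sum of their VDW radii. The function name, the category keys, and the radius of 1.7 are hypothetical values chosen for illustration.

```python
# Scaling factors from the table above, keyed by hypothetical category names.
SCALE = {"1-4": 0.85, "hbond": 0.65, "other": 0.95}


def clashes(distance, radius_i, radius_j, kind="other"):
    """Return True if two atoms are closer than the scaled sum of their VDW radii."""
    return distance < SCALE[kind] * (radius_i + radius_j)


# Two atoms with an illustrative VDW radius of 1.7 each, 3.0 apart.
as_other = clashes(3.0, 1.7, 1.7, "other")  # threshold 0.95 * 3.4 = 3.23
as_1_4 = clashes(3.0, 1.7, 1.7, "1-4")      # threshold 0.85 * 3.4 = 2.89
as_hbond = clashes(3.0, 1.7, 1.7, "hbond")  # threshold 0.65 * 3.4 = 2.21
```

The same pair of atoms is flagged as a clash under the general factor but tolerated as a 1-4 interaction or as a hydrogen-bond candidate, which is the "softness" the scaling is meant to provide.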


Systematic Conformational Searching

Systematic conformational searches can provide relatively quick answers to several types of problems. For example, a search can determine which members of a series of apparently related molecules are able to assume the correct conformation for binding to a particular receptor. This reduces the number of compounds that must be tested in more expensive or time-consuming ways.

Standard Searches

Systematic conformational searching explores the torsional space sterically available to a molecule. A dihedral angle is stepped through a range of values, and each resulting conformation is accepted if its interatomic distances do not indicate steric clashes between van der Waals radii of the constituent atoms and if optional user-defined distance constraints are not violated.

Steric clashes are not checked for atoms whose interatomic distances do not change during the search. However, if you do not first minimize your molecule with respect to the current forcefield, the search may return no conformations, because it may detect steric clashes between atoms in the initial structure. Consequently, it is generally a good idea to optimize a molecule's geometry with Discover before performing systematic conformational searching.

Note that, in a standard search, only the steric interactions are examined--the energy is not calculated unless you specifically request it.

Use of Distance Constraints

Since the search process varies selected torsions through all combinations of angles (within desired ranges), systematic searches can generate undesirably large numbers of conformations. The number of generated conformations can be greatly reduced, thus increasing computational efficiency, by using pair-wise interatomic distances derived from experimental data or from previous searches on related molecules as constraints. In this way, only those conformations whose interatomic distances lie within desired ranges are generated.
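
To give a sense of how quickly the number of torsion combinations grows, the following Python sketch enumerates a torsion grid and counts its points. The function names and the (start, stop, step) range format are assumptions made for this example and do not correspond to Search_Compare parameters.

```python
import itertools
import math


def torsion_grid(ranges):
    """Yield every combination of torsion angles.

    ranges: list of (start, stop, step) tuples in whole degrees; stop is exclusive.
    """
    axes = [range(start, stop, step) for start, stop, step in ranges]
    return itertools.product(*axes)


def grid_size(ranges):
    """Number of combinations, computed without enumerating them."""
    return math.prod(len(range(start, stop, step)) for start, stop, step in ranges)


# Five rotatable bonds, each stepped through a full turn.
coarse = grid_size([(0, 360, 30)] * 5)  # 12 values per torsion
fine = grid_size([(0, 360, 10)] * 5)    # 36 values per torsion

# A small grid can be enumerated directly.
small = list(torsion_grid([(0, 360, 120)] * 2))
```

Here `coarse` is 12 to the fifth power (248,832) and `fine` is 36 to the fifth power (60,466,176), before any clash checking or distance constraints are applied.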

Distance Maps

One way of rejecting conformations having interatomic distances that lie outside the desired ranges is to set up specific interatomic distance constraints, as mentioned under Standard Searches. Another, more efficient way of limiting accepted conformations to those that are similar to several possible conformations is based on a distance map. This is a map of the pair-wise interatomic distances being monitored, in N-dimensional space where N = the number of atom pairs. Each interatomic distance is divided into equal increments (referred to as the resolution), thus forming a set of lattice points in the distance space. Each conformation is then mapped to a specific lattice point, taking into account all the monitored interatomic distances in that conformation. More than one valid conformation can map to a single lattice point. Naturally, a map with a smaller resolution (more increments) has more points than one with a larger resolution, but then each point represents fewer conformations.

Figure 1. Construction of a Two-Dimensional Distance Map

The solid circles represent the map points of conformations A-D, and the open circles represent the map points of many molecules (not shown) whose conformations are relatively close to conformation C. For these molecules, differences in the distances between atoms 1 and 2 are not resolved, while the distances between atoms 3 and 4 are resolved into four groups. Note that the resolutions for the two axes (i.e., for the two interatomic distances being monitored) do not have to be identical.

Figure 1 shows an example of a two-dimensional distance map, where the distances between two pairs of atoms are mapped for a series of conformers. If, in addition, we wanted to monitor or constrain the distance between a third pair of atoms in each conformer (say, atoms 1 and 4), the associated distance map could be illustrated as a three-dimensional block, where the z-axis would be the distance between atoms 1 and 4. There is no limit to the number of distances we can monitor in this fashion.
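
The mapping of conformations to lattice points can be sketched in a few lines of Python. This is a minimal illustration that assumes each monitored distance is binned by dividing it by the resolution of its axis and rounding down. The function names, the zero-based atom indices, and the coordinates are hypothetical.

```python
import math
from collections import defaultdict


def lattice_point(coords, atom_pairs, resolutions):
    """Map one conformation to a lattice point: one bin index per monitored distance."""
    return tuple(
        math.floor(math.dist(coords[i], coords[j]) / resolution)
        for (i, j), resolution in zip(atom_pairs, resolutions)
    )


def build_distance_map(conformations, atom_pairs, resolutions):
    """Group conformation indices by the lattice point they map to."""
    distance_map = defaultdict(list)
    for index, coords in enumerate(conformations):
        distance_map[lattice_point(coords, atom_pairs, resolutions)].append(index)
    return dict(distance_map)


# Monitor two distances: atoms 0-1 at a resolution of 1.0 and atoms 2-3 at 0.5.
atom_pairs = [(0, 1), (2, 3)]
resolutions = [1.0, 0.5]

conformations = [
    [(0, 0, 0), (2.2, 0, 0), (0, 0, 0), (0, 3.1, 0)],  # distances 2.2 and 3.1
    [(0, 0, 0), (2.7, 0, 0), (0, 0, 0), (0, 3.3, 0)],  # distances 2.7 and 3.3
    [(0, 0, 0), (2.4, 0, 0), (0, 0, 0), (0, 3.8, 0)],  # distances 2.4 and 3.8
]

distance_map = build_distance_map(conformations, atom_pairs, resolutions)
```

The first two conformations fall on the same lattice point, while the third is separated from them only along the more finely resolved second axis. This mirrors the situation in the figure, where one distance is not resolved and the other is.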

Distance Maps used in Series

When conformational searches are being performed on a series of molecules with the idea of finding a geometry common to all of them, computational efficiency can be increased further by using the results obtained for each molecule in the series to restrict searches on subsequent molecules. In effect, distance maps are used as active filters to restrict the searches to regions of conformational space that contain sterically allowed conformations that are common to the set of compounds. This type of analysis might actually lead to a null set, indicating the absence of a common geometry. If so, the hypothesis that the functional groups (or biophores) correspond may be incorrect. Alternatively, a search may lead to a set of several possible geometries. In this case, their relative likelihoods must be tested and analyzed further by other means.
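
The filtering idea can be sketched as a set intersection over lattice points. The Python example below is a simplification: it intersects completed maps after the fact, whereas the text describes using earlier results to restrict later searches while they run. The function name and the lattice points are hypothetical.

```python
def common_lattice_points(point_sets):
    """Intersect the lattice points found for each molecule in a series.

    point_sets: iterable of collections of lattice points, one per molecule.
    Returns the set of points shared by all molecules (possibly empty).
    """
    common = None
    for points in point_sets:
        points = set(points)
        common = points if common is None else common & points
        if not common:
            # Null set: no geometry is shared, so later molecules need not be checked.
            break
    return common or set()


# Hypothetical lattice points from searches on three related molecules.
molecule_1 = {(2, 6), (2, 7), (3, 7)}
molecule_2 = {(2, 7), (3, 7), (4, 8)}
molecule_3 = {(2, 7), (5, 5)}

shared = common_lattice_points([molecule_1, molecule_2, molecule_3])
none_shared = common_lattice_points([molecule_1, {(9, 9)}])
```

An empty result corresponds to the null set discussed above, and a result with several points corresponds to several candidate geometries that need further analysis.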

Screening by Energy Criteria

The number of conformers that satisfy van der Waals clash-checking and distance constraints can still be very large. Many of these conformers may be energetically unfavorable. By evaluating the total energy of each conformer and screening out those conformers whose energy values are a specified amount above that of the minimum-energy conformation, a much smaller set of stable conformers can be obtained for further examination.
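
Screening by an energy window can be sketched as follows. The function name, the (label, energy) tuple layout, and the energy values are hypothetical, and the energies are assumed to have been computed elsewhere.

```python
def screen_by_energy(conformers, window):
    """Keep conformers whose energy is within `window` of the lowest energy found.

    conformers: list of (label, energy) tuples, all in the same energy units.
    """
    if not conformers:
        return []
    lowest = min(energy for _, energy in conformers)
    return [(label, energy) for label, energy in conformers if energy - lowest <= window]


conformers = [("c1", -12.4), ("c2", -10.1), ("c3", -3.0), ("c4", -11.9)]
stable = screen_by_energy(conformers, window=3.0)
stable_labels = [label for label, _ in stable]
```

With a window of 3.0, conformer c3 lies 9.4 above the minimum and is discarded, while c1, c2, and c4 are kept.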

Very often, instead of simply calculating the conformers' energies, it may be preferable to perform energy minimization for each conformer before screening out unstable conformers. The resulting conformers are the most stable, minimized conformations of the molecule under the specified search conditions. Performing a systematic conformational search using this option enables multiple stable conformers to be found. This has a great advantage over optimization of a single structure. In the latter case, only one conformer is obtained, which may correspond to a local rather than a global minimum in the available conformational space.

Discover functionality is used to calculate and minimize the energy. Several minimizers are used in sequence: steepest descents, followed by conjugate gradients, followed by a quasi-Newton-Raphson method known as BFGS (formerly VA09A). Please refer to the Implementation section and to the Discover documentation for additional details.


Vector Maps

Even with the application of various screening methods during a search, large amounts of data may still remain to be analyzed. Vector maps are a useful analytical tool for examining these results. They provide a means of graphically displaying regions of space swept out by a pair of atoms during a systematic conformational search.

A vector map consists of a set of line segments representing the coordinates of a pair of atoms for every conformation within a trajectory resulting from a systematic search. The connected atom pair is the same for each frame or conformer of the search, but since the atoms' locations change from one frame to the next, a series of lines (vectors) is generated that shows the position of the atom pair for each trajectory frame or search conformation.

A vector map refers to both the atom pair and the Insight II trajectory that has been loaded with the Trajectory/Get command in the Analysis module or the SC_Search/Load command in the Search_Compare module.

Vector maps are a useful tool for finding which conformations of a search or trajectory have atoms located at specific points in space. There is no requirement that the atom pair used to define the vector map be a bonded atom pair. Defining a vector map for a nonbonded pair of atoms is a useful means of visualizing relative orientations and distances between a pair of atoms for each trajectory frame.
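
As a rough illustration of the underlying data, a vector map can be thought of as one line segment per frame. The Python sketch below assumes each frame is a list of (x, y, z) coordinates indexed by atom. The function names and the coordinates are hypothetical, and no Insight II commands are involved.

```python
import math


def vector_map(frames, atom_a, atom_b):
    """Return one (start, end) segment per frame for the chosen atom pair."""
    return [(frame[atom_a], frame[atom_b]) for frame in frames]


def segment_lengths(segments):
    """Distance between the two atoms in each frame."""
    return [math.dist(start, end) for start, end in segments]


# Three frames of a three-atom fragment; atoms 0 and 2 need not be bonded.
frames = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 0.0, 2.0)],
]

segments = vector_map(frames, 0, 2)
lengths = segment_lengths(segments)
```

Drawing all of the segments together shows the region of space swept out by the atom pair, and the lengths give the corresponding interatomic distance in each frame.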


Screening Duplicate Conformations

Very often, a systematic torsional search will produce duplicate conformations. This is especially true when the molecule you are searching has local symmetry and/or you are running with energy minimization. In the former case, this is because rotating a symmetrical side-group (e.g., a phenyl group) by 180 degrees produces an identical-looking conformation--the only difference being in the labeling of the topologically equivalent atoms. In the latter case, several torsional conformations may minimize to the same conformer.

Systematic search offers controls for governing how duplicates are removed. These are essentially based on the conformational energy difference (when energy searches are performed) and/or the RMS difference between the atomic coordinates of superimposed conformations. By default, the parameters for duplicate removal are set with low tolerances, so only duplicate conformations are removed. However, by setting higher tolerances, more conformations will be screened out as duplicates, resulting in a more diverse set of conformations from a search.

Duplicate removal based on RMS can be customized for use of topological symmetry and for the subset of atoms it is applied to. By specifying the set of atoms used for RMS calculations, duplicate removal can be made to focus on just the regions of interest within the molecule's conformation. For example, you may not care about the geometry of a particular side chain of your molecule and so atoms in this region could be excluded from the list of RMS atoms.
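
One possible form of duplicate screening is sketched below in Python. It assumes that the conformers are already superimposed, that a conformer counts as a duplicate only when both the energy and the RMS tolerances are met, and that topological symmetry is ignored. The function names, tolerances, and coordinates are hypothetical and are not the Search_Compare controls.

```python
import math


def coordinate_rms(coords_a, coords_b, atom_subset=None):
    """RMS coordinate difference over all atoms, or over `atom_subset` if given."""
    indices = list(range(len(coords_a)) if atom_subset is None else atom_subset)
    squared = sum(math.dist(coords_a[i], coords_b[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


def remove_duplicates(conformers, rms_tol, energy_tol, atom_subset=None):
    """Keep the first conformer of each group of near-identical conformers.

    conformers: list of (energy, coords) tuples, assumed already superimposed.
    """
    kept = []
    for energy, coords in conformers:
        duplicate = any(
            abs(energy - kept_energy) <= energy_tol
            and coordinate_rms(coords, kept_coords, atom_subset) <= rms_tol
            for kept_energy, kept_coords in kept
        )
        if not duplicate:
            kept.append((energy, coords))
    return kept


conformers = [
    (-5.00, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.00, 0.0)]),
    (-5.00, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.01, 0.0)]),  # near copy
    (-4.99, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 1.50, 0.0)]),  # last atom moved
]

all_atoms = remove_duplicates(conformers, rms_tol=0.1, energy_tol=0.05)
core_only = remove_duplicates(conformers, rms_tol=0.1, energy_tol=0.05, atom_subset=[0, 1])
```

Using all atoms, the third conformer is kept as distinct. Restricting the RMS calculation to atoms 0 and 1 ignores the moved atom, so all three conformers collapse to one.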




Last updated November 22, 2003.
Copyright © 2003, Accelrys Inc. All rights reserved.