Homology


Contents

Release 2005, March 2005


1. Introduction

What is Homology?
Hardware and Installation
How to Invoke Homology
Program Environment
Homology and Insight II
Saving Homology Information with Insight II
Command Logging and Restarting
Homology and Discover
Operations

2. Theory

Background
Homology Model Building
Searching Sequence Databases With the FASTA Program
ktup Value
Scoring Local Regions
Joining Regions
Optimizing the Sequence Matching
Explicit Statistical Estimation
Determining Structurally Conserved Regions (SCRs)
Manual Determination of Structurally Conserved Regions
Automatic Determination of Structurally Conserved Regions
Automatic Sequence Alignment Methods
Needleman and Wunsch Algorithm for Pairwise Alignment
Accelrys' Pairwise Alignment Procedure
Accelrys' Automatic Multiple Sequence Alignment
align123
Scoring Matrices
Superimposing structures based on sequence alignment
Multiple Structure Alignment
Simultaneous Superposition of Structures
Assignment of Coordinates Within a Conserved Region
Assignment of Coordinates for Loop or Variable Regions
Search Loops Command
Generate Loops Command
Side Chain Conformational Searches Using Rotamer Libraries
Refinement of the Model Using Molecular Mechanics
The Potential Energy Equation
Energy Minimization
Energy Constraints
Molecular Dynamics
Secondary Structure Prediction
The Chou-Fasman Method
The GOR II Method
Hydrophobicity Profiles
Solvent Accessible Surfaces
Definition of Solvent Accessible Surface Area
Significance of Solvent Accessible Surface Area
Solvation Module versus ProStat
Solvent Accessible Surface Area for Protein Structure Validation
Rules for Protein Validation

3. Implementation

Sequence Window
Sequence Display
Controls
Sequence Boxes
Sequence Gaps
Manipulating the Sequence Display
Scrolling Modes
Seq Mode
Box Mode

4. Command summary

Modules
Pulldowns
Commands
Sequences pulldown
Boxes pulldown
Loops pulldown
Residue pulldown
Databases pulldown
Background_Job pulldown
Alignment pulldown
By_Residue pulldown
Refine pulldown
Profiles_3D pulldown
ProStat pulldown
Modeler pulldown
Seqfold pulldown

5. Methodology

Step 1: Determine Which Proteins Are Related to the Model Protein
Generating a Sequence Database
Sequence Database Searching
Motif searching
Step 2: Determining Structurally Conserved Regions (SCRs)
Automatic Determination of SCRs
Specifying the Initial Search Zone
Optimizing the Automatic Search for SCRs
Subsets
Automatic Superimposing of Structures
Characteristics of m-boxes
Handling of Existing Boxes
Interrupting the Search
Manual Determination of SCRs
Finding Pairwise SCRs
Criteria for Evaluating Manually-Determined SCRs
Summarizing the Manually-Determined SCRs
Superimposing Reference Proteins Using Manually-Determined SCRs
Superimposing structures
Multiple Sequence Alignment as an Alternative to the Manual Method
Step 3: Sequence Alignment
Choosing a Scoring Matrix
Automatic Sequence Alignment without SCRs
Automatic Sequence Alignment with Automatically-Determined SCRs
Automatic Sequence Alignment with Manually-Determined SCRs
Pairwise Manual Sequence Alignment
Multiple Sequence Alignment
Specifying the Initial Search Zone
Specifying a Mandatory Sequence
Automatic Calculation of Pairwise Threshold
Statistical Significance and Alternate Sequence Coloring
Characteristics of m-boxes
Subsets
Automatic Superimposing of Structures
Adjusting the Sensitivity and Selectivity of the Search
Handling of Existing Boxes
Interrupting the Search
Excessive Calculation Time
Single_Search Mode
Manual Mode
Step 4: Assigning Coordinates Within the SCRs
Step 5: Building Loop or Variable Regions (VRs)
Searching for and Displaying Loops
Generating and Displaying Loops
Building Coordinates for the VRs
Step 6: Conformational Search for Side Chains Using Rotamers
Step 7: Refining the Structure with Discover or CHARMm
Running Discover or CHARMm with Homology-Built Model Structures
End Repair
Splice Repair
Energy Minimization
Molecular Dynamics
Step 8: Validating Results
Structure Checking
Residue Dihedral Angles
Secondary Structure Classification
Algorithmic Implementation
United Atom Models
Setting Atomic Radii
Definition of Computed Surface Areas and their Significance
Total Surface Area
Relative Surface Area and the Tripeptide Model
Polar and Apolar Surface Area
Limitations in Implementation
Conclusion

6. Tutorial

Introduction
Hardcopy and Pilot online tutorials
Hardcopy lessons
Lesson 4a: Finding structurally conserved regions
Lesson 4b: Building SCRs and loops
Lesson 10: Finding alternative multiple sequence alignments

A. References

B. File Formats

Introduction
Amino Acid Scoring Matrices
User Scoring Matrix Files
Sequence Alignment Command
Input Databases Command
Get Sequence, Alignment and Databases commands
Secondary structure file

C. Glossary

D. seq_extract Utility

Output
Results Displayed on the Screen
Output Files
Execution Options

E. Matrices

Sequence Alignment Matrices
Identity Matrix
Codon Substitution Matrix
Dayhoff Evolutionary Mutation Matrix
Hydrophobicity Matrix
Input Databases Command Matrices
Identity Matrix
Codon Substitution Matrix
Dayhoff Evolutionary Mutation Matrix
Hydrophobicity Matrix

F. Hydrophobicity Scale Values

Amino Acid Values
Threshold Values

G. Sequence Databases

Protein Sequence Databases
NBRF
Swiss-Prot
DNA Sequence Databases
GenBank
EMBL Data Library

H. align123 Standalone

HELP 1: General help for align123
HELP 2: Help for multiple alignments
HELP 3: Help for pairwise alignment parameters
HELP 4: Help for multiple alignment parameters
HELP A: Help for protein gap parameters
HELP 5: Help for output format options.
HELP 6: Help for profile and structure alignments
HELP B: Help for secondary structure / gap penalty masks
HELP C: Help for secondary structure / gap penalty mask output options
HELP 7: Help for phylogenetic trees
HELP 8: Help for choosing a weight matrix
HELP 9: Help for command line parameters DATA (sequences)
HELP 10: Help for tree output format options


Last updated November 22, 2003.
Copyright © 2003, Accelrys Inc. All rights reserved.