Amber Workshop - Tutorial 2

(Note: These tutorials are meant to provide illustrative examples of how to use the AMBER software suite to carry out simulations that can be run on a simple workstation in a reasonable period of time. They do not necessarily provide the optimal choice of parameters or methods for the particular application area.)
Copyright Ross Walker 2004

2) Setting up duplex DNA: polyA-polyT

The first stage of this tutorial is to create the necessary input files required by sander for performing minimisation and molecular dynamics.

There are a number of methods for creating these input files. In this example we will use a program written specifically for this purpose: LEaP. This is a program that reads in force field, topology and coordinate information and produces the files necessary for production calculations (i.e. minimisation, molecular dynamics, analysis, ...). There are two versions of this program provided with AMBER 8. A graphical version called xleap and a terminal interface called tleap. Since we want to "see" graphical representations of our models we will use xleap in this tutorial.

The approximate ordering of this section is as follows:

Where do I get the coordinates?
What representation should I use and what should I simulate?
A discussion of issues to consider before starting...
Building the prmtop and inpcrd files... This will be done using xleap.

2.1) Generating the coordinates of the model structure

The first step in any modelling project is developing the initial model structure. Although in principle, one could use xleap to build a model structure by hand this is only practical for the smallest of systems. The difficulty in both manipulating and predicting the structure of large biomolecular systems means that building a structure by hand is not usually a sensible undertaking. Instead, experimentally determined structures are used. These can be found by searching through databases of crystal or NMR structures such as the Protein Data Bank or the Cambridge Structural Database. With nucleic acids, users can also search the Nucleic Acid Database.

When experimental structures are not available all hope is not lost since there are a variety of programs that facilitate building model structures using homology modelling and predictive techniques; the list of possible sources is beyond the scope of this tutorial. However, it is worth mentioning that for nucleic acid structure prediction Dave Case and Tom Macke of The Scripps Research Institute have developed the NAB molecular manipulation language which facilitates the building of complex nucleic acid structures (see http://www.scripps.edu/case/ for further info). Numerous methods also exist for predicting the structure of proteins but in general such structure prediction is still in its infancy. Thus a good experimental structure is typically preferred. If, however, a predicted protein starting structure is all that is available it should be noted that these typically require more elaborate minimisation and equilibration procedures prior to production dynamics simulations than do structures found by experimental methods.

Hurdle #1: Note: as pointed out in the later tutorials, sometimes dealing with Brookhaven PDB files (as provided by the Protein Data Bank) can be rather tricky due to variations in naming conventions. The naming, and often formatting issues, that "break" the programs are the first hurdle that must be overcome in order to perform detailed molecular simulation. Generally within AMBER, if you are using standard amino acid or nucleic acid residues, at most all that is necessary is some slight massaging of the PDB format and atom names. If, however, you wish to simulate complex carbohydrates, lipids or non-standard protein residues then you may be required to develop parameters and topology information. This, however, is a more advanced issue that will be deferred until tutorials 4 and 5.

2.1.1) Creating our DNA duplex using nucgen

Since we don't have an experimentally determined structure for our 10-mer DNA duplex, we are going to have to create one from scratch. Fortunately AMBER comes with a simple program for generating canonical A- and B- duplex geometries of nucleic acids. The program is called nucgen and it allows us to generate Cartesian coordinate A- and B-models of standard nucleic acid duplexes. The pdb file that is produced by nucgen will have all of the atoms named according to the AMBER conventions. All of the hydrogen atoms will be missing, however, so these will need to be added. xleap will conveniently do this for us based on its expected notion of what each residue is (i.e. if you load up a residue called "ADE" into xleap, it will attempt to add all atoms that it finds missing at standard positions; any extra atoms that it finds will be built into the model. Connectivity etc. will not necessarily be defined for these extra atoms, however, since xleap, does not know how they are connected. This will become clearer as we proceed...).

Pictured above are models of the decamer poly(A)-poly(T) shown end-on (top) and with a view into the minor (top half of the molecule) and major (bottom half of the molecule) groove (bottom). On the left is canonical A-DNA, the middle is the average structure previously discussed, and on the right is canonical B-DNA. The method for creating the canonical structures is discussed below.

A note on force fields
A number of different force fields are supplied with AMBER. In AMBER v5.0 and v6.0 the default field was the Cornell et al. (1995) or parm94.dat force field (referred to as FF94 in AMBER v8.0). In AMBER v7.0 and v8.0 there is no default force field. Instead the user must select which force field to use when preparing their prmtop and inpcrd files. The AMBER v8.0 force fields recommended for the simulation of proteins and nucleic acids in explicit solvent are either the FF99 or FF03 force field which contain several improvements over the FF94 force field. With AMBER 9 one should probably consider the FF99SB or FF03 force fields. The most notable changes are new partial charges for proteins based on DFT quantum calculations in continuum solvent (FF03), as well as updated torsion terms for Phi-Psi angles (FF99SB) which improve the over estimation of alpha helices that occurs when using the FF94 and FF96 force fields. It should be noted, however, that the FF99SB/FF03 force fields do not introduce any new changes for nucleic acids. The charges are still based on HF gas phase ab initio quantum calculations and the bond angle and dihedral parameters are the same as the FF99 force field. Thus in this tutorial we will be using the FF99 all-atom force field. More details on the available force fields can be found in the AMBER manual and the papers referenced within.

With the FF99 force field, phosphates are considered to be part of the nucleotide (as is standard). Because the terminal groups of DNA are different (i.e. 3'- and 5'- terminal residues), the names have to be different to associate the appropriate topology (i.e. list of bonds, atoms, etc.) and parameters. For DNA, the residue names used are A5, A and A3 for a 5' terminal adenine, a non terminal adenine, or a 3' terminal adenine respectively. [A single isolated adenine nucleotide is AN.]

So, with this in mind we can construct the input file required by nucgen to build our 10-mer polyA-polyT DNA duplex in the Arnott B-DNA canonical structure. This input file is reproduced below. Basically, this is building two strands of A-T paired DNA. For more specific information about the various options, see the manual.

NUC   1
D
A5   A    A    A    A    A    A    A    A    A3

NUC   2
D
T5   T    T    T    T    T    T    T    T    T3

END
$ABDNA

This file will be used as input for nucgen. You can save yourself the hassle of typing it in and download it here (nuc.in).

NOTE: Before we run any of the programs provided with AMBER, we need to make sure the Unix shell environment variable that specifies where AMBER is installed is set properly. This is necessary for xleap and lets the system know where the executables are located ($AMBERHOME/exe).

If you are running bash, csh or tcsh check the AMBERHOME environment variable is set by typing:

echo $AMBERHOME

If you see:

AMBERHOME: Undefined variable. (csh or tcsh)

or simply a blank line (bash) then the AMBERHOME environment variable was not set properly and you need to initialise it to point to the AMBER installation directory. If AMBER is installed in /usr/local/amber8 then you would type:

setenv AMBERHOME /usr/local/amber8 (csh or tcsh)

or

export AMBERHOME=/usr/local/amber8 (bash)

While you are at it, you may want to update your .cshrc (csh), or .bash_login (bash) file to define this variable and add the AMBER binary directory to your path (or list of directories searched for executables) at login. In your .bash_login file [or the global /etc/bashrc file] (if using the bash shell), add the following (substituting the correct path to AMBER):

export AMBERHOME=/usr/local/amber8
export PATH=$PATH:$AMBERHOME/exe

For further details on installation please refer to the AMBER manual.

Running nucgen
To run nucgen, you need to specify the input file, an output file, the nucgen database and the name of the pdb file you wish to create. So, to create our DNA structure type the following (assuming you have either created or downloaded the nuc.in file in your current directory):

$AMBERHOME/exe/nucgen -O -i nuc.in -o nuc.out -d $AMBERHOME/dat/leap/parm/nucgen.dat -p nuc.pdb

This should produce the following two files: nuc.out, nuc.pdb

The nuc.out file contains a log of the programs progress in creating the model of our 10-mer DNA duplex. Have a look at it if you are interested. In this example the output file is not very informative. It should simply contain a description of our input file. The more interesting file is nuc.pdb. This is the model structure of our DNA duplex. It contains the cartesian coordinates of all of the atoms in the duplex in locations determined from the nucgen.dat parameters read by nucgen. The file should consist of 438 lines with a single line for each atom:

LINE DEFINITION	ATOM NO.	ATOM NAME	RESIDUE NAME	RESIDUE NO.	X	Y	Y
ATOM	1	HST	A5	1	-0.808	-8.873	-2.080
ATOM	2	O3'	A5	1	4.189	-7.682	-0.250
...	...	...	...	...	...	...	...

2.1.2) Loading the structure into Leap

The next step is to take a look at the model structure. It is always a good idea to look at the models before trying to use them. In this way problems can often be identified before running expensive calculations. xleap works fine for displaying models assuming that the appropriate residue definition files are loaded into xleap and the residue names in the pdb file are consistent with what xleap expects. Alternatively a range of freely available and commercial packages exist for viewing pdb files. A very good program, although not the only choice, is VMD (http://www.ks.uiuc.edu/Research/vmd/) which is freely available for academic research. For the moment we will stick to using xleap since this is what we will be using to create the input files for our simulations. Other methods of visualisation will be covered in later sections of this tutorial.

In addition to the residue names xleap also requires information about the chain connectivity. Residues can be connected to other residues from both sides, only one side or not at all. In xleap lingo, the residue can have a "head" or a "tail" (or both).

head and tail: the residue has connectivity from both sides such as an internal residue in a protein or nucleic acid. Examples are alanine "ALA" and thymine "T".
tail only: the residue is a terminal residue at the start of a chain. Examples are the N-terminal residues of proteins "NALA" or the 5'-terminal residue of nucleic acids "T5".
head only: the residue is a terminal residue at the end of a chain. Examples are the C-terminal residues of proteins "CALA" or the 3'-terminal residue of nucleic acids "T3".
no head or tail: the residue doesn't connect to other residues. Examples are an isolated nucleotide "TN" or water "WAT" or an ion "Na+".

The various different residues and their names are defined in library files that xleap loads on startup. More on this later. For the moment it is sufficient to say that the names used for the residues in the pdb files must match those defined in the default xleap library files or in user defined library files.

xleap is fairly simple in that it expects that all residues in the pdb file are connected in the order in which they are listed unless they are separated by a "TER" card (i.e. a single line that says "TER") and that the first residue read in is "tail only" and the last residue is "head only". The "TER" separator ends the chain and begins a new one. It is important to remember this when initially creating the input files for your simulation. As we will see in a minute it is very important that you edit your pdb file and check that there are "TER" cards in the correct locations and that the correct residue naming procedure has been used before you load the file into xleap.

Let's see what implications this has for our DNA example. The first step is to start up the graphical version of LEaP...

$AMBERHOME/exe/xleap -s -f $AMBERHOME/dat/leap/cmd/leaprc.ff99

This should load xleap and you should see something like this appear:

A quick note on the command line used to start xleap: The command line shown above contains a couple of options which are worthy of comment at this point. When xleap loads it initially has to open a series of library and parameter files that define the force field parameters to be used and the residue maps etc. Since AMBER 8.0 ships with a range of different force fields, each suited to different types of simulation, it is important to tell xleap which force field we wish to use. This is what the command line options used above are for. The "-s" switch tells xleap to ignore any user defined defaults which might otherwise override our selection, while the "-f $AMBERHOME/dat/leap/cmd/leaprc.ff99" switch tells xleap to execute the start-up script for the FF99 force field. This script contains commands that cause xleap to load all of the configuration files required for the AMBER FF99 force field. If you look in the $AMBERHOME/dat/leap/cmd/ directory you will find a number of different leaprc files such as leaprc.ff03 (FF03 force field), leaprc.ff02ep (FF02 polarisable force field with lone pairs) etc.

XLEAP MENU BUG: Note if you find that the menu's in xleap are not working please check that your numlock light is turned off. For some reason having numlock on prevents the xleap menus from operating correctly.

To load a pdb into xleap we can use the loadpdb command. This will create a new unit in xleap and load the specified pdb into that unit. We can then subsequently view and edit the new unit using the edit command.

So, to load the pdb file we just created into a new unit called "dna1" type the following in the xleap window (make sure the xleap window is highlighted and your mouse cursor is within the window):

dna1=loadpdb "nuc.pdb"

This should result in output, similar to the following, appearing in the xleap window:

> dna1=loadpdb "nuc.pdb"
Loading PDB file: ./nuc.pdb
Unknown residue: A3 number: 9 type: Nonterminal
Unknown residue: T5 number: 10 type: Nonterminal
Creating new UNIT for residue: A3 sequence: 10
One sided connection. Residue: missing connect0 atom.
Created a new atom named: P within residue: .R<A3 10>
Created a new atom named: O1P within residue: .R<A3 10>
Created a new atom named: O2P within residue: .R<A3 10>
Created a new atom named: O3' within residue: .R<A3 10>
Created a new atom named: O5' within residue: .R<A3 10>
Created a new atom named: C1' within residue: .R<A3 10>
Created a new atom named: C2' within residue: .R<A3 10>
Created a new atom named: C3' within residue: .R<A3 10>
Created a new atom named: C4' within residue: .R<A3 10>
Created a new atom named: C5' within residue: .R<A3 10>
Created a new atom named: O4' within residue: .R<A3 10>
Created a new atom named: N6 within residue: .R<A3 10>
Created a new atom named: H61 within residue: .R<A3 10>
Created a new atom named: H62 within residue: .R<A3 10>
Created a new atom named: C6 within residue: .R<A3 10>
Created a new atom named: C5 within residue: .R<A3 10>
Created a new atom named: N7 within residue: .R<A3 10>
Created a new atom named: C8 within residue: .R<A3 10>
Created a new atom named: N9 within residue: .R<A3 10>
Created a new atom named: C4 within residue: .R<A3 10>
Created a new atom named: N3 within residue: .R<A3 10>
Created a new atom named: C2 within residue: .R<A3 10>
Created a new atom named: N1 within residue: .R<A3 10>
Created a new atom named: H3T within residue: .R<A3 10>
Creating new UNIT for residue: T5 sequence: 11
Created a new atom named: H5T within residue: .R<T5 11>
Created a new atom named: O3' within residue: .R<T5 11>
Created a new atom named: O5' within residue: .R<T5 11>
Created a new atom named: C1' within residue: .R<T5 11>
Created a new atom named: C2' within residue: .R<T5 11>
Created a new atom named: C3' within residue: .R<T5 11>
Created a new atom named: C4' within residue: .R<T5 11>
Created a new atom named: C5' within residue: .R<T5 11>
Created a new atom named: O4' within residue: .R<T5 11>
Created a new atom named: O4 within residue: .R<T5 11>
Created a new atom named: C4 within residue: .R<T5 11>
Created a new atom named: N3 within residue: .R<T5 11>
Created a new atom named: H3 within residue: .R<T5 11>
Created a new atom named: C2 within residue: .R<T5 11>
Created a new atom named: O2 within residue: .R<T5 11>
Created a new atom named: N1 within residue: .R<T5 11>
Created a new atom named: C6 within residue: .R<T5 11>
Created a new atom named: C5 within residue: .R<T5 11>
Created a new atom named: C7 within residue: .R<T5 11>
One sided connection. Residue: missing connect1 atom.
total atoms in file: 438
Leap added 180 missing atoms according to residue templates:
180 H / lone pairs
The file contained 43 atoms not in residue templates
>

What happened?
All of the residues were successfully added to our unit "dna1" as DNA residues except for residues 10 (A3) and 11 (T5) which were not recognised. These correspond to the final residue of the first strand and the first residue of the second strand. The problem stems from the fact that xleap didn't know that these were terminal residues since there was no "TER" card separating the strands. Thus when it loaded the pdb it expected a residue with both a "head" and a "tail". However, residue "A3" (res 10) does not have a tail ("connect 0") and therefore the residue was not identified properly. When a residue is not recognised the atoms specified in the pdb for that residue are loaded by leap as a single residue but none of the atoms are bonded. The creation of a new "unknown" residue can be seen from part of the output above:

Creating new UNIT for residue: A3 sequence: 10
One sided connection. Residue: missing connect0 atom.

A similar situation arose for residue 11 (T5) which is missing its head "connect1". Despite these issues, you can still "view" the pdb file, but for proper behaviour it is necessary to add the "TER" card before loading the pdb file.

In general, comments about missing connect0 or connect1 atoms refers to the chain connectivity and probably means that either you did not place the "TER" card or that the terminal residues were not set properly. Otherwise if you see comments about missing atom "X" and / or adding atom "Y" etc. then the most likely cause is misnaming of residues in the pdb file such that they do not conform to what xleap expects. More on this later...

To see the structure, in xleap, (with the wrongly defined terminal residues) use the edit command and specify the unit you wish to edit, in this case the one we just created "dna1":

edit dna1

This should open the xleap editor window and display something similar to the following (the initial molecular orientation may differ from that shown below):

Within this window the LEFT mouse button allows atom selection (drag and drop), the MIDDLE mouse button rotates the molecule and the RIGHT mouse button translates it. If the MIDDLE button does not work you can simulate it by holding down the Ctrl key and the LEFT mouse button. To make the image larger or smaller, simultaneously hold down the MIDDLE and RIGHT buttons (or Ctrl and RIGHT) and move the mouse up to zoom in or down to zoom out.

If you played with selecting atoms using the left mouse button you can unselect a region by holding down the shift key while drawing the selection rectangle. To select everything, double click the LEFT button (and to unselect, do the same while holding down the shift key).

Take a look at the structure. You should be able to see the perfect symmetry of canonical B-form geometry DNA. The perfect symmetry of canonical duplexes is based on analysis of long fibres of DNA. Real nucleic acids don't necessarily adopt this perfect symmetry as will become apparent once we start to carry out molecular dynamics on this 10-mer.

2.2) What level of simulation am I going to attempt?

Once you have got a suitable model structure the next step is to decide what level of simulation realism is to be used. The complexity of the calculation centres on the evaluation of the pairwise nonbonded and Coulombic interactions. Extra complexity is introduced by using periodic boundaries and Ewald methods to treat long ranged electrostatics and/or by evaluating non-additive effects such as induced polarisation.

Water is an integral part of nucleic acid structure and thus some representation of solvent effects is fairly critical. Simulations in vacuo have been performed where the screening of solvent is modelled by distance dependent or sigmoidal dielectric functions (the latter of which is not implemented in AMBER 8.0). Additionally tricks have been applied to keep the base pairs from fraying, through the addition of Watson-Crick base pair restraints and the reduction of the charges on the phosphate groups. Newer versions of AMBER (6.0 and above) contain the generalised Born model for implicit solvation which, although more expensive, provides a much better implicit solvent representation than simply using a distance dependent dielectric constant.

However, even with advances in computer power and methodological improvements, such as application of Ewald methods, which allow routine simulations of nucleic acids with explicit solvent and counterions in the nanosecond time range, there is still dependence of the results on the molecular mechanical force field. It is therefore important to understand the inadequacies of the force field being used.

Now, back to the point of this discussion, if you can afford it, include the solvent. Definitely include the explicit net-neutralising counterions. Also pay attention to the force field applied and be aware of its limitations. Use methods which properly treat long range electrostatic interactions, such as Ewald methods, if you can. However, remember that adding explicit water is expensive. While a nanosecond or so of in vacuo DNA simulation can take only minutes on a 3GHz P4, adding a periodic box of water that surrounds the DNA by roughly 10 angstroms, extends the simulation to several days. Given that it is normally necessary to run the simulation a couple of times (due to errors, sampling issues, etc.), these simulations can get very costly.

2.2.1) The types of simulations to be run in this tutorial

For the purpose of this tutorial we will build 3 different DNA models. The first will be an in vacuo model of the poly(A)-poly(T) structure (named polyAT_vac), an in vacuo model of the poly(A)-poly(T) structure with explicit counterions (named polyAT_cio), and a TIP3P (water) solvated model of the poly(A)-poly(T) structure in a periodic box (named polyAT_wat). The in vacuo model will be applied in simulations to get a feel for MD and then the solvated model will be used for periodic boundary simulations using a particle mesh Ewald treatment.

In order simplify post simulation analysis of the trajectories it is useful to have all three sets of prmtop files. This is because often in the analysis of the trajectory displaying the solvent is not normally necessary and the visualisation packages will run much faster if the solvent is removed from the trajectory file before loading. Obviously the water is necessary for calculating radial distribution functions, analysing water structure, and other properties, however it isn't necessary for calculating helicoidal parameters, determining average structures, etc. Therefore, to minimise disk space usage, and speed the analysis, we often strip the water and/or counterions. The three separate prmtop files are useful to have around since you need to use a prmtop that matches the structure of your (possibly stripped) trajectory for programs such as ptraj, rdparm, VMD etc.

2.3) Building the prmtop and inpcrd files

Now that we have a starting pdb (nuc.pdb) and an understanding of some of the issues surrounding different types of classical MD simulations we are ready to start building the input files necessary for the MD engine in AMBER 8.0, sander.

The first step is the building of residues. Many proteins contain coenzymes as well as standard amino acids. These coenzymes are not normally pre-defined in the AMBER database and so are considered to be non-standard residues. It is necessary to provide structural information and force field parameters for all of the non-standard residues that will be present in your simulation before you can create the sander input files. Fortunately, if you are using standard nucleic acid or amino acid residues, as we will be in this tutorial, this step is not necessary since all of the residues are pre-built in the AMBER database. Tutorials 4 and 5 will cover what to do if you have non-standard residues.

2.3.1) LEaP

If xleap isn't already running, start it up...

$AMBERHOME/exe/xleap -s -f $AMBERHOME/dat/leap/cmd/leaprc.ff99

In so doing, you should have seen that the appropriate libraries were loaded, as specified by the -f flag.

To see if you have the residues we need for this example, you can check to see if the residues have been properly loaded by trying to edit the unit DA5, i.e. in xleap:

edit DA5

If you see something similar to the following, i.e. the unit is not blank, then the residues were loaded correctly.

Close the window by selecting Unit -> Close from the top menu. (If the menu's don't work turn off NumLock). You can also list all the defined residues in LEaP by typing "list". You should see the following:

> list
ACE    ALA    ARG    ASH    ASN    ASP    CALA    CARG
CASN   CASP   CCYS   CCYX   CGLN   CGLU   CGLY    CHCL3BOX
CHID   CHIE   CHIP   CHIS   CILE   CIO    CLEU    CLYS
CMET   CPHE   CPRO   CSER   CTHR   CTRP   CTYR    CVAL
CYM    CYS    CYX    Cl-    Cs+    DA     DA3     DA5
DAN    DC     DC3    DC4    DC5    DCN    DG      DG3
DG5    DGN    DT     DT3    DT5    DTN    GLH     GLN
GLU    GLY    HID    HIE    HIP    HIS    HOH     IB
ILE    K+     LEU    LYN    LYS    Li+    MEOHBOX MET
MG2    NALA   NARG   NASN   NASP   NCYS   NCYX    NGLN
NGLU   NGLY   NHE    NHID   NHIE   NHIP   NHIS    NILE
NLEU   NLYS   NMABOX NME    NMET   NPHE   NPRO    NSER
NTHR   NTRP   NTYR   NVAL   Na+    PHE    PL3     POL3BOX
PRO    RA     RA3    RA5    RAN    RC     RC3     RC5
RCN    RG     RG3    RG5    RGN    RU     RU3     RU5
RUN    Rb+    SER    SPC    SPCBOX THR    TIP3PBOXTIP4PBOX
TP3    TP4    TP5    TRP    TYR    VAL    WAT     parm99
>

What residues are we expecting? Here are the different names of the residues for the O3' atoms from the pdb we created above...

ATOM    2 O3' A5     1       4.189 -7.682 -0.250
ATOM   25 O3' A      2       7.904 -3.753   3.130
ATOM 209 O3' A3    10      -1.127 -8.677 30.170
ATOM 231 O3' T5    11       7.904   3.753 30.670
ATOM 252 O3' T     12       8.601 -1.610 27.290
ATOM 420 O3' T3    20       4.189   7.682   0.250

However, you may notice that xleap doesn't recognise "A". If you simply type "edit A" you will get a new unit called A. But "A" is what we have in the PDB file. Moreover, xleap did not complain previously when we loaded up the nuc.pdb file about problems with residues called "A" (only problems with residues 10 and 11). So, how does xleap recognise the names (RNA vs. DNA, terminal vs. non-terminal)?

This has to do with a residue map that is defined in the leaprc.ff99 file we selected on the command line when starting xleap. Taking a look inside this file we find a section that defines a nucleic acid residue map:

#
# Define the PDB name map for the amino acids and DNA.
#
addPdbResMap {
{ 0 "ALA" "NALA" } { 1 "ALA" "CALA" }
.
.
.
{ 0 "GUA" "DG5" } { 1 "GUA" "DG3" } { "GUA" "DG" }
{ 0 "ADE" "DA5" } { 1 "ADE" "DA3" } { "ADE" "DA" }
{ 0 "CYT" "DC5" } { 1 "CYT" "DC3" } { "CYT" "DC" }
{ 0 "THY" "DT5" } { 1 "THY" "DT3" } { "THY" "DT" }
{ 0 "G" "DG5" } { 1 "G" "DG3" } { "G" "DG" } { "GN" "DGN" }
{ 0 "A" "DA5" } { 1 "A" "DA3" } { "A" "DA" } { "AN" "DAN" }
{ 0 "C" "DC5" } { 1 "C" "DC3" } { "C" "DC" } { "CN" "DCN" }
{ 0 "T" "DT5" } { 1 "T" "DT3" } { "T" "DT" } { "TN" "DTN" }
{ 0 "C5" "DC5" }
{ 0 "G5" "DG5" }
{ 0 "A5" "DA5" }
{ 0 "T5" "DT5" }
{ 1 "C3" "DC3" }
{ 1 "G3" "DG3" }
{ 1 "A3" "DA3" }
{ 1 "T3" "DT3" }

}

This section maps the names of pdb residues to xleap residues. By default everything is assumed to be DNA, i.e. "A" maps to "DA" etc. In addition to the above mapping, we see that there is a mapping of pdb style naming conventions to AMBER, such as O4* to O4', etc. This means when using xleap and molecules containing non-standard residues, such as DNA:RNA hybrids, you cannot simply expect to use the loadpdb command and get reasonable behaviour. In this case you may move to the more advanced command, loadpdbusingseq, which allows overriding the pdb residue names and gives you much more control.

Note: for a list of available xleap commands type "help" in the main xleap window. For help on a specific command type "help command". E.g. for help with loadpdb you should get the following on typing "help loadpdb":

> help loadpdb

variable = loadPdb filename
STRING _filename_

Load a Protein Databank format file with the file name _filename_. The sequence numbers of the RESIDUEs will be determined from the order of residues within the PDB file ATOM records. For each residue in the PDB file, LEaP searches the variables currently defined for variable names that match the residue name. If a match is found, then the contents of the variable are copied into the UNIT created for the PDB structure. If no PDB `TER' card separates the current residue from the previous one, a bond is created between the connect1 ATOM of the previous residue and the connect0 atom of the new one. As atoms are read from the ATOM records, their coordinates are written into the correspondingly named ATOMs within the residue being built. If the entire residue is read and it is found that ATOM coordinates are missing, then external coordinates are built from the internal coordinates that were defined in the matching UNIT (residue) variable. This allows LEaP to build coordinates for hydrogens and lone pairs which are not specified in PDB files.

Now, let's go back to the beginning and assume everything was set up properly; the distraction above was simply to give you a little insight to what goes on behind the scenes... Remember, using software like it is a black box is dangerous, especially in research.

We are now ready to load up the pdb file we created previously. In order for xleap to "figure out" where one strand terminates and the next begins it is first necessary to add pdb TER cards to the nuc.pdb file between the strands since nucgen unfortunately doesn't do this. In fact it is always a good idea to check your pdb files have TER cards between every DNA strand or amino acid chain. The modified pdb file is nuc_ter.pdb. In this file the TER card has been added between the two lines shown in blue:

ATOM 229 H3T A3 10 -0.808 -8.873 31.720
TER
ATOM 230 H5T T5 11 4.562 7.653 32.500

Now we load the modified pdb into xleap as a new unit called model:

model = loadpdb "nuc_ter.pdb"

Hopefully this time there should be no errors with just the following informational message being printed:

Loading PDB file: ./nuc_ter.pdb
  total atoms in file: 438
  Leap added 200 missing atoms according to residue templates:
       200 H / lone pairs

From this message you can see that xleap has automatically added the missing hydrogens for us in accordance with the pre-defined templates. To look at the newly created "unit" named "model" type:

edit model

This should cause the unit editor window to pop up as shown below:

This time it should be clear that there are no non-bonded atoms present and that residues 10 and 11 have been successfully identified this time.

To create our first prmtop and inpcrd files, we simply issue the following command in the main xleap window:

saveamberparm model polyAT_vac.prmtop polyAT_vac.inpcrd

This should give you the following output (the warning concerns the fact that we did not neutralise our system - more on this later):

> saveamberparm model polyAT_vac.prmtop polyAT_vac.inpcrd
Checking Unit.
WARNING: The unperturbed charge of the unit: -18.000000 is not zero.

 -- ignoring the warning.

Building topology.
Building atom parameters.
Building bond parameters.
Building angle parameters.
Building proper torsion parameters.
Building improper torsion parameters.
 total 110 improper torsions applied
Building H-Bond parameters.
Not Marking per-residue atom chain types.
Marking per-residue atom chain types.
 (no restraints)

and create the following two files:

polyAT_vac.prmtop

polyAT_vac.inpcrd

To remind you about these files:

prmtop: The parameter/topology file. This defines the connectivity and parameters for our current model. This information is static, or in other words, it doesn't change during the simulation. The prmtop we created above is called polyAT_vac.prmtop.

inpcrd: The coordinates (and optionally box coordinates and velocities). This is data is not static and changes during the simulations (although the file is unaltered). Above we created an initial set of coordinates called polyAT_vac.inpcrd.

Now we want to create a topology that has explicit net neutralising counterions. There are a number of different ways to add ions to a structure. In this example we shall use the addions command implemented in xleap. This method works by constructing a Coulombic potential on a 1.0 angstrom grid and then placing counterions one at a time at the points of lowest/highest electrostatic potential. The command to do this is as follows (the '0' means 'neutralise'):

addions model Na+ 0

This should add a total of 18 sodium anions to counteract the -18 charge of the DNA chain. The output from this command should be similar to the following:

> addions model Na+ 0
18 Na+ ions required to neutralize.
Adding 18 counter ions to "model" using 1A grid
Grid extends from solute vdw + 1.87  to  7.97
Resolution:      1.00 Angstrom.
grid build: 0 sec
 (no solvent present)
Calculating grid charges
charges: 0 sec
Placed Na+ in model at (-0.73, 10.83, 18.51).
Placed Na+ in model at (6.27, -8.17, 18.51).
Placed Na+ in model at (10.27, 4.83, 11.51).
Placed Na+ in model at (-6.73, -8.17, 11.51).
Placed Na+ in model at (-5.73, 3.83, 13.51).
Placed Na+ in model at (-9.73, 3.83, 25.51).
Placed Na+ in model at (6.27, -8.17, 5.51).
Placed Na+ in model at (-5.73, 8.83, 1.51).
Placed Na+ in model at (11.27, 1.83, 25.51).
Placed Na+ in model at (-10.73, -4.17, 28.51).
Placed Na+ in model at (5.27, 3.83, 3.51).
Placed Na+ in model at (2.27, -6.17, 27.51).
Placed Na+ in model at (-10.73, -3.17, 8.51).
Placed Na+ in model at (6.27, 7.83, 15.51).
Placed Na+ in model at (10.27, -3.17, 8.51).
Placed Na+ in model at (-1.73, -12.17, 16.51).
Placed Na+ in model at (-9.73, 3.83, 4.51).
Placed Na+ in model at (-7.73, 8.83, 21.51).

Done adding ions.

Note: you should always check that the number of ions you were expecting have actually been added. It is also a good idea to view the new structure to ensure that the charges have been placed as intended. By editing the "model" we can see where the ions have been added:

edit model

Now we are once again ready to write the prmtop and inpcrd files, this time for our neutralised system:

saveamberparm model polyAT_cio.prmtop polyAT_cio.inpcrd

Output files: polyAT_cio.prmtop, polyAT_cio.inpcrd

The final input files to create are for solvated DNA with explicit counterions. We have our "model" unit already built with counterions so the next step is to solvate it with explicit water. This is done with the command "solvatebox". For our DNA, we will put an 8 angstrom buffer of TIP3P water around the DNA in each direction. In this way all atoms in the DNA starting structure will be no less than 8 angstroms from the edge of the water box. Before we do this, however, for reasons that will become clear later we should create a copy of our model and call it model2:

model2 = copy model

The following creates a rectangular box of water around the DNA (note: if you are using AMBER7 use WATBOX216 in place of TIP3PBOX):

solvatebox model TIP3PBOX 8.0

This results in the following output (exact numbers may be slightly different due to round off differences between different computer architectures), editing the "model" (edit model) should show you the DNA in a water box:

> solvatebox model TIP3PBOX 8.0
  Solute vdw bounding box:              25.736 26.736 40.240
  Total bounding box for atom centers:  41.736 42.736 56.240
  Solvent unit box:                     18.774 18.774 18.774
  Total vdw box size:                   44.537 45.963 58.910 angstroms.
  Volume: 120593.276 A^3 
  Total mass 53972.060 amu,  Density 0.743 g/cc
  Added 2638 residues.

edit model

The above output shows us that xleap added a total of 2638 water molecules to form a rectangular box of 44.5 x 46.0 x 58.9 angstroms (120593.3 angstroms³). This is not cubic since DNA is a cylindrical molecule. An issue here is that the long axis of DNA could rotate (via self diffusion) such that the long axis was along the short box dimension which will, since this box will be infinitely repeated in space by the periodic boundary method, bring the ends of the DNA near their periodic images. One way to get around this would be to make the box cubic, or 58.9 x 58.9 x 58.9 angstroms, by specifying a list of numbers to the solvateBox command to force this to be cubic. However, this will add significantly more water to the calculation and slow it down tremendously. Alternatively we can use a different shape box of water. While a rectangular box is the obvious choice for tessellating in 3 dimensional space it is not the only shape that can be replicated in 3 dimensions. A more efficient shape to use, in terms of reducing the problem of solute rotation, and the one we will be using for this tutorial, is a truncated octahedron:

To add a truncated octahedral box of water around our DNA we use the solvateoct command. Since in the course of this demonstration we have already solvated our "model" with a rectangular box of water we shall use the copy we made "model2". Enter the following in xleap to create the water box (note: if you are using AMBER7 use WATBOX216 in place of TIP3 PBOX):

solvateoct model2 TIP3PBOX 8.0

This should give the following output:

> solvateoct model2 TIP3PBOX 8.0
Scaling up box by a factor of 1.320477 to meet diagonal cut criterion
  Solute vdw bounding box:              24.918 26.588 39.153
  Total bounding box for atom centers:  60.281 60.281 60.281
      (box expansion for 'iso' is  65.4%)
  Solvent unit box:                     18.774 18.774 18.774
  Volume: 115065.025 A^3 (oct)
  Total mass 59917.340 amu,  Density 0.865 g/cc
  Added 2968 residues.

Editing "model2" allows you to view the truncated octahedron water box:

edit model2

There you have it, a truncated octahedral shaped ice cube...

Once again we save our AMBER parmtop and inpcrd files:

saveamberparm model2 polyAT_wat.prmtop polyAT_wat.inpcrd

Output files: polyAT_wat.prmtop, polyAT_wat.inpcrd

Now we have our input files we can progress to the next section which introduces running minimisation and molecular dynamics..

CLICK HERE TO GO TO SECTION 3