Writing an input file: the basics

What is an input file?

An input file begins with an open curly bracket { and ends with a close curly bracket }.

Between these brackets, the file consists of "words" and "numbers" separated by blank spaces and possibly blank lines. One blank space is equivalent to any number of blank spaces or blank lines.

The allowed "words" may be "keywords", which tell the program to "do" something, or they may be pieces of "data", such as "numbers", which are to be stored.
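Because one blank space is equivalent to any run of blanks or blank lines, the body of an input file can be viewed as a flat sequence of words. The short Python sketch below illustrates this idea only. It is not TONTO's own reader, the function name `split_input` is hypothetical, and quoted strings and comments are ignored for simplicity.

```python
def split_input(text: str) -> list[str]:
    """Split an input file into words, assuming no quoted strings or comments."""
    # str.split() with no argument treats any run of spaces, tabs and
    # newlines as a single separator, matching the rule described above.
    words = text.split()
    # The whole file must be enclosed in one pair of curly brackets.
    if not words or words[0] != "{" or words[-1] != "}":
        raise ValueError("input must begin with '{' and end with '}'")
    return words[1:-1]  # the words between the outer brackets


example = """
{
  name=     nh3

  charge=  0
}
"""
print(split_input(example))  # ['name=', 'nh3', 'charge=', '0']
```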

What does an input file look like?

Below is an example showing the style of keywords and data used in TONTO.

Note how data is usually grouped within matching curly brackets, and note that keywords ending in an = sign are followed by some numerical or string-like data.

Note also that repetitive lists of data are entered using a list= { ... } construction. These lists are often preceded by a list_order= { ... } keyword list, which specifies the order in which the data items in the list are to be interpreted.

{
  name=     nh3
 
  charge=  0
 
  multiplicity= 1

 
  crystal= {
 
    spacegroup_it_symbol=   P2_13
    unit_cell_angles=      90.0   90.0   90.0       Degree
    unit_cell_dimensions=   5.1305 5.1305 5.1305    Angstrom
 
    thermal_smearing_model= stewart
    partition_model= mulliken
 
    reflection_data=  {
   
      ! Experimental data from Boese et al ...
 
      list_order= { h= k= l= F_exp= F_sigma= }
   
      list= {

      ! These are the real experimental data

       1   1   0    18.093    0.118
       1   1   1    63.470    0.446
       0   2   0    53.079    0.434
       1   2   0     2.864    0.084

      }
    } 
  }

   
  atoms= {

    list_order= 
       { label= " { axis_system= crystal } " pos= basis_label= }

    list= {

    N  0.2103001   0.2103001   0.2103001  N_basis_set
    H  0.3722001   0.2627001   0.1113001  H_basis_set
    H  0.1113001   0.3722001   0.2627001  H_basis_set
    H  0.2627001   0.1113001   0.3722001  H_basis_set

    } 
  }
  
  
  basis_sets= {

    list_order= { gamess-us= }

    list= {

    N_basis_set
    {
       S   3
        1        30.63310000         0.1119060000     
        2        7.026140000         0.9216660000     
        3        2.112050000        -0.2569190000E-02 
       P   3
        1        30.63310000         0.3831190000E-01
        2        7.026140000         0.2374030000    
        3        2.112050000         0.8175920000    
       D   1
        1       0.913000000           1.00000000    
    } 
  
    H_basis_set
    {
       S   1
        1       0.3258400000         1.000000000    
       P   1
        1       0.750000000          1.00000000    
    } 

    }
  }
}

How do I know which are the allowed input keywords?

If TONTO does not understand a keyword, it usually gives an error message and sufficient information to track down the error.

To find out which keywords are allowed for any module XXXX, look in the chapter called "Keyword documentation for TONTO", in the subsection "The XXXX module".

For example, we might start by looking in the section "The MOL module". There we find the following typical entry:

name= STR

The name of the molecule 

• The name= keyword must be the first one in any input file. 
• The value of the inputted string is used to define the start of archive file names, so do not use any spaces in it.

In the case above, the keyword is name=. You can type this into the input file. Following the name= keyword you must type a piece of data which is a STR, i.e. a string variable such as "nh3".

Below the keyword is a description of what the keyword does when you type it in the input file. In this case, name= does not "do" anything at all; it simply represents "The name of the molecule". Below that, in smaller text, are some special comments about the usage of this keyword.

Some keywords must be followed by data, perhaps a number or a string. These are called "data keywords", and they are used to input data into the program. Data keywords always end in an = sign, so you know that some data has to follow them. There are also "task keywords", which may or may not be followed by data. Task keywords do not end in an = sign; they are used to perform a specific action or calculation. The name= keyword is obviously a data keyword.

Note that there cannot be any space between a keyword and the = symbol.

There are two special keywords: the open and close braces, { and } respectively. They always signify the beginning and end of a list of data, perhaps a list of numbers or a list of keywords.

Note that braces { and } must have a space on both sides.
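The rules above (braces are special, data keywords end in an = sign, and no space may separate a keyword from its = sign) can be illustrated with a hypothetical Python helper. It assumes the word is already known to be in keyword position; a real reader would also check the word against the module's list of allowed keywords.

```python
def classify(word: str) -> str:
    """Classify a word assumed to be in keyword position (hypothetical helper)."""
    if word in ("{", "}"):
        return "brace"  # the two special keywords
    if word.endswith("="):
        return "data keyword"  # must be followed by data
    return "task keyword"  # performs an action; may or may not take data


for word in ["{", "name=", "charge=", "}"]:
    print(word, "->", classify(word))

# Words are split on blank space, so "name =" becomes the two words
# "name" and "=", and "{name=" stays a single word that is not a brace.
print("name = nh3".split())  # ['name', '=', 'nh3']
print("{name= nh3".split())  # ['{name=', 'nh3']
```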

What about keywords which aren't in this manual?

You will need to look in the online code documentation.

This is explained in the Section called "The online documentation" in the chapter called "Keyword documentation for TONTO". A schematic procedure for finding out which keywords are available, and what data they require as input, is given there.

Since TONTO changes quite often, this may often be your only option.

How do I know when to enter data?

If a keyword must be followed by data, the documentation shows, immediately after the keyword, a double colon and an uppercase word such as STR, REAL, INT, REALVEC(3), or ATOM.

In the example above, the name= data keyword is followed by STR, indicating that it must be followed by some data, and that this data must be of the STR variety. STR is shorthand for "string data".

What kinds of data are there?

There are many kinds of data. You have already encountered STR data. There is also REAL data, which represents "real numbers", INT data which represents "integers", and REALVEC(3) data, which represent 3 dimensional vectors, to name a few.

However, there are also more complicated pieces of input data, like ATOM data, which (naturally enough) represents "atom data". Atom data might be represented by a collection of simple data. For example, there may be a STR piece of information representing the "atom label", and a REALVEC(3) piece of data representing the "atom position". There is also "molecule data", or MOL data for short, which might be composed of a "list" or "vector" of ATOM data. These complicated data objects are called "derived data", or "vectors of derived data".

All data in TONTO is described by modules. Thus, ATOM data is described by a module named ATOM. Table 1 describes each of the modules (and hence data types) used in TONTO.

Table 1. Modules available in TONTO.

Module name      Description
ARCHIVE          For archiving objects (mainly matrices) to disk
ATOM             A quantum mechanical atom
ATOMVEC          A vector of ATOMs
BASIS            Quantum mechanical basis sets
BASISVEC         A vector of BASIS sets
BINMAT           A matrix of BIN
BINVEC           A vector of BIN
BUFFER           A string buffer
CPXMAT           Complex matrices
CPXMAT3          3 dimensional complex matrices
CPXMAT4          4 dimensional complex matrices
CPXMAT5          5 dimensional complex matrices
COLOUR           Converts colour names to RGB triples
COLOURFUNCTION   Generates RGB triples from function values
CRYSTAL          A crystal
CPXVEC           Complex vectors
REAL             Double precision numbers
DFTGRID          DFT integration grids
FILE             Binary (unformatted) files
GAUSSIAN         A gaussian function
GAUSSIAN2        A pair of gaussian functions
GAUSSIAN4        A quartet of gaussian functions
INTMAT           Integer matrices
INTMAT3          3 dimensional integer matrices
INTMAT4          4 dimensional integer matrices
INT              Integers
IRREP            Point group irreps
IRREPVEC         A vector of point group irreps
INTVEC           Integer vectors
IVECMAT3         A 3 dimensional matrix of integer vectors
IVECVEC          A vector of integer vectors
MARCHINGCUBE     Generates triangulated iso-surfaces using the marching cubes method
REALMAT          Real matrices
REALMAT3         3 dimensional matrices
REALMAT3VEC      A vector of 3 dimensional matrices
REALMAT4         4 dimensional matrices
REALMAT4VEC      A vector of 4 dimensional matrices
REALMAT5         5 dimensional matrices
REALMATVEC       A vector of matrices
MOL              A chemical molecule
OPMATRIX         Operator matrices (restricted, unrestricted, complex, etc.)
OPVECTOR         Diagonals of operator matrices
PLOTGRID         Rectilinear grids for plots
POINTGROUP       Symmetry point groups
REFLECTION       A single reflection (scattering data) from a crystal
REFLECTIONS      A vector of reflections
RYS              Rys roots and weights for electron repulsion integrals
SCFDATA          SCF convergence data and results
SHELL            A contracted shell of gaussian functions
SHELL1           A contracted shell of gaussian functions, with a position
SHELLVEC         A vector of SHELLs
SHELL2           A SHELL pair. Contains integral code
SHELL4           A SHELL quartet. Contains integral code
SHELLPAIR        A pair of SHELLs
SHELLQUARTET     A quartet of SHELLs
SHELL1QUARTET    A quartet of SHELL1s. Contains heavily optimised integral code
SPACEGROUP       Crystal spacegroup symmetry
STR              Character strings
STRVEC           A vector of character strings
SYSTEM           System level routines
TEXTFILE         A file containing ASCII text
TIME             Current and elapsed time
TYPES            Defines the various types in TONTO
UNITNUMBER       Information about files currently open
VECDIIS          A vector of DIIS
REALVEC          A real vector
REALVECVEC       A vector of REALVECs

How do I enter data?

Type the required piece of data immediately after the data keyword, remembering to leave at least one blank space between the keyword and the data.

For "simple data" such as STR, REAL, INT, or REALVEC, you can probably guess what to type in your input file.

Examples of entering simple data are given in Table 2. This is explained further in a section below.

The simplest way to enter derived data like ATOM data is to use keywords; finding and using keywords has already been explained above. If a keyword is to be followed by data, that data is either simple data (described below) or derived data (which can itself be entered using keywords, as described above).

There is also another way to enter vectors of derived data, without using keywords, which is also explained below.

Table 2. Shorthand symbols for simple data with input examples

Kind of data     Shorthand symbol  Some examples of how to enter this data
Comment          -                 ! A comment appears after an isolated exclamation
                                   "!" Even this is a comment
                                   # A hash will also begin a comment
Logical          BIN               TRUE true F Yes "NO" False F f
String           STR               a-string-with-no-blanks "a string with blanks"
Integer          INT               123 -10 +10 "666"
Real number      REAL              123 -10.0 123.4 123.4e5 +123.4d-5
Complex number   CPX               12.3e5 56.7e8 ! A single complex number
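As a hedged illustration of the simple data shown in Table 2, the Python sketch below converts single words into logical, integer and real values. The function names are hypothetical and the accepted spellings are inferred only from the table's examples; the Fortran-style d exponent (as in +123.4d-5) is handled by mapping d to e before conversion. Quoted strings containing blanks are assumed to have already been kept together as one word.

```python
def unquote(word: str) -> str:
    """Remove one pair of enclosing double quotes, if present."""
    if len(word) >= 2 and word[0] == word[-1] == '"':
        return word[1:-1]
    return word


def read_bin(word: str) -> bool:
    """Read a logical (BIN) value such as TRUE, F, Yes or "NO"."""
    value = unquote(word).lower()
    if value in ("t", "true", "yes"):
        return True
    if value in ("f", "false", "no"):
        return False
    raise ValueError(f"not a logical value: {word}")


def read_int(word: str) -> int:
    """Read an integer such as 123, -10, +10 or "666"."""
    return int(unquote(word))


def read_real(word: str) -> float:
    """Read a real such as 123.4e5 or +123.4d-5 (Fortran d exponent)."""
    return float(unquote(word).lower().replace("d", "e"))


print([read_bin(w) for w in 'TRUE true F Yes "NO" False F f'.split()])
# [True, True, False, True, False, False, False, False]
print([read_int(w) for w in '123 -10 +10 "666"'.split()])  # [123, -10, 10, 666]
print([read_real(w) for w in "123 -10.0 123.4 123.4e5 +123.4d-5".split()])
```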

Table 3. Shorthand symbols for simple vector data with input examples

Kind of data        Shorthand symbol      Example         Description
Logical vector      BINVEC(4)             T T T "f"       A logical vector, length 4
                    BINVEC*               { T T T }       A logical vector, variable length
String vector       STRVEC(2)             "Hi" gorgeous   A string vector, length 2
                    STRVEC*               { a b c d }     A string vector, variable length
Integer vector      INTVEC(5)             1 0 1 1 0       An integer vector, length 5
                    INTVEC*               { 66 99 33 }    An integer vector, variable length
Real vector         REALVEC(3)            0 0.0 30.d-3    A real vector, length 3
                    REALVEC*              { 1. 2. 3. }    A real vector, variable length
String vector pair  STRVEC(3),STRVEC(3)   x_1 y_1         Vector x in column 1,
                                          x_2 y_2         vector y in column 2
                                          x_3 y_3
Real vector,        REALVEC(3),STRVEC(3)  1. y_1          Real vector in column 1,
string vector pair                        2. y_2          string vector in column 2
                                          3. y_3
Complex vector      CPXVEC(1)             0.0 1.0         A complex vector, length 1
                    CPXVEC*               { 1. 2. 3 4 }   A complex vector, variable length

Table 4. Shorthand symbols for simple matrix data with input examples

Kind of data     Shorthand symbol  Example         Description
String matrix    STRMAT(2,2)       a b             A 2 x 2 string matrix
                                   c "d"
                                   by_column       Same matrix, entered by column
                                   a c
                                   b "d"
Integer matrix   INTMAT(2,2)       1 2             A 2 x 2 integer matrix
                                   3 "4"
                                   by_column       Same matrix, entered by column
                                   1 3
                                   2 "4"
Real matrix      REALMAT(2,2)      1 2.            A 2 x 2 real matrix
                                   3 4d+5
                                   by_column       Same matrix, entered by column
                                   1 3
                                   2. 4d+5

How do I enter simple data?

Some examples of how to input simple kinds of data are shown in Table 2, Table 3 and Table 4 in the Section called "How do I enter data?". Some examples of input comments are also shown.

In most cases, the input that you type is the same as that used for the Fortran language, except in the following respects related to the entry of vectors and arrays of data.

Sometimes it is necessary to enter vector data whose size is not known to the program beforehand. When entering such variable-length vector data, curly brackets { and } are used to enclose the vector or list.
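The variable-length convention can be sketched in Python as a function that collects every word between a { and the next }. This is a hypothetical illustration: `read_braced_list` is an invented name, and nested brackets are deliberately not handled.

```python
def read_braced_list(words: list[str], start: int) -> tuple[list[str], int]:
    """Collect the words of a { ... } list beginning at words[start].

    Returns the enclosed words and the index just past the closing '}'.
    Nested lists are not handled in this sketch.
    """
    if words[start] != "{":
        raise ValueError("a variable-length list must begin with '{'")
    end = words.index("}", start + 1)  # raises ValueError if '}' is missing
    return words[start + 1:end], end + 1


words = "{ 66 99 33 } next-keyword=".split()
items, pos = read_braced_list(words, 0)
print([int(w) for w in items], words[pos])  # [66, 99, 33] next-keyword=
```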

Sometimes, it is also convenient to enter a pair (or more) of vectors with the same length so that one alternates between the lists. This is called interleaved vector input. One represents this alternating sequence of vectors with a comma between the different types of vector data. For example, entering two 3 dimensional vectors x and y in the sequence x1 y1 x2 y2 x3 y3 would be represented by the data type REALVEC(3),REALVEC(3).
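Interleaved input of this kind can be separated into its component vectors with strided slicing. The Python sketch below uses a hypothetical helper (`deinterleave` is not a TONTO routine); it only shows the bookkeeping implied by the REALVEC(3),REALVEC(3) notation, with the values already converted to numbers.

```python
def deinterleave(values: list, n_vectors: int) -> list[list]:
    """Split x1 y1 x2 y2 ... into [x1, x2, ...], [y1, y2, ...]."""
    if len(values) % n_vectors != 0:
        raise ValueError("number of values is not a multiple of the vector count")
    # Vector k takes every n_vectors-th value, starting from position k.
    return [values[k::n_vectors] for k in range(n_vectors)]


# REALVEC(3),REALVEC(3) entered as x1 y1 x2 y2 x3 y3
x, y = deinterleave([1.0, 10.0, 2.0, 20.0, 3.0, 30.0], 2)
print(x, y)  # [1.0, 2.0, 3.0] [10.0, 20.0, 30.0]
```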

Finally, when entering two dimensional matrix data, there is always the question of whether the data should be entered across rows or down columns. The default is to read by rows, unless the matrix is preceded by the string by_column or column-wise. Row order can also be forced explicitly by preceding the matrix with the string by_row or row-wise. Multidimensional matrices are always entered in the Fortran order, by_column; that is, the first index of the matrix increments most rapidly.
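To make the two orders concrete, the hypothetical Python function below fills an n_rows x n_cols matrix from a list of words, honouring an optional by_row, row-wise, by_column or column-wise prefix. It covers only two dimensional matrices, and the function name is invented for illustration.

```python
def read_matrix(words: list[str], n_rows: int, n_cols: int) -> list[list[str]]:
    """Fill an n_rows x n_cols matrix from words, honouring an order prefix."""
    order = "by_row"  # the default order described above
    if words and words[0] in ("by_row", "row-wise", "by_column", "column-wise"):
        order, words = words[0], words[1:]
    if len(words) != n_rows * n_cols:
        raise ValueError("wrong number of matrix elements")
    if order in ("by_row", "row-wise"):
        # Row order: the second (column) index varies fastest.
        return [words[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
    # Column order: the first (row) index varies fastest, as in Fortran.
    return [[words[j * n_rows + i] for j in range(n_cols)] for i in range(n_rows)]


print(read_matrix("1 2 3 4".split(), 2, 2))            # [['1', '2'], ['3', '4']]
print(read_matrix("by_column 1 3 2 4".split(), 2, 2))  # [['1', '2'], ['3', '4']]
```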

How do I change units with simple data?

For any simple numerical data, simply append the new units string. For example, entering:

1.3 angstrom

would cause TONTO to interpret the number 1.3 in Angstrom units, and TONTO would convert the number internally into its default units, which are atomic units. This also applies to vectors and matrices of fixed length. For example,

1.0 2.0 3.0 angstrom

would represent a REALVEC(3) object in Angstrom units. To see the list of allowed units, look at the is_known_unit routine in the STR module.
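A hedged sketch of the conversion described above: reals followed by an optional units word are scaled into atomic units (bohr, for lengths). The factor used here is the CODATA 2018 Bohr radius, 0.529177210903 Angstrom; TONTO's own constant may differ in the last digits, and only angstrom is recognised in this illustration. The function and table names are hypothetical; the real list of units is the one checked by is_known_unit in the STR module.

```python
# Assumption: CODATA 2018 Bohr radius in angstrom. TONTO's constant may differ.
BOHR_RADIUS_IN_ANGSTROM = 0.529177210903

# Hypothetical table of factors converting a length unit into atomic units.
TO_ATOMIC_UNITS = {"angstrom": 1.0 / BOHR_RADIUS_IN_ANGSTROM}


def read_reals_with_units(words: list[str], n: int) -> list[float]:
    """Read n reals plus an optional units word; words holds only this datum."""
    values = [float(w) for w in words[:n]]
    if len(words) > n:
        unit = words[n].lower()
        if unit not in TO_ATOMIC_UNITS:
            raise ValueError(f"unknown unit: {words[n]}")
        values = [v * TO_ATOMIC_UNITS[unit] for v in values]
    return values


print(read_reals_with_units("1.3 angstrom".split(), 1))  # about 2.4566 bohr
print(read_reals_with_units("1.0 2.0 3.0 angstrom".split(), 3))
```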

How do I enter derived data?

Any derived data is composed of a collection of simpler pieces of data, possibly including simpler kinds of derived data. Each of these simpler pieces of data is called a component. For example, for ATOM data, there may be a STR piece of information representing the "atom label", and a REALVEC(3) piece of data representing the "atom position". Both of these are components.

One way to enter derived data component information is to use keywords, which has been explained above. [1]

Another way to enter the components of derived data is to type the data for each component in order, without any keywords. For example, for the ATOM data just discussed, we could first type a STR representing the atom label, then a REALVEC(3) representing the atom position, like this:

oxygen-atom   0.0  0.0  0.0

But we could equally well convey the same information by typing this:

0.0  0.0  0.0 oxygen-atom

Clearly, it is important to know the correct order for the pieces of information. The correct order for an ATOM is given in the "Input documentation for module ATOM", under the heading "Standard input data order"; it is usually the first thing mentioned. This style of input is called "plain style input". [2]

Note: Plain style input is never used to input a single piece of derived data. It is only used to input arrays of derived data (although keyword style input can also be used for arrays of derived data, if desired). Plain style input is used for arrays of derived data to save typing keywords when entering long lists of data.

How do I enter lists of derived data?

To input a list of plain style data, begin with the { symbol, indicating that a list of derived data is to follow. Then enter the plain style data as described above, and terminate the list with the } symbol, indicating the end of the variable-length list. For example, to enter a sequence of three ATOM pieces of data (which is ATOMVEC data), we would type:

{
   oxygen-atom   0.0  0.0  0.0
   N             1.0  0.0  0.0
   carbon        0.0  1.0  0.0
}

The indentation in the above example is not required (nor are line breaks), but indentation is advised to improve readability.
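The plain style ATOM list above can be read with a few lines of Python, assuming (as in the text) that each entry is a label followed by three position coordinates. The `Atom` class is a hypothetical stand-in holding only these two components, not TONTO's ATOM type; the sketch also confirms that line breaks and indentation make no difference.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical stand-in for ATOM data, holding two components only."""
    label: str
    pos: tuple[float, float, float]


def read_atom_list(text: str) -> list[Atom]:
    """Read a { ... } list of plain style atoms: a label, then three reals."""
    words = text.split()  # any blank space or line break separates words
    if not words or words[0] != "{" or words[-1] != "}":
        raise ValueError("the list must be enclosed in { and }")
    body = words[1:-1]
    if len(body) % 4 != 0:
        raise ValueError("each atom needs a label and three coordinates")
    atoms = []
    for i in range(0, len(body), 4):
        x, y, z = (float(w) for w in body[i + 1:i + 4])
        atoms.append(Atom(body[i], (x, y, z)))
    return atoms


text = """
{
   oxygen-atom   0.0  0.0  0.0
   N             1.0  0.0  0.0
   carbon        0.0  1.0  0.0
}
"""
for atom in read_atom_list(text):
    print(atom.label, atom.pos)
```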

Note: If XXXX is the name of a data type, then XXXXVEC is the name of the list or vector of those data types.

Altering the input order for lists of derived data

Sometimes it is convenient to alter the order in which the components of plain data are entered, for example to read another program's data, or to enter extra pieces of data. This can be done using the list_order= keyword.

Following the list_order= keyword is a list of allowed keywords which specifies the new input data order to be used for the plain data. As usual, this list of keywords is enclosed by curly brackets { and }. For example, to enter the positions of the atoms before their atom labels, we would use the following input:

{
   list_order= { pos=           label= }
                 0.0  0.0  0.0  oxygen-atom
                 1.0  0.0  0.0  N
                 0.0  1.0  0.0  carbon
}

We could also enter the above data by explicitly enclosing the listed data in a list= { ... } block, which was implied in the example above:

{
   list_order= { pos=           label= }
   list= {
                 0.0  0.0  0.0  oxygen-atom
                 1.0  0.0  0.0  N
                 0.0  1.0  0.0  carbon
   }
}
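The effect of list_order= can be sketched in Python: the keywords inside list_order= { ... } say in which order, and how many words, each entry supplies, and an explicit list= { ... } wrapper is optional. The table of word counts below is a hypothetical assumption (pos= takes a REALVEC(3), label= a single STR); a real reader would take this information from each keyword's documented data type.

```python
# Hypothetical number of words read by each keyword in this example.
WORDS_PER_KEYWORD = {"label=": 1, "pos=": 3}


def read_ordered_list(words: list[str]) -> list[dict[str, list[str]]]:
    """Read '{ list_order= { ... } [list= {] entries [}] }' into records."""
    if words[:3] != ["{", "list_order=", "{"] or words[-1] != "}":
        raise ValueError("expected '{ list_order= { ... } ... }'")
    close = words.index("}", 3)
    order = words[3:close]  # e.g. ['pos=', 'label=']
    rest = words[close + 1:-1]  # the entries, before the final '}'
    if rest[:2] == ["list=", "{"]:
        rest = rest[2:-1]  # strip the optional explicit list= { ... }
    entry_length = sum(WORDS_PER_KEYWORD[kw] for kw in order)
    if entry_length == 0 or len(rest) % entry_length != 0:
        raise ValueError("incomplete entry in list")
    records = []
    for start in range(0, len(rest), entry_length):
        record, i = {}, start
        for kw in order:
            n = WORDS_PER_KEYWORD[kw]
            record[kw] = rest[i:i + n]
            i += n
        records.append(record)
    return records


implicit = "{ list_order= { pos= label= } 0.0 0.0 0.0 oxygen-atom 1.0 0.0 0.0 N }"
explicit = "{ list_order= { pos= label= } list= { 0.0 0.0 0.0 oxygen-atom 1.0 0.0 0.0 N } }"
print(read_ordered_list(implicit.split()) == read_ordered_list(explicit.split()))  # True
print(read_ordered_list(implicit.split())[1])
# {'pos=': ['1.0', '0.0', '0.0'], 'label=': ['N']}
```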

Embedded commands in lists of derived data

The list of keywords in a list_order= statement need not be data keywords: commands or embedded data can also be placed in the list. This is useful, for example, for specifying operations on the data while the list of data is being processed.

For example, when entering ATOM data we may wish to tell the program that the position coordinates are in Angstrom units. This can be done by prefacing the pos= keyword with "{ units= angstrom }" (see the units= keyword in module ATOM; a units= keyword is usually available in every module). Thus, to change the units for the entire list of ATOM data in the example above, type:

{
list_order= { "{ units= angstrom }" pos= label= }
   0.0  0.0  0.0   oxygen-atom
   1.0  0.0  0.0   N
   0.0  1.0  0.0   carbon
}

Note that "{ units= angstrom }" must be enclosed in double quotes to ensure it is interpreted as a single item. Likewise, curly brackets must be used because the contents of the quoted string are interpreted as derived data, which must always begin and end with curly brackets.

Note: Curly brackets must always appear as separate characters surrounded by blank space, or else at the very beginning or end of a quoted string.
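Keeping a quoted item such as "{ units= angstrom }" together as one word requires a quote-aware split. The Python sketch below uses a regular expression that matches either a double-quoted string or a run of non-blank characters; it is a simplified, hypothetical tokenizer (comments are not handled), not TONTO's own. It also shows why the quoted contents need their own curly brackets: once unquoted, they form a complete piece of bracketed derived data.

```python
import re

# A double-quoted string (which may contain blanks), or a run of non-blanks.
TOKEN = re.compile(r'"[^"]*"|\S+')


def tokenize(text: str) -> list[str]:
    """Split text into words, keeping each double-quoted string as one word."""
    return TOKEN.findall(text)


line = 'list_order= { "{ units= angstrom }" pos= label= }'
words = tokenize(line)
print(words)
# ['list_order=', '{', '"{ units= angstrom }"', 'pos=', 'label=', '}']

# The quoted word, once unquoted, is itself bracketed derived data.
print(words[2].strip('"').split())  # ['{', 'units=', 'angstrom', '}']
```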

If, in the above example, only units= were to appear in the list_order= list, then, since the units= keyword must be followed by a STR unit identifier, this string would have to appear as the first data element of every entry in the plain data list. That is, we would have to type:

{
   list_order= { units= pos= label= }
      angstrom 0.0  0.0  0.0   oxygen-atom
      angstrom 1.0  0.0  0.0   N
      angstrom 0.0  1.0  0.0   carbon
}

This would somewhat defeat the purpose of using the plain data style, since the unit specifier angstrom must be repeated. Likewise, since the units of each position can be changed using a post-facto units identifier, the following input would also have the same effect:

{
   list_order= { pos= label= }
      0.0  0.0  0.0 angstrom  oxygen-atom
      1.0  0.0  0.0 angstrom  N
      0.0  1.0  0.0 angstrom  carbon
}

In both cases the repetition of the "angstrom" string is rather tedious.

Notes

[1]

This component information is stored in type component variables. Usually, the names of these variables are the same as, or similar to, the names of the keywords used to input them. It is good programming practice to ensure that this is the case.

[2]

It should be pointed out that derived data, such as ATOM data, may contain extra pieces of information which are not inputted. For example, ATOM data contains an "atomic number" represented by an INT variable. This atomic number is not inputted, but can be worked out from the "atom label"---provided the label contains a string which clearly identifies which kind of atom it is.