- Building the supercell
- Running the job
- Atomic forces and symmetries
- Kernel truncation and linear scaling
We now turn to a calculation on a periodic solid. As it is fairly well-behaved but illustrates some interesting concepts, let's try crystalline silicon, in the diamond (f.c.c.) structure. We will build a 2 × 2 × 2 version of the 8-atom simple-cubic unit cell.
Copy your SiH4 input to a new le (eg Si64.dat) in a new directory (e.g. Silicon) and remove the references to hydrogen from the species and species_pot. Copy silicon.recpot to this directory as well. In the new input file, set the NGWF radius to 7.0 bohr, the number of NGWFs per atom to 4, and the cutoff energy to 600 eV. Edit the cell side length so that it is 2× the lattice parameter of crystalline silicon in the LDA (around 10.1667 bohr). For reasons that will become clear later set ngwf_cg_max_step to 8.0 in order to prevent the CG line step being capped unnecessarily and maxit_ngwf_cg to 30 in order to terminate the NGWF optimisation after 30 CG iterations (in case it's not converging).
Typing out the positions would be rather time-consuming and error-prone with 64 atoms in the cell, so use your favourite scripting/programming language (bash, awk, python, perl, etc...) would all be suitable to write a list of the positions. The 8-atom simple-cubic unit cell is obtained by repeating the primitive cell with two atoms ( at (0, 0, 0) and (1/4, 1/4, 1/4)*a ) at each of the positions of the f.c.c. lattice:
(1/2 , 1/2 , 0),
(0, 1/2 , 1/2),
(1/2 , 0, 1/2).
If possible, modify your submission script in order for the job to run on 16 cores in parallel. The calculation should take around 10-15 minutes. Feel free to stop it as soon as you see what is happening...
You should find that the calculation fails to converge. Indeed, the NGWFs RMS gradient remains stuck above the threshold for convergence. Likewise, the total energy will not converge to a fixed value.
Make a a backup copy of your output file and modify the species block in the input file. Change the NGWF radius to 8.0 bohr and the number of NGWFs per Si atom to 9. This introduces NGWFs with d-like symmetry rather than just s and p, allowing much more variational freedom. You should now find the calculation converges nicely, but will take rather longer to run (20-30 minutes on 16 cores).
Now try activating the calculation of the forces on each atom with the write_forces keyword and re-run the job. All the forces reported in the output file should be small. In principle they are constrained by the symmetry of the crystal to be exactly zero. However, you will see that they are not exactly zero because the symmetry of the system is broken by the psinc grid which is not necessarily commensurate with the primitive cell.
However, in this small cell, it will not be possible to fix this for the simulation cell and the FFT box coincide, and the number of grid points across the FFT box must be odd for efficiency reasons. Therefore, in small cells, the number of grid points across the simulation cell must also be odd.
Adjust your script to build a 5 × 5 × 5 supercell of the crystal (1000 atoms). Reduce the kernel cutoff to 25 bohr with kernel_cutoff and set the code to perform 1 NGWF iteration only with write_maxit_ngwf_cg (otherwise the calculation would take several hours). To restore the symmetry, set manually the spacing between psinc functions via the psinc_spacing keyword. The psinc spacing should be adjusted to be a divisor of the supercell length such that an exact number of grid points spans each primitive cell of the crystal. Remember to pick a value which gives an effective cutoff energy close to 600eV so as not to increase the run time too much. An example input file for this job can be found here. This should not take too long on 32 cores.
Density kernel truncation can be a good way to speed up the calculation for large systems. Asymptotically it is only by truncating the kernel that true linear-scaling behaviour of the computational effort will be observed. The kernel cutoff is controlled by the kernel_cutoff keyword. Remember to use the kernel truncation with caution as short kernel cutoffs introduce instabilities in the convergence process and a rather stringent limit on the calculation accuracy.
You may wish to try running a calculation on the 5 × 5 × 5 supercell (1000 atoms) and an 8 × 8 × 8 supercell (8000 atoms) for 1 iteration to see how the runtime scales. An example input file for the 8000 atoms job can be found here. Beyond around 500 atoms, the calculation should be into the so-called linear-scaling regime, so the 8000 atom calculation only takes a little over 8 times the 1000 atom calculation. This is rather better than the nearly 512 times longer it would take with traditional cubic-scaling DFT!