Fortran Code Modernization and Speedup
Plenty of good Fortran applications were written long ago, but
to build them with the latest compilers you will likely need to upgrade the source to remove
obsolete language features. Generally this means updating the source to at least the Fortran 95 standard.
Code Modernization
Fortran modernization can involve:
- Convert to a 64-bit application, which allows for using more memory.
- Convert "common blocks" to more modern "modules".
- Introduce PARAMETER statements or clean up existing ones, also placing them in a module.
- Dynamically allocate module arrays.
- Convert old-style DO loops into the more modern DO/END DO pairs, and replace obsolete shared end-of-DO statements with a separate END DO for each loop.
- Dynamically allocate remaining (non-module) arrays.
- Fix bugs that went uncaught until the upgrade exposed them. For example, modernizing the memory usage can expose "array index out of bounds" bugs.
- Clean up source code indentation for DO loops and IF-THEN-ELSE groups to improve readability.
- Miscellaneous manual cleanup specific to each application.
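As a rough illustration of several of the items above, here is a minimal sketch of a legacy fragment rewritten in modern form. The legacy code in the comments is an assumed example, not taken from any particular application, and all names (sim_data, default_cells, density, init_density) are hypothetical.

```fortran
! Legacy form (assumed example), shown for comparison:
!       PARAMETER (MAXN = 1000)
!       COMMON /SIMDAT/ DENS(MAXN)
!       DO 10 I = 1, N
!          DENS(I) = 0.0
!    10 CONTINUE

module sim_data
  implicit none
  ! Named constant kept in one place instead of repeated PARAMETER statements
  integer, parameter :: default_cells = 1000
  ! Replaces the fixed-size common block array; sized at run time
  real, allocatable :: density(:)
contains
  subroutine init_density(n)
    integer, intent(in) :: n
    integer :: i
    ! Release any previous allocation so the routine can be called again
    if (allocated(density)) deallocate(density)
    allocate(density(n))
    ! DO/END DO replaces the labelled DO ... CONTINUE loop
    do i = 1, n
      density(i) = 0.0
    end do
  end subroutine init_density
end module sim_data
```

A caller would add "use sim_data" and then "call init_density(n)" with a size chosen at run time, or fall back to default_cells.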
The Fortran code modernization process is tedious and time-consuming, so we have developed a number of tools to automate most of the above tasks.
Code Speedup
Once the source code has been cleaned up and we have an app to use as a performance baseline, there are several ways to improve performance:
- If the app was already memory-constrained, give it more memory to work with.
- Enable SIMD ("Single Instruction Multiple Data") compiler options so the compiler will generate code that can operate on multiple array items with a single instruction.
- Enable multi-core/multi-thread handling in the heaviest computational loops using OpenMP. Any CPU these days has from 2 to over 100 "cores".
If you have millions of particles, molecules, or cells, you can distribute the calculations across those cores, giving you a major performance boost.
Profiling tools can identify the heaviest offenders. The source is then studied for parallelization feasibility and, where feasible, a hot loop is turned into a
parallel loop by adding some simple OpenMP directives.
- Enable multi-system heterogeneous calculations. This means adopting MPI (for example, through the Open MPI library), which allows the calculations to be distributed across
multiple computers. This technology only applies to truly huge computational problems, so we need to look at the particular application to see if this
is likely to be beneficial.
- Enable GPU support. While the fanciest CPU these days has over 100 cores, a high-end GPU has more than 10,000!
GPU cores are slower than CPU cores, but through sheer numbers they can usually get the job done faster.
GPU support can be enabled in the Fortran world by using OpenACC, which is similar to OpenMP.
It is possible to get even lower-level control on an NVIDIA GPU using their CUDA API, although it is a bit more work in Fortran than in C/C++.
- Use multiple GPUs. This approach can sometimes be used depending on how the data points (e.g., particles) interact with each other.
If there is frequent interaction, the extra data management time across the GPUs can negate the benefit of the extra horsepower.
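To give a feel for how small the OpenMP and OpenACC changes described above can be, here is a minimal sketch of the same loop annotated both ways. It assumes each particle is updated independently (no interaction between iterations). The module and routine names (particle_update, advance_cpu, advance_gpu) are hypothetical.

```fortran
module particle_update
  implicit none
contains

  ! CPU version: with OpenMP enabled, iterations are split across threads
  ! and vectorized (SIMD) within each thread.
  subroutine advance_cpu(n, dt, x, v)
    integer, intent(in) :: n
    real, intent(in) :: dt
    real, intent(inout) :: x(n)   ! positions, updated in place
    real, intent(in) :: v(n)      ! velocities
    integer :: i
    !$omp parallel do simd
    do i = 1, n
      x(i) = x(i) + dt * v(i)
    end do
    !$omp end parallel do simd
  end subroutine advance_cpu

  ! GPU version: with OpenACC enabled, the loop is offloaded to the device.
  ! copy(x) moves x to the GPU and back; copyin(v) only moves v to the GPU.
  subroutine advance_gpu(n, dt, x, v)
    integer, intent(in) :: n
    real, intent(in) :: dt
    real, intent(inout) :: x(n)
    real, intent(in) :: v(n)
    integer :: i
    !$acc parallel loop copy(x) copyin(v)
    do i = 1, n
      x(i) = x(i) + dt * v(i)
    end do
  end subroutine advance_gpu

end module particle_update
```

With gfortran, the OpenMP directive takes effect when compiling with -fopenmp and the OpenACC directive with -fopenacc. Without those flags both are treated as comments and the loops run serially, which makes it easy to check results against the baseline.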
How fast can your application become? Sorry, we can't answer that question!
In general, though, you can probably get a speedup of 2x or 3x using multiple cores, and 10x to 20x using a single GPU.
Please contact us if you have a legacy Fortran project you would like to upgrade, or if you have an already modernized
project that you'd like to speed up.