In the first part of this book, we offered a few quick-and-dirty
programs that work fine for initial explorations. We were able to
study the stability of various 2-body and 3-body systems, with
respect to both numerical and physical instabilities. Many more
situations can easily be explored with these programs, and we hope
that readers have tried their hand at some other configurations,
starting from different initial conditions, for 2 or 3 or more bodies.
Starting from hermite2.C, for example, it is easy to change the value
of n in the first line, and to replace the explicit assignment of
positions and velocities by other values for all n particles.
However, it quickly becomes tedious to change the program each time
we want to integrate from a different starting position. There are
also many other improvements that can be made to the code, as anyone
with even modest programming experience will have noticed.
In this second part of our book, we will begin to add more structure.
Once we have structured our codes in a more modular and flexible way,
we will be in a position to carry out some real research projects with
astrophysical implications. While simulating some real star systems,
we will soon realize that we have to extend the complexity of our
codes. We will already make the switch to variable time steps in this
chapter; later we will find a need to assign individual time steps to
each star, and to introduce special coordinate patches for interacting
groups of stars. We will take these two steps in later volumes of our
book series.
Here is a quick overview of a wish list for improving the structure of
our computer codes. We will address some of these items fully in this
book, while leaving others partly or completely to other volumes.
- comments
- So far, we have not included any comments in our codes. In an attempt
to keep the codes short and uncluttered, we wanted to show the flow of
the statements directly, given that most codes could fit on one or two
pages. But now that the codes are growing longer, it is high time to
put in comments, and we will do so from now on.
- functions
- So far, we have written each program as a single function, main(),
without trying to split the program into smaller pieces. For the first
few codes this was fine, and it kept everything lightweight. But for
the later codes, spanning a few pages, it would have been better to
start dividing the functionality over separate functions. For example,
in leapfrog2.C we calculate the accelerations early on in the code,
and then again in the same way at the end of the main integration
loop. Putting those statements in a function, and calling that
function once before the loop and once inside the loop, makes the code
both easier to understand and easier to debug. In addition, it is
likely that we will use such a function in other codes as well (a
sketch of such a function appears after this list).
- structured I/O
- In our examples so far, we have used only a very rudimentary form of
I/O (input/output). We wrote our results in the form of a list of
positions and velocities to the standard output stream, and we wrote
some energy diagnostics to the error output stream. And we used input
only interactively, to prompt the user to provide a few parameters.
It is much better to define a unique N-body data format, which
includes other variables besides the positions r and velocities v,
such as masses, time, and perhaps additional information (a sketch of
one possible format appears after this list). Once we write the
results from an integration into a file, we can then read in that file
again when we want to continue that run. This leads us to:
- pipes
- The notion of pipes in Unix allows one to redirect the output of
one program as the input of another program (`piping' the results from
one program to the other, as it is called). For example, it would be
nice to pipe the results of a program generating initial conditions
directly into an integrator, and to pipe the results of the latter
into an analysis program.
- command line arguments
- Typing in parameters by hand, after being prompted by a program,
soon gets tedious. It is also inflexible, in that it does not fit
very well if we want to write shell scripts to run a bunch of programs
in laboratory fashion. A better way to pass parameters to a program
is to provide arguments directly on the command line. Unix has a
default protocol for doing this, and we adopt that usage in the
following programs (a sketch appears after this list).
- using a Makefile
- When the number of files in our working directory grows, we may lose
track of which program needs to be recompiled. To automate this
process, we introduce the notion of a Makefile below. The real
strength of Makefiles will become apparent only later, but already at
this stage it can be helpful (a minimal example appears after this
list).
- test facilities
- Soon our codes will reach a level of complexity where it becomes
difficult to convince ourselves that the code is really doing the
right thing everywhere, and giving the correct answers in the end.
The best approach is to develop a slew of standard tests, together
with a form of scaffolding that enables these tests to be run
automatically each time we make changes to our code (a small example
appears after this list).
- using the C++ STL
- So far, we have used only the bare-bones part of the C++ language. In
some of the programs below we will introduce a convenient extension to
the C++ core language, in the form of the Standard Template Library
(STL), which is included in every modern C++ compiler. It gives us a
quick, well-debugged, and often (but not always) efficient way to get
standard tasks done (a small example appears after this list).
- C++ classes
- The central feature of C++, as an object-oriented language, is the
use of classes, ways to encapsulate objects. Since we need to build up
considerable experience with N-body codes in order to know what type
of objects to construct, we postpone the introduction of classes until
later in this book.
- error checking
- Any robust code will do lots of error checking. Ideally, every
function should make sure that the data it gets fed are of a form that
is valid for the operations it wants to perform on them. Since error
checking, and even better, error handling (following up an error in
the proper way, once it occurs) complicates a code considerably, we
postpone most of this until somewhat later (a first small example
appears after this list).
- more flexible data format
- As we discussed earlier, it would be nice to give each star
considerable autonomy by building in some form of artificial
intelligence, to let stars decide when to do what and how to
report on it. For this to work, a minimal requirement is a
flexible way of reporting unforeseen events, and this requires
considerable flexibility in the data formats used. We will
later give an example of `stories' that are attached to
each star's data.
- more flexible command line options
- The Unix-based one-letter-only style of command line options that we
introduce below is far from ideal. Later we will provide a more
flexible way of handling arguments on the command line.
- more detailed help facility
- For now, asking for help will result only in a list of command line
options, together with a brief indication of what they do. It would
be better to provide several levels of help, allowing the user to get
more detailed information when needed. This leads to:
- documentation
- At a minimum, a good software environment should have a manual page
for each program. Even better, groups of programs should be described
as to their purposes and the way they can work together. This leads to:
- construction of a software environment
- At some point, when we have written various integrators and a number
of programs to generate initial conditions and to analyse data, it
will become too much of a clutter to keep everything in a single
directory. We will need to provide more structure for the way in
which we store our tools, and the way we intend them to be used. This
leads to:
- multiple files
- We mentioned under `functions' above the desirability of recycling
code by creating functions that can be used in different applications.
If we compile such a function in a separate file, it will be easier to
link it to other codes that use it. This leads to:
- libraries
- An extension of the previous concept, in which a group of related
functions is compiled into a library, which can then be linked to
other codes that use some of the functions collected there. Having
various libraries and many files requires significant bookkeeping to
guarantee that everything is consistent and up to date. This leads to:
- version control
- For a software environment under development (and every healthy
environment is constantly under development!), it is useful to be able
to reconstruct older versions, and to keep track of the latest
developments. CVS, short for Concurrent Versions System, is a useful
package for doing all this. It also allows several people to write
code asynchronously within the same software environment, since it
will flag any collisions stemming from potential multiple edits. More
recent alternatives are available as well, such as SVN, short for
Subversion, which allows more flexible ways to rename files and whole
directory structures.
- autoconf
- A related useful facility is GNU `autoconf', which allows a user to
install a software environment on an (almost) arbitrary platform
without any trouble. As the name implies, it generates a `configure'
script that automatically checks how your particular system is set up,
and then configures your copy of the software environment in such a
way that it fits your environment.
- parallelization
- With most modern computers able to distribute the running of a
time-intensive program over several processors, it is important to
give guidance to the compiler as to how to break up a large program
into chunks that can be executed safely in parallel. Later we will
discuss how to modify our N-body codes to make use of both
small-scale and large-scale parallelism (a first glimpse appears
after this list).
- special-purpose hardware
- Another way to gain greatly in speed is to use dedicated hardware,
constructed specifically for the problem at hand. For the
gravitational N-body problem, the GRAPE hardware developed at the
University of Tokyo provides such a capability. We will discuss issues
connected with the use of one or more GRAPE boards.
- a dedicated plotting package
- The time will come when the use of a canned plotting package, like
gnuplot, is just too inflexible for our particular needs in analyzing
the results of N-body runs. At some stage we will introduce a
version of a plotting package, dedicated to the analysis of
stellar dynamics simulations of dense stellar systems.
- a scripting language
- Around that time, if not earlier, the need will be felt for a
scripting language that is more powerful than the simple use of shell
scripts.
- archiving
- Finally, when we have an efficient and detailed software environment
for doing cutting-edge scientific research, we will want to perform
large-scale simulations. When runs take weeks or months on the
world's fastest computers, it is important to store the massive
amounts of data in such a way that we can later query those data in
flexible and efficient ways. Archiving and data retrieval, as well as
fancier operations like data mining, then become serious issues.
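To make a few of the items on this wish list more concrete, we close
this section with some sketches. First, the acceleration function
promised under `functions' above. This is only a sketch, with a
signature of our own choosing; the names n, m, r, and a follow the
array conventions of our earlier codes:

    #include <cmath>                      // for sqrt()
    using namespace std;

    void acc(int n, double m[], double r[][3], double a[][3])
    {
        for (int i = 0; i < n; i++)       // clear the old accelerations
            for (int k = 0; k < 3; k++)
                a[i][k] = 0;
        for (int i = 0; i < n; i++)       // loop over all particle pairs
            for (int j = i+1; j < n; j++) {
                double rji[3];            // vector from particle i to j
                for (int k = 0; k < 3; k++)
                    rji[k] = r[j][k] - r[i][k];
                double r2 = rji[0]*rji[0] + rji[1]*rji[1] + rji[2]*rji[2];
                double r3 = r2 * sqrt(r2);
                for (int k = 0; k < 3; k++) {
                    a[i][k] += m[j] * rji[k] / r3;  // Newtonian gravity, G = 1
                    a[j][k] -= m[i] * rji[k] / r3;  // action equals reaction
                }
            }
    }

Calling acc(n, m, r, a) once before the integration loop and once at
the end of each step then replaces the two duplicated blocks of
statements.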
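Next, one possible form for a structured N-body output format, as
mentioned under `structured I/O'. The layout chosen here (particle
number and time in a header, then one line per particle with mass,
position, and velocity) is only an illustration, not necessarily the
format we will settle on, and it assumes <iostream> has been included:

    #include <iostream>
    using namespace std;

    void put_snapshot(ostream& s, int n, double t,
                      double m[], double r[][3], double v[][3])
    {
        s << n << endl << t << endl;      // header: n and time
        for (int i = 0; i < n; i++) {
            s << m[i];                    // mass, then position, then
            for (int k = 0; k < 3; k++)   // velocity, all on one line
                s << ' ' << r[i][k];
            for (int k = 0; k < 3; k++)
                s << ' ' << v[i][k];
            s << endl;
        }
    }

A matching input function, reading the same format from an input
stream, would then let us continue a run from a file, or chain
programs together as sketched under `pipes'. With placeholder program
names, a pipeline might look like:

    initial_conditions | integrator | analysis > results.txt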
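Third, the Unix command line protocol mentioned under `command line
arguments'. The standard library function getopt() handles one-letter
options; here is a self-contained sketch, in which the particular
options -d, -t, and -h and their meanings are merely examples:

    #include <iostream>
    #include <cstdlib>                    // for atof()
    #include <unistd.h>                   // for getopt()
    using namespace std;

    int main(int argc, char *argv[])
    {
        double dt = 0.01;                 // default time step
        double t_end = 10;                // default end time
        int c;
        while ((c = getopt(argc, argv, "d:t:h")) != -1)
            switch (c) {
                case 'd': dt = atof(optarg);     // -d takes an argument,
                          break;                 //   delivered in optarg
                case 't': t_end = atof(optarg);
                          break;
                case 'h':                        // -h, or any unknown flag,
                default:                         //   prints a usage line
                    cerr << "usage: " << argv[0]
                         << " [-d time_step] [-t end_time] [-h]" << endl;
                    return 1;
            }
        cerr << "dt = " << dt << " ; t_end = " << t_end << endl;
        return 0;
    }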
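Fourth, a minimal Makefile of the kind promised under `using a
Makefile'. This fragment only automates the recompilation of a single
program; note that the command line under each target must begin with
a tab character:

    # minimal example; the file name is that of one of our earlier codes
    CXX = g++
    CXXFLAGS = -O2 -Wall

    leapfrog2: leapfrog2.C
            $(CXX) $(CXXFLAGS) -o leapfrog2 leapfrog2.C

    clean:
            rm -f leapfrog2

Typing make then recompiles leapfrog2 only if leapfrog2.C has changed
since the last compilation.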
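Fifth, the scaffolding hinted at under `test facilities' can start
very small. Here is a hypothetical helper, with a name and interface
of our own invention, that checks one physical invariant: energy
conservation over a run, assuming a nonzero initial energy.

    #include <cmath>
    #include <cstdlib>
    #include <iostream>
    using namespace std;

    // abort with a message if the relative energy error of a run
    // exceeds the given tolerance
    void check_energy(double e_init, double e_final, double tol)
    {
        double err = fabs((e_final - e_init) / e_init);
        if (err > tol) {
            cerr << "energy test FAILED: relative error " << err
                 << " exceeds tolerance " << tol << endl;
            exit(1);
        }
        cerr << "energy test passed: relative error " << err << endl;
    }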
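Sixth, a small taste of the STL. The standard container vector can
replace our fixed-size arrays; unlike a built-in array, it knows its
own size and can grow at run time. A self-contained example, with
arbitrary numbers:

    #include <iostream>
    #include <vector>
    using namespace std;

    int main()
    {
        int n = 3;
        vector<double> m(n, 1.0/n);       // n equal masses, summing to 1
        vector<vector<double> > r(n, vector<double>(3, 0.0));
        r[0][0] = 1;                      // put particle 0 on the x axis
        for (int i = 0; i < n; i++)
            cout << "m = " << m[i] << "  x = " << r[i][0] << endl;
        return 0;
    }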
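Seventh, a first taste of error checking: a guard function that a main
program could call before starting an integration. The particular
checks shown are merely examples:

    #include <cstdlib>
    #include <iostream>
    using namespace std;

    // refuse to run with parameter values that make no sense
    void validate(int n, double dt)
    {
        if (n <= 0) {
            cerr << "error: particle number n = " << n
                 << " should be positive" << endl;
            exit(1);
        }
        if (dt <= 0) {
            cerr << "error: time step dt = " << dt
                 << " should be positive" << endl;
            exit(1);
        }
    }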
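Finally, a glimpse of small-scale parallelism. One common approach on
shared-memory machines, which we do not pursue further here, is
OpenMP: a single compiler directive asks the compiler to split the
outer acceleration loop over several processors. This fragment
assumes the same n, m, r, and a as in the acc() function above, and a
compiler invoked with OpenMP support (e.g. -fopenmp). Each iteration
must write only to a[i] to keep the iterations independent, so the
action-equals-reaction shortcut used in acc() has to be given up:

    #pragma omp parallel for              // iterations run in parallel
    for (int i = 0; i < n; i++) {
        for (int k = 0; k < 3; k++)
            a[i][k] = 0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;         // no self-interaction
            double rji[3];
            for (int k = 0; k < 3; k++)
                rji[k] = r[j][k] - r[i][k];
            double r2 = rji[0]*rji[0] + rji[1]*rji[1] + rji[2]*rji[2];
            double r3 = r2 * sqrt(r2);
            for (int k = 0; k < 3; k++)
                a[i][k] += m[j] * rji[k] / r3;   // only a[i] is written
        }
    }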