Previous ToC Up Next

3. Implementation: Clop Entry Points

3.1. A New Driver

Alice: Hi Bob! How's it going with your attempt to implement the option block idea?

Bob: Well, it took me quite a bit longer than I thought, but that was mainly because I got more and more ideas for further improvements, while I was writing the parser. And now I'm really hooked to the use of a scripting language! It was wonderful to see how easy it was to add functionality, and to change prototype behavior on the fly, just to try out various options.

Alice: Ah, I can see that your parsing code grew as a result.

Bob: Yes, but not as much as I would have expected. Compared to my much simpler parser, the length grew by only a factor four, and even that is not a fair comparison at all, since I used a piece of canned magic before, by adding

 require "getoptlong"                                                         

Who knows how long that code is. In contrast, this time I wrote everything myself, and the functionality is vastly increased.

Alice: Can you walk me through the code, to show me the new magic?

Bob: I'm glad to do so! Let me just follow the flow of control, right from the beginning.

In my new driver, rkn2.rb, I now start as follows:

 require "rknbody.rb"
 require "clop.rb"

You see, I'm including the Body and Nbody classes, as before, in the first line, and I'm including the new parser that I wrote, which lives in clop.rb. You will appreciate the modularity: In clop.rb there is no knowledge about N-body systems; in fact, a chemist or a biologist could use clop.rb equally well for completely different purposes.

Alice: Hear, hear!

Bob: I thought you would like that. Now following those two lines, there appears this one long `here document' that we already wrote before, containing the full list of option blocks. Then, the only thing left in this file rkn2.rb, are the following lines:

 parse_command_line(options_definition_string)                                
 
 include Math
 
 nb = Nbody.new
 nb.simple_read
 nb.evolve($method, $eps, $dt, $dt_dia, $dt_out, $dt_end, $init_out, $x_flag) 

So that is all there is to it! This is the whole driver. It contains a two lines at the start to specify what needs to be included, a few lines at the end, the rest is one long list of option blocks in a single `here document'. And all the work is done in the file clop.rb that contains the parser.

3.2. Invoking the Parser

Alice: So the last three lines are almost the same as the last three lines in your first attempt at parsing the command line, in file rkn1.rb:

 nb = Nbody.new                                                               
 nb.simple_read                                                               
 nb.evolve(method, eps, dt, dt_dia, dt_out, dt_end, init_out, x_flag)         

The only difference is that all the parameters of the call to evolve are now global variables, if I remember Ruby's convention correctly.

Bob: Yes, in Ruby, a variable starting with a dollar sign is by definition a global variable. Normally I would not like to use global variables, but here it seemed like a natural way to get the information from the parse file clop.rb back into our driver. The alternative would have been to turn each variable into a method that interrogates the class Clop that is hiding inside clop.rb. Instead of using the global variable $dt, we could define an instance my_clop.dt, and so on.

You might argue that these variables are what the user is providing for a particular run, and while the run is running, these variables contain the only information to the program available from the outside world; all other information is local to the program. So to use global variables may even be natural.

Alice: I agree that this is one of the few places where global variables seem like a reasonable solution. Although I don't like them in general, I also don't like to stick to literally to any principle, even the principle `thou shalt not use global variables'.

Bob: Another meta principle, not to stick to any principle?

Alice: Watch out, if you apply that to itself, you may get into a paradox!

Bob: Like the question ``who shaves the barber?'' if the barber shaves everyone who doesn't shave himself. But let's not get into that.

Alice: Now all the magic occurs because of the one call

 parse_command_line(options_definition_string)                                

I see that you take the one humongous `here document' string, and feed it into this method, that must be defined inside the file clop.rb.

Bob: Indeed! Time to open that file, and to show you what is going on. Instead of going through it from beginning to end, let me walk through the file, following the flow of the logic, starting with the method that is called here.

Alice: I'm all ears and eyes!

3.3. The Clop Class

Bob: The file clop.rb contains three things: there is the definition of a class called Clop, in front of that there is the definition of a helper calls Clop_Option, and after that there is a very short piece, namely the following three line definition:

 def parse_command_line(def_str)
   Clop.new(def_str, ARGV)
 end

Alice: That's all that happens, in order to parse the command line? This method just creates a new instances of the class Clop, and that's it?

Bob: That's it. Note that two essential pieces of information are passed to that new instance. The first argument contains the string with the whole list of option blocks, that was defined in the driver. That was the one and only argument passed from the driver to the clop.rb file. The second argument is ARGV, the array that contains the command line, broken up in space separated pieces.

Alice: So that is very similar to C.

Bob: Yes, except that ARGV[0] is already the first argument to the program, not the program name, as a C programmer might expect. So if you give a command:

  ruby test.rb -x -o out_file
then ARGV[0] = "-x", ARGV[1] = "-o", and ARGV[2] = "out_file". Effectively what has happened is that the piece of the command line that follows the program name is treated as a string, on which the command split is run. In the above case, when we call the remainder of the command line, after test.rb, str, then ARGV is the same as the array a that we would obtain from the statement:

  a = str.split
The split command splits the one string str into an array of smaller strings, where blank spaces function as separators defining the extent of each smaller string.

Alice: So the logic here is that you create a new instance of the class Clop, and you give it all the information that it needs: the `here document' that contains the complete interface information of our N-body code, and the ARGV array that contains the full information of what the user wrote on the command line. And somehow everything else happens as a side effect of creating this new Clop instance.

Bob: Yes. I did it that way so that you don't have to bother anymore later on about Clop classes. You just create one, and then you can already discard it, since upon creation it has done all its work. Let me show how.

Alice: By the way, why the name Clop ?

Bob: Ah, I should have mentioned that. Clop stands for Command Line Option Parser.

Alice: I should have guessed.

Bob: A class name Command_Line_Option_Parser just sounded a bit too long for my taste. On the other hand, feel free to change the name that way, if you like. In true object-oriented and modular way, the name of the class is not visible to the user. Instead, the user just gives the command

 parse_command_line(options_definition_string)                                

Still, as you know, I prefer more terse names, hence Clop.

3.4. Creating a Clop Instance

Alice: So what happens when you create a new Clop instance?

Bob: Here is the initializer for the Clop class:

   def initialize(def_str, argv_array = nil)
     parse_option_definitions(def_str)                                        
     if argv_array
       parse_command_line_options(argv_array)
     end
     print_values
   end

Alice: It indeed seems to do all the work required: first it parses all the definitions from the option block list from the N-body driver, then it parses all the options given on the command line.

Bob: And finally it echoes all the values that it gives back to the driver. Some of the values will be specified by the user. Other values, not specified by the user, will retain their default value. By echoing the whole set, the user will know exactly how the N-body integration got started, with what set of initial parameters.

Alice: But where does this method give those values back to the driver?

Bob: Ah, global variables, remember? Nothing is passed back explicitly. It is just made visibly globally. That's why the driver could simply give the command:

 nb.evolve($method, $eps, $dt, $dt_dia, $dt_out, $dt_end, $init_out, $x_flag) 

Alice: Ah, yes, of course. One more advantage of global variables. Once you have decided to go that dangerous path, you might as well enjoy it.

Okay, the logic is still crystal clear, so far. Let us start with the first command. How do you parse the option definitions?

3.5. Parsing Option Definitions: the Idea

Bob: Before showing you the method, let me first explain the idea. The full list with all the option blocks is contained in the single string def_str. What we would like to do is to cut up this list in two steps. The first logical step would be to divide the full string into shorter strings, one for each option. The second logical step would be to split each option string into lines, so that you can parse the meaning of each line.

Now a more practical approach would be to reverse the order. It is much easier to split the original def_str string immediately into individual lines. You can do that with the split method we just talked about: by default it cuts up a string wherever a blank space appears, but if you give it an argument, such as a newline \n, it will cut the string wherever in encounters the symbol specified in the argument.

In other words, the command

     a = def_str.split("\n")                                                  

will produce an array of single lines, that together make up the original list of option blocks.

So now we have to go back to the step we skipped: we have to stitch the lines together that belong to a single option. To do this, we hand the whole array of lines to another method, which is so friendly as to take off enough lines from the array as are needed to reconstruct a single option block. That friendly method then passes back that single option, as a nice package, while leaving all the unrelated lines on the array of lines. After each call to this method, the line of arrays shrinks, until the whole array has been eaten up, and we are left with a stack of package, one for each option.

Now what I call a package is -- you guessed it -- an instance of a new class, called Clop_Option. It is a helper class, used by the Clop class, to wrap up all the information for a single option. The Clop class itself contains an array of instances of Clop_Option.

Alice: Just like an N-body system is represented by an instance of the class Nbody, which contains an array of instances of the Body class, one instance for each particle.

3.6. Parsing Option Definitions: the Method

Bob: Exactly. And here is the method that parses the option definitions.

   def parse_option_definitions(def_str)
     a = def_str.split("\n")                                                  
     @options=[]
     while a[0]
       if a[0] =~ /^\s*$/                                                     
         a.shift                                                              
       else
         @options.push(Clop_Option.new(a))                                    
       end
     end
   end

What I just described as the friendly method that wraps related lines into a single package is nothing else but . . . the initializer for the Clop_Option class! I use the same approach that we started with, one level lower. On the top level all the parsing work, for all options, was done as a side effect of creating a single Clop instance. On this level here, the parsing work for a single option is done as a side effect of creating an instance of the Clop_Option class.

Alice: All very clear. So you create an empty array of options, called @options, an instance variable within the Clop class. As long as there is anything left on the array of single lines, you traverse the while loop. Only when a[0] = nil, in other words when the array of lines has been picked empty of lines, and nothing is left anymore, do you end your work.

Now within the while loop, whenever you encounter a line that is completely blank, you discard it. That is what the lines

       if a[0] =~ /^\s*$/                                                     
         a.shift                                                              

mean, right?

Bob: Right. The regular expression indicates lines that contain zero or more blanks, between begin and end of a line. The symbol \s stands for any type of white space, such as a single space or a tab. The symbol ^ at the beginning of a regular expression /^.../ means the beginning of a line, while the symbol $ means the end of a line. The symbol * as usual means zero or more instances of the previous symbol, so \s* means any number of spaces or tabs, possibly zero.

Taken together, the regular expression /^\s*$/ corresponds to any line that looks blank to the eye, whether it is a null string "" or a string with a few blanks like " " or a string containing tabs as well, like " \t \t\t ". Now whenever such a line is encountered, the array method shift is called in the second line above, which simply discards the first element of the array. As a result, the new element a[0] now contains what used to be stored in a[1], a[1] contains what used to be in a[2], and so on. The array consequently has become one element less in length.

Alice: And as soon as a non-blank line is encountered, you create a new instance of Clop.Option.

Bob: Yes, and I give the line array a as an argument to the of Clop.Option. This is the friendly function that gobbles up as many lines as needed to complete a well wrapped single option.

Alice: Ah, this means that it stops when it encounters two blank lines.

Bob: Yes, since we had agreed that that would be the sign that would separate two different option blocks. But the Clop.Option initializer is even friendlier than that: it also stops when there is something wrong with the syntax of the option that it is trying to wrap up. It doesn't just wrap any random bunch of double-blank-line-separated stuff.

So we can be assured that when Clop_Option.new returns, we have a valid new option package, in the form of a new instance of the Clop.Option class, and we can safely add that to the array of options called @options, using the command

         @options.push(Clop_Option.new(a))                                    

Alice: Okay, I get it! In a moment, I would like to see how Clop.Option does its work, but for now, let us assume it knows what it is doing, and let us look at the second action that the initializer for Clop itself is performing.

3.7. Parsing Command Line Options: the Idea

Bob: Again, let me lay out the logic first. After the definitions of all options have been read in and parsed, it is time to see which options the user has actually specified, and to take the corresponding actions, such as modifying the default values of the appropriate global variables, or providing help of one type or another, as the case may be.

So there are two steps to the process of parsing the command line options: first make an inventory of options specified, and then take the appropriate actions. If a help request is encountered in the first step, the second step consists of printing out the corresponding help message(s). If no help is requested, the second step consists of initializing the proper global variables.

The first step is carried out in a loop. At the beginning of the loop, the first element of the ARGV array is examined. Depending on the option found, the correct action is taken. For example, if an option is found that does not require a value, this option is assumed to be a boolean variable, in other words a flag. Such a flag is by default set to be false, but when the option is encountered, the value of the flag is set to be true. If an option does require a value, another element is taken from the ARGV array, and properly interpreted.

This last element can be a bit complex, since some values may be spread over several elements of the ARGV array. For example, if a vector is specified, through -v [ 1, 2, 3 ], several elements from the array have to be parsed until the closing ] symbol is encountered.

Alice: But you could have required the reader to put the whole vector into a string, as follows: -v "[ 1, 2, 3 ]".

Bob: Yes, and that is also a legal option. However, I wanted to make the parser really general, and I also wanted to free the reader from thinking about such aspects as how the command line would be parsed. In the spirit of Ruby, I prefer to download as much of the complexity of the interface to the code behind the interface, keeping the interface itself as natural as can be. Rather than training the user to add those double quotes, I would rather train the computer to figure out what to do even without quotes.

Alice: And as long as you insist that every vector starts with an opening square bracket and ends with a closed square bracket, there is no ambiguity.

Bob: Exactly. Ambiguity would be impossible to correct, of course. But as long as everything is unambiguous, I prefer the parser to do the hard work.

Now all of what I have just mentioned is still part of step one. Step two is more straightforward: You just ask each option to initialize its own global variable. And here you don't care whether such an option still has its default value, or whether that value has been modified through a command line option that was just read in.

Alice: Okay, got it! Let me see how you coded this.

3.8. Parsing Command Line Options: the Method

Bob: Here is the actual method:

   def parse_command_line_options(argv_array)
     while s = argv_array.shift
       if s == "-h"
         parse_help(argv_array, false)
         exit
       elsif s == "--help"
         parse_help(argv_array, true)
         exit
       elsif i = find_option(s)
         parse_option(i, s, argv_array)
       else
         raise "\n  option \"#{s}\" not recognized; try \"-h\" or \"--help\"\n"
       end
     end
     initialize_global_variables
   end

As long as there is something left in the array that contains all the command line bits and pieces, you take the next piece, call it s, and inspect that string. Now there are four possibilities. It could be a request for short help, in the form of a -h string; or it could be a request for long help, in the form of a --help string; or it could be the beginning of a regular option; or none of these three. In the last case, an error is reported, and the program is halted. The command raise prints the string that follows it, and stops execution of the code.

Alice: The call to find_option takes only one argument, while parse_option takes three arguments. Why is that?

Bob: The string s should contain one or two hyphens, followed by the name of the option, and that unique name is enough to determine which option we are dealing with. Therefore find_option takes only one argument, namely s, and returns the number of the option, i, which is simply the index of the option in the array of options. Remember that Clop has an instance variable @options for the option array, and the number i just means that we are dealing with option @options[i].

However, knowing which option we have just encountered is not enough to completely parse the information for that option. In general, the next element in the ARGV array will contain the new value for that option. And as I just mentioned, in the case of a vector value, that value may be distributed over an unknown number of further ARGV array elements. Therefore the call to parse_option needs to receive both i and argv_array.

Alice: Yes, but why do you give it s as well? Haven't you squeezed all the information out of it by finding out which option it refers to? If s = "-d" or s = "--step_size", there is no need to pass that string s on to parse_option.

Bob: Ah, you are completely right in those both cases. But there are other cases!

Alice: I can see that you are proud of having find a clever solution for something. But for what? There are only two cases for any option; either it is a one-letter option, starting with a single hyphen; or it is a multi-letter option, starting with a two hyphens

Bob: Right.

Alice: Right? So, then why pass it on?

Bob: Imagine that a user wants to set up a three-body system, and tries to give that option as -n3 . . .

Alice: . . . instead of the more proper -n 3. I see. Yes, that makes sense. I like that! It is another example where you could have trained the user to always leave a blank space between an option and a value, but why do that? Better let the computer figure it out. And in that case, of course parse_option needs to have access to the string s, just in case not all the information has been squeezed out of it. It may still contain the value of the option.

Bob: Right! Of course, this only applies to one-letter options. In this case, too, we cannot allow ambiguity. An option specification like -n3 is unambiguous, but writing --number_of_particles3 would be confusion. It could refer to a boolean flag with a name number_of_particles3. An unlikely name in this case, but there are other option names that could naturally take a number, such as --high5 or --loveU2, which may or may not be defined as boolean. So I only allow leaving out a space in the case of one-letter options.

Alice: And finally you initialize all global variables through a call to initialize_global_variables. No arguments needed, since the variables are global, and we deal with all of them. I like the long names you have chosen for your methods. That really helps in following the flow of the logic!

3.9. Printing Values

Bob: Thanks! Now let us go back to the initializer for the Clop class, where all the action started. Let me show it again:

   def initialize(def_str, argv_array = nil)
     parse_option_definitions(def_str)                                        
     if argv_array
       parse_command_line_options(argv_array)
     end
     print_values
   end

We have now seen, in outline, how the options definitions are parsed, until the definition string def_str has been eaten up, and how the command line options are parsed, until the array containing command line fragments, argv_array, has been digested. All that is left to do at that point is to print the values and, you guessed it, that is done with the method print_values:

   def print_values
     @options.each{|x| STDERR.print x.to_s}
   end

You see, this is a very simple method: it just gives an order to each option to print its own value. Remember, we want the output of each program to start with a list of values used, to remind the user what the initial state is that the program starts out with.

Alice: And the actual work is done through print_value, which must be a method associated with the Clop_option class.

Bob: Exactly. It is time that we look at that class as well. Here we have reached the end of the top level tour.

Alice: Thank you! Now I see clearly how you have laid out the program. Indeed: time to open some of the black boxes that you have mentioned so far.

Bob: Yes, these boxes were left in the dark so far. Now let there be light!
Previous ToC Up Next