Previous ToC Up Next

4. The First Journey: Clop, the Non-help Part

4.1. Three Journeys

Alice: I have enjoyed getting a bird's eye view of your clop.rb file. Let's get a little closer to the ground now. Where shall we swoop down?

Bob: I suggest that we continue our tour on the level of the Clop class, before descending all the way to the internal workings of the individual options, the machinery of which is contained in the Clop_option class.

However, more than halve of the Clop class code lines are dedicated to the help facility. It is not necessary to look at these lines in order to understand how normal options are being parsed. So I suggest that we continue our tour in three easy journeys. First we inspect how a normal option is handled on the Clop level. Second, we descend to the Clop_option level, to see how the corresponding option block is parsed and used. Third, we go back to the Clop level in order to figure out how the help facilities works.

Alice: Sounds good to me!

Bob: The first journey is by far the simplest, and shortest. Of the three actions ordered in the Clop initializer:

   def initialize(def_str, argv_array = nil)
     parse_option_definitions(def_str)                                        
     if argv_array
       parse_command_line_options(argv_array)
     end
     print_values
   end

we have already seen how the first action parse_option_definitions consisted in handing all the work to the initializer one level lower, through a call to Clop_Option.new. So that part will be visited in our second journey.

Similarly, we have seen that the request for the third action also was handed down directly to the individual options on the Clop_Option level. All we have to do in our first journey is to figure out how the method parse_command_line_options works.

4.2. Inspecting find_option

Alice: Can you show me this method again?

Bob: Here it is:

   def parse_command_line_options(argv_array)
     while s = argv_array.shift
       if s == "-h"
         parse_help(argv_array, false)
         exit
       elsif s == "--help"
         parse_help(argv_array, true)
         exit
       elsif i = find_option(s)
         parse_option(i, s, argv_array)
       else
         raise "\n  option \"#{s}\" not recognized; try \"-h\" or \"--help\"\n"
       end
     end
     initialize_global_variables
   end

The first two if and elsif branches concern the help facility, which we will address in our third journey. So we only have to inspect the following three methods here, during our first journey: find_option and parse_option and initialize_global_variables.

Here is the first one:

   def find_option(s)
     i = nil
     @options.each_index do |x|
       i = x if s == @options[x].longname
       if @options[x].shortname
         i = x if s =~ Regexp.new(@options[x].shortname) and $` == ""         
       end
     end
     return i
   end

Alice: The top part is clear. You hand it a string that contains something like "-d" or "--step_size". I presume that the option class Clop_option has a method longname that returns exactly the string "--step_size" and a method shortname that similarly returns "-d".

Bob: Well presumed!

Alice: Now if the option is recognized as the long name version of option i in the option array, the value i is returned, as it should be. But what happens with the short name?

Ah, wait, before you answer my question, let me think. This must be connected with the fact that you allow for short options to be glued to their values. For example "-d0.001" would be a valid format.

Bob: Indeed, even though a user would not be likely to write it that way, since it does look a bit confusing. However, if we allow "-n3", we should allow "-d0.001" as well.

Alice: Agreed. So I understand that you want to check only whether the -d part is present in the string s, while that string is allowed to contain more. Now you do that by turning the shortname of the option into a regular expression.

Bob: Yes: if you want to compare two strings, the proper and clean way to do so in Ruby is to change the string at the right-hand side into a regular expression. This is like converting a integer into a floating point number. In a way, nothing changes, except that now it has become an instance of a different class. For the number, an Int instance has become a Float instance, and here in our case, a String instance has become a Regexp instance.

Alice: and the comparison operator =~ returns true if @options[x].shortname is indeed contained in the string s.

Bob: Yes, except that it returns the position of the first character of the match, rather than true. But what concerns us here is that it does not return nil, which would be interpreted sa false; anything that is not nil or false is considered to be true. Even the null string "" is true in Ruby, another thing to watch out for if you are a C programmer.

Alice: And a more logical use of the notion of true, if you ask me. A non-null string string is still more than nothing.

Bob: Yes, I agree, though it took me a while to shake the C habit.

4.3. The Last Cryptic Bit

Alice: Now I think I understand all about this find_option method, except for that last cryptic bit, and $` == "". What is that doing there? And what does it do?

Bob: Ah, that is a nice addition, if I may say so myself. At first I had not put that in, but when I looked at this method, without that addition, I had the feeling that something wasn't right. When I thought about it, I realized that there was still a possibility for ambiguity.

Alice: Like?

Bob: Like having an option with a long form --number_of_particles and a short form -n. Can you see what would happen in that case?

Alice: Let me inspect. Ah! Yes, of course. In the case of the long form, you still match correctly against -n, as the second and third character of the long form. How devious!

But wait a minute. If you first check the long form, you could bypass the check for the short form, by turning the two if statements into an if...else statement.

Bob: Yes, that would work in the specific case I just mentioned, where there is only a confusion between the two ways of writing the same option. But what if there is a possible confusion between two different options?

Here is an example. Let there be another option with a long form called --neutron_star_type. Now that option, too, matches -n. So we have to protect different options from each other, and we cannot assume safety just by shadowing the short option check by the long option check.

Alice: You are right! But I still don't understand the syntax of your solution. I would have checked whether the match started at the beginning. Didn't you say that the match attempt returns the position of the first character of a successful match?

Bob: Indeed. And you are right. I could have written

        i = x if (s =~ Regexp.new(@options[x].shortname)) == 0
However, I preferred to use the $` variable. After every successful match, the matched part of the string is assigned to the variable $&, while the part of the string before the match is assigned to $` and the part of the string following the match to $'. So I just checked whether $` was equal to the empty string:

         i = x if s =~ Regexp.new(@options[x].shortname) and $` == ""         

Alice: I see. That is good to know. I guess those rather cryptic shorthands are borrowed from Perl.

Bob: I think so.

Alice: Okay, I now fully understand how find_option. On to the next station of our first journey!

4.4. Inspecting parse_option

Bob: Here is the next station. After we know which option we are dealing with, we have to parse it. This happens in the following method:

   def parse_option(i, s, argv_array)
     if @options[i].type == "bool"
       @options[i].valuestring = "true"
       return
     end
     if s =~ /^-[^-]/ and (value = $') =~ /\w/                                
       @options[i].valuestring = value                                        
     else
       unless @options[i].valuestring = argv_array.shift                      
         raise "\n  option \"#{s}\" requires a value, but no value given;\n" +
               "  option description: #{@options[i].description}\n"           
       end
     end
     if @options[i].type =~ /^float\s*vector$/                                
       while (@options[i].valuestring !~ /\]/)                                
         @options[i].valuestring += " " + argv_array.shift                    
       end
     end
   end

Now this is a bit more complicated, since there are several forks in the road. The first fork is related to the question: is the type of the option boolean? In other words: are we dealing with a flag? A flag can only be true or false. By name the flag as a command line option, the user intends to set the flag, i.e., to the value true. By leaving out that option, the user intends to keep the default value false.

For example, in our N-body code, the user can ask for extra diagnostics by including the option -x, which leads to the corresponding global variable $x_flag as we have specified already. By default $x_flag = false. If the option -x is encountered, we have to change this variable to $x_flag = true.

This happens by setting the valuestring of the boolean option to true as you can see at the beginning of the code fragment above.

Alice: This valuestring is probably implemented as a string @valuestring within the Clop_option class, and there that string is used later to obtain the actual value?

Bob: All correct, as we will see during our second journey, but you don't have to rely on that, on this level: it could have been implemented in a different way, as far as the Clop class is concerned. The only important thing is that there is a `setter' method provided for the Clop_option class, that somehow sets the internal information of the Clop_option instance in such a way as to guarantee that the boolean value of the option, when asked for later, will return true.

Hmm, that sounded more complicated than it really is. Often things are much clearer on the code level than when you try to express it in words.

Alice: The same is true in mathematical equations, of course, once you understand all the symbols . . .

Bob: . . . and once you are sufficiently familiar with manipulating the symbols that they are becoming old friends.

Alice: Yes, until that point it is still helpful to have clumsy sentences in a natural language to help you get the idea. So, please continue to be clumsy, and tell me what happens next. We have encountered a fork in the road. It the option is boolean, we set it to true without needing to read anything more from the command line, and we happily return.

Bob: And if the option is not boolean, we take the other fork in the road, by continuing the travel through the method parse_option.

4.5. Extracting the Value: Normal Case

Alice: Ah, I see, if the type of the option is not boolean, you have to extract the value from the next little bit of command line information, by accessing arg_array. But wait a minute, I see two lines where you assign something to @options[i].valuestring, no, three lines; one at the very bottom too.

Ah, that last one deals with vectors, and you already explained that vectors are special, in that their value can be spread out over different bits of string in the command line. So let's leave that for later. But what about these two assignments of @options[i].valuestring right in the middle?

Bob: The main assignment, the one you should look at first, is this one:

       unless @options[i].valuestring = argv_array.shift                      
         raise "\n  option \"#{s}\" requires a value, but no value given;\n" +
               "  option description: #{@options[i].description}\n"           

In most cases, after encountering a new option name, you just read in the value corresponding to that option, as the next little string that came from the command line. If there is nothing left to be parsed on the command line, that just means that the user has forgotten to provide a value: an error message is printed, and execution of the code is halted.

Alice: But what happens if the user provides a next option, instead of the value for the previous option? Imagine that the user writes -n -x.

Bob: In that case, an attempt will be made to set the number of particles to -x, which will result in something silly. But hey, we can't protect the user from all possible errors! I don't know how to anticipate on this level what is and is not correct. Others, using this code in the future, will undoubtedly use it for more general purposes than I can currently envision, so I don't want to constrain too much what can and cannot be said.

Alice: Hmmm. You could at least insist that a valid number would be provided when the type of a variable is given as an int or float.

Bob: Perhaps. We could come back to those questions later, and try to make everything industrial-strength. For the time being, I'm happy if everything works under reasonably normal circumstances with reasonably intelligent users.

Alice: Well, if you talk about users that don't make errors, then I have to conclude that nobody fits the criterion of being `reasonably intelligent'. But okay, for now let's move on. I'd probably want to come back to this point later, though.

4.6. Extracting the Value: Compact Case

Bob: Now if you look just above the two lines I quoted above, you find:

     if s =~ /^-[^-]/ and (value = $') =~ /\w/                                
       @options[i].valuestring = value                                        

This addresses the case where a one-character option is used, without any space separating the option and the value, as in -n3, a very compact notation which we already discussed before.

Alice: What is the meaning of this funny looking repetition of the symbols ^-? They occur twice, with a square bracket in between, and a closing bracket at the end, as ^-[^-].

Bob: This is one of the most confusing aspects in the notation of regular expressions, this overloading of the meaning of the up-arrow ^. In fact, the two up-arrows here are two completely different things. In order to see this, let us inspect the whole regular expression:

            /^-[^-]/
The first ^ specifies the beginning of the string. The presence of - immediately following means that the string has to start with a - sign. Now the square brackets are normally used to give you a choice, as in [aei] or [a-f]. In [aei] it is understood that any of the three letters a or e or i could be present and still form a match. And in [a-f], any letter in the range a, b, c, . . , f would form a valid match.

Alice: Yes, that notation I am familiar with. But how can you start at the beginning of a line for the second time.

Bob: You don't. Within square brackets, the up-arrow ^ has the effect of negating the meaning of the next character. So the combination [^-] simply means: any character but the - character!

In other words, by writing

    if s =~ /^-[^-]/
we ask whether it is true that the string s begins with a hyphen, but does not begin with two consecutive hyphens. Let me show you:

  |gravity> irb
  irb(main):001:0> "-n" =~ /^-[^-]/
  => 0
  irb(main):002:0> "--nono" =~ /^-[^-]/
  => nil
Alice: Ah, very nice, though difficult to parse for a human like me.

Bob: You'll get used to it.

4.7. Interesting or Confusing?

Alice: Now that I understand the first half of the first line, let me stare at both lines again:

     if s =~ /^-[^-]/ and (value = $') =~ /\w/                                
       @options[i].valuestring = value                                        

You have told me that the variable $' contains the rest of the string, the part after the part which matched. So if we start with the option "-n", and if we insist that it should start with one and only one hyphen, then $' = n, right?

Bob: Wrong.

Alice: Huh?

Bob: Try it!

Alice: Okay:

  |gravity> irb
  irb(main):001:0> s = "-n"
  => "-n"
  irb(main):002:0> s =~ /^-[^-]/
  => 0
  irb(main):003:0> $'
  => ""
Hey, that is strange! Why should it be the empty string? What happened to n ?

Bob: Why don't you try the compact option-value notation -n3

Alice: Here goes:

  irb(main):004:0> s = "-n3"
  => "-n3"
  irb(main):005:0> s =~ /^-[^-]/
  => 0
  irb(main):006:0> $'
  => "3"
Somehow the n gets eaten up and disappears without a trace, but the 3 survives.

Bob: What happened is that the matching attempt s =~ /^-[^-]/ involves two characters: first the hyphen and then the next character, for which it is checked that it is not a hyphen.

Alice: Ah, although in plain English we can describe this match as `a check that there is one and only one hyphen', in fact it is a match where the first two characters are being checked as being an ordered pair `hyphen followed by non-hyphen.'

Now I see what happened. And since this all happens in the case of a one-character option, the non-hyphen that gets eaten is the option character, so that what is left is exactly the value that needs to be assigned to the variable corresponding to the option.

So what you do at the end of this complicated line, is that you check whether the remainder, stored in $' contains at least one alphanumeric character or underscore, which is what the \w stands for.

Bob: Exactly.

Alice: Okay, I see now what happens. But I think you could have written this in a simpler way.

Bob: How?

Alice: Instead of

    if s =~ /^-[^-]/ and (value = $') =~ /\w/
you could have used

    if s =~ /^-\w/ and (value = $') =~ /\w/
Bob: Ah, I had not thought about that. I guess I was just to fixated on hyphenation! But, now that I figured out how to do it, I find my double hat trick, or double up arrow if you like, quite elegant. Or at least interesting.

Alice: I just find it confusing, rather than interesting, but to each his own taste! Let's move on to the last case, at the end.

4.8. Extracting the Value: Vector Case

Bob: This is a lot simpler. Here we are dealing with the case that the option type is that of a float vector, a vector of the type we have defined before, with components that are all floating point numbers. As I already mentioned, a vector on the command line should be given in Ruby array notation, with the numbers enclosed between square brackets, [].

There is a lot of freedom for the user: the vector can be written as a string, like "[3, 5]", or without those double quotes directly as [3, 5]. The numbers can be comma separated, but they can also just be space separated, as in [3 5]. Spaces are allowed next to the brackets: [ 3 5] and [3 5 ] and [ 3 5 ] are all equally fine.

There is one catch to be aware of, when you leave of the double quotes: on the command line [ 3,5] and [3, 5] and [3,5 ] are all fine, but [3,5] is likely to give you an error message.

Alice: Why?

Bob: It depends on the Unix shell you use, but chances are that the shell tries to interpret this as an attempt to address files in the current directory. Unless you happen to have a file with the name 3 or a file with the name 5, and expression on the command line containing [3,5] will probably generated a short dry message No match.

Alice: That's good: short and simple, and it makes it clear that there is no subtle Ruby bug involved.

As for your implementation, let me look at what you wrote for vector parsing:

     if @options[i].type =~ /^float\s*vector$/                                
       while (@options[i].valuestring !~ /\]/)                                
         @options[i].valuestring += " " + argv_array.shift                    

You allow some flexibility in writing the type: it could be float vector or float vector or even float vector.

Bob: Sure, it would seem to restricted to insist on one literal way of writing it. I can easily see someone adding an extra space between the two words, and perhaps a tab or whatever would strike them as looking better. I have consistently given the users that freedom, also in parsing the lines within the Clop_Option class, as we will see in our next journey.

Alice: And then you keep shifting new content from the ARGV array until you finally encounter a string that contains a closing square bracket ]. During that whole process, you keep adding what you find to the valuestring of the option you are working with, so that you build up the whole vector again, from the bits and pieces from the command line that were stored in successive elements of the ARGV array, here called argv_array.

One last question: why don't you just string those strings together? What is the need for adding a " " between the bits and pieces?

Bob: If all the vector elements were comma separated, as in [2,3,4], there would be no need to do so. However, I give the user the flexibility to use a space separated notation as well. Take the example of a vector written as [2 3]. In the ARGV array, this will be distributed over two elements, the first being "[2" and the second one "3]". Now if you would just string those two strings together, as you suggested, you would get "[23]", a one-dimensional vector with one element, 23. Not what you wanted.

Alice: I see. Good! Now I believe there is one station left on our first journey?

4.9. Inspecting initialize_global_variables

Bob: Indeed. Here is the last method:

   def initialize_global_variables
     @options.each{|x| x.initialize_global_variable}
     check_required_options
   end    

I am asking each option to do its own work, initializing the internal variables that contain the external information presented on the command line. For example, the time step value may have been presented on the command line as -d 0.001, and in order to give that value to the method evolve that we have used to integrate an N-body system, we have to store it somewhere in a variable, which we would normally called dt or something like that.

As we have seen in our new driver, for convenience we chose to use global variables, which means the name actually has to be something like $dt. Now the conversion from the string -d 0.001 to the actual floating point numerical value $dt = 0.001 is being done within the Clop_Option class, by its method initialize_global_variable.

Alice: And then you check something about the options. What do you mean with `required'?

Bob: For many options, a default value is specified, in the original definition string. However, for some options there is no natural default. For example, you may not want to specify a default value for an output file. Not only is there no obviously appropriate name for such a file, you don't want to risk overwriting another file that the user may have added to the current directory. In that case, you could start the entry for the output file option in the original here document with:

  Short name:           -o
  Long name:            --output_file_name
  Value type:           string
  Default value:        none
The convention used here is that an error message will be generated if the user does not provide a specific name for the output file on the command line. And the method check_required_options checks that all the original none specifications have indeed been overwritten by values provided on the command line.

Alice: Instead of check_whether_all_required_options_have_been_provided, you abbreviated the method name. Even for me an eight-word name would have been too long. Can you show me the code?

Bob: Here it is, all very straightforward.

   def check_required_options
     options_missing = 0
     @options.each do |x|
       if x.valuestring == "none"
         options_missing += 1
         STDERR.print "option "
         STDERR.print "\"#{x.shortname}\" or " if x.shortname
         STDERR.print "\"#{x.longname}\" required.  "
         STDERR.print "Description:\n#{x.longdescription}\n"
       end
     end
     if options_missing > 0
       STDERR.print "Please provide the required command line option"
       STDERR.print "s" if options_missing > 1
       STDERR.print ".\n"
       exit(1)
     end
   end

Alice: All clear! I think this finishes our first journey?

Bob: Yes, time to finally descend into the Clop_Option class.
Previous ToC Up Next