Beautiful Code One Screen At A Time, Part II

Last week I talked about code being beautiful at the level of individual lines of code in small procedures. But what about longer procedures?

There are a couple of standard rules of thumb for what a procedure should do and how long it should be: it should do just one thing, and it should be no bigger than will fit on the screen at once. And note, these dicta date back to when the typical screen was 80 characters wide by 24 lines high.

My own rules are somewhat different. A procedure should do a set of things that are always or frequently done together; and it should be as long as it needs to be to do those things. And then, if it contains segments that are used in more than one place, those segments should be pulled out as individual procedures.

This means that my routines are often longer than some people’s; but they still have to be understandable. The technique I use is called FIRST/NEXT commenting. I first ran into this technique on a large simulation project back in the late 1980s (a project in which some of the individual routines were amazingly, absurdly long) and I’ve used it ever since.

Here’s an example; it’s a routine from Quill that packages up a library as a .zip file:

# BuildLibZip lib
#
# lib   - The name of the library
#
# Creates a teapot.txt file for the library, and then packages
# it into /.quill/teapot/* for later installation.

proc BuildLibZip {lib} {
    # FIRST, make sure the lib is known.
    if {$lib ni [project provide names]} {
        throw FATAL "No such library is provided in project.quill: \"$lib\""
    }
    
    # NEXT, save the teapot.txt file.
    set teapotTxt [outdent "
        Package          $lib [project version]
        Meta entrykeep 
        Meta included    *
        Meta platform    tcl
    "]

    writefile [project root lib $lib teapot.txt] $teapotTxt

    # NEXT, make sure the output directory exists.
    set outdir [project root .quill teapot]
    file mkdir $outdir

    # NEXT, prepare the packaging command
    set command ""
    lappend command [env pathto teapot-pkg] generate \
        -t zip                                       \
        -o $outdir                                   \
        [project root lib $lib]

    # NEXT, call the command
    set outfile [file join $outdir package-$lib-[project version]-tcl.zip]
    puts "\nBuilding lib $lib:"
    puts [exec {*}$command]
}

The FIRST/NEXT structure is used in the main body of the routine; and if there are loops or if statements with large bodies, it’s used within those blocks of code as well. Here are a couple of things to note about it:

Given any typical syntax coloring scheme, the comments stand out visually and divide the procedure body into segments.
The “FIRST” and “NEXT” keywords make it clear that the comments do not stand alone; rather, each comment is part of an intelligible sequence.
This is useful to the reader, but it’s also a useful reminder to the author that he’s telling a story. When done properly, the FIRST/NEXT comments are effectively the pseudo-code for the procedure.
Each comment heads a block of a few lines of code: rarely as few as just one, and (except for loops and conditionals) rarely more than four or five. Those lines of code are all related by the comment that precedes them, which doesn’t simply mark a point in the procedure; rather, it explains what the next few lines are supposed to accomplish.
Where the block following consists of many individual lines, this is usually because the lines are all very similar to one another: we are doing the same sort of thing many times over.
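
The BuildLibZip example has no loops, so here is a minimal sketch of the nested case. It assumes a made-up word-counting routine (SummarizeWords is hypothetical, not part of Quill), and it assumes that the sequence restarts with FIRST inside the block, which is one plausible reading of the convention rather than a fixed rule:

```tcl
# SummarizeWords lines
#
# lines   - A list of text lines
#
# Hypothetical example: returns a dict of lowercased word -> count,
# ignoring blank lines and lines that begin with "#".

proc SummarizeWords {lines} {
    # FIRST, start with an empty tally.
    set counts [dict create]

    # NEXT, process each line in turn.
    foreach line $lines {
        # FIRST, skip blank lines and comment lines.
        set line [string trim $line]
        if {$line eq "" || [string match "#*" $line]} {
            continue
        }

        # NEXT, tally each word in the line.
        foreach word [regexp -all -inline {\S+} $line] {
            dict incr counts [string tolower $word]
        }
    }

    # NEXT, return the tally.
    return $counts
}
```

Read on their own, the outer comments still give the outline of the routine, and the inner ones give the outline of the loop body.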

Remember that the whole point of this is intelligibility. When I go look at this routine, I immediately see (before I read a word) that it consists of five major points; and that to some extent these points (though linked sequentially) are independent. And then, by reading the comments I can easily jump right to the block I’m interested in.

Some style guides I’ve read (style guides written back when programs were rather shorter than they are now) would suggest that I break this routine up into five subroutines with long, descriptive names. With, in fact, names rather similar to the gist of the FIRST/NEXT comments. Then, the theory went, the BuildLibZip routine would just contain the five subroutine calls, and no comments, and would be “self-documenting”.
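
As a rough sketch of what that style looks like, here is a made-up roster-formatting routine rather than BuildLibZip itself (the Quill helper commands aren't reproduced here). Every name below is hypothetical and exists only for illustration:

```tcl
# Hypothetical helpers, each called from FormatRoster and nowhere else.

proc EnsureRosterIsNotEmpty {names} {
    if {[llength $names] == 0} {
        error "Roster is empty"
    }
}

proc TrimAndCapitalizeNames {names} {
    set result [list]
    foreach name $names {
        lappend result [string totitle [string trim $name]]
    }
    return $result
}

proc NumberEachLine {names} {
    set lines [list]
    set i 0
    foreach name $names {
        lappend lines "[incr i]. $name"
    }
    return [join $lines \n]
}

# The "self-documenting" top-level routine: just a sequence of calls
# whose names stand in for the FIRST/NEXT comments.

proc FormatRoster {names} {
    EnsureRosterIsNotEmpty $names
    set names [TrimAndCapitalizeNames $names]
    set names [lsort $names]
    return [NumberEachLine $names]
}
```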

Which is well and good; but instead of having one code entity to navigate (a procedure) I’d suddenly have six procedures, each with a header comment; and each of which could in principle be called independently. Instead of having one unit of code with a complex but easily navigable structure, I’d have six independent units of code to keep track of. Since the five subroutines are only called by BuildLibZip, I’d have to make it visually clear in some way that BuildLibZip and its subroutines form a single logical procedure.

And that is, in fact, what I’ve done by making BuildLibZip a single logical procedure with FIRST/NEXT comments.
