Ruby callbacks

If the Ruby programming language has a single weakness, for me it’s how Ruby callbacks are defined.

In many programming languages, functions or methods can be declared outside of a specific class, passed around as variables, and invoked just as any other function or method anywhere they’re in scope. For example, Python allows such “top level” functions, and of course in C and C++ functions can be referenced through function pointers, which are passed around and dereferenced as usual.

Ruby sort of does, and sort of doesn’t, have callbacks. For example, it’s easy to define a Ruby callback with either a proc or a lambda like so:

foo = -> { "top level lambda" }
bar = proc { "top level proc" }

describe "main" do
  it "prints top level foo" do
    expect(foo.()).to eq "top level lambda"
  end

  it "prints top level bar" do
    expect(bar.()).to eq "top level proc"
  end
end

But this has limited use, as the callback scope has to be controlled explicitly (and only) by file inclusion. Which is fine for C programming, but we’re programming in Ruby.

Dispatch table

We can also hash these callbacks to invoke later:

dispatch = {
  "foo" => foo,
  "bar" => bar
}

describe "main" do
  it "dereferences dispatch table and invokes 'foo'" do
    foo_from_table = dispatch['foo']
    expect(foo_from_table.()).to eq "top level lambda"
  end

  it "dereferences dispatch table and invokes 'bar'" do
    bar_from_table = dispatch['bar']
    expect(bar_from_table.()).to eq "top level proc"
  end
end
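
A dispatch table usually needs an answer for keys it doesn't know about. Here is a minimal sketch of one way to handle that, using `Hash#fetch` with a block to supply a fallback callable. The table contents and the `dispatch_to` helper are hypothetical names made up for this example.

```ruby
# Hypothetical handlers, named only for this sketch.
HANDLERS = {
  "greet" => ->(name) { "hello, #{name}" },
  "shout" => proc { |name| "#{name.upcase}!" }
}.freeze

# Look up a handler by key and invoke it. Hash#fetch with a block
# supplies a fallback callable when the key is missing.
def dispatch_to(key, arg)
  handler = HANDLERS.fetch(key) { ->(_arg) { "no handler for #{key}" } }
  handler.call(arg)
end

puts dispatch_to("greet", "world")
puts dispatch_to("missing", "world")
```

Keeping the fallback inside the lookup means callers never have to check for `nil` before invoking the result.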

That’s all easy enough, but I find the

Controlling scope

module Foobar
  Foo = -> { "module lambda" }
end
dispatch['foom'] = Foobar::Foo

describe "main" do
  it "dereferences dispatch table and invokes 'Foobar::Foo'" do
    foom_from_table = dispatch['foom']
    expect(foom_from_table.()).to eq "module lambda"
  end
end

Practical implementation

All the above is well and good, but raises the question “Why would we want to use Ruby callbacks?” Here’s one situation: overriding default execution.

#!/usr/bin/env ruby

require 'test/unit'

class CallbackDemo
  Default = -> { "default response" }

  def foo &bar
    bar ||= Default
    bar.call
  end
end

This implementation was inspired by Avdi Grimm’s Ruby Tapas screencast Caller-specified Fallback Handler. A slightly more in-depth treatment can be found in Avdi Grimm’s Confident Ruby.

Naturally, we want to test it:

class CallbackDemoTest < Test::Unit::TestCase
  def test_default_response
    assert_equal "default response", CallbackDemo.new.foo
  end

  def test_custom_response
    assert_equal "custom response",
      CallbackDemo.new.foo { "custom response" }
  end
end

It works with class methods as well:

class CallbackDemo
  class << self
    def bar &foo
      foo ||= Default
      foo.call
    end

    def quux(arg)
      arg ? yield(arg) : Default.call
    end
  end
end

class CallbackDemoTest < Test::Unit::TestCase
  def test_class_default_response
    assert_equal "default response", CallbackDemo.bar
  end

  def test_class_custom_response
    assert_equal "custom response",
      CallbackDemo.bar { "custom response" }
  end

  def test_argument_passing
    # Braced block can be passed to the assertion inline
    assert_equal "foobar response",
      CallbackDemo.quux("foobar") { |arg| "#{arg} response" }

    # A do .. end block cannot be passed inline with the assertion;
    # it would bind to assert_equal, leaving quux without a block:
    response = CallbackDemo.quux("foobar") do |arg|
      "#{arg} response"
    end
    assert_equal "foobar response", response
    assert_equal "default response", CallbackDemo.quux(nil)
  end
end
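
The difference between braces and do .. end in that last test comes down to which method call the block attaches to. Here is a small sketch of the binding rule in isolation; the `inner` and `outer` helpers are hypothetical and exist only to report who received the block.

```ruby
# Hypothetical helpers: each one reports whether it received a block.
def inner
  block_given? ? "inner got the block" : "inner got no block"
end

def outer(value)
  block_given? ? "outer got the block" : "outer got no block; #{value}"
end

# Braces bind to the nearest call, so `inner` receives the block.
WITH_BRACES = outer inner { :unused }

# do .. end binds to the leftmost call in the chain, so `outer` receives it.
WITH_DO_END = outer inner do
  :unused
end

puts WITH_BRACES
puts WITH_DO_END
```

In the test above, `assert_equal` plays the role of `outer` and `CallbackDemo.quux` plays the role of `inner`.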

Recap

As usual, if I’ve written something unclear, feel free to leave a comment. Likewise if I’ve gotten something wrong. If you’ve written something you believe others would benefit from, feel free to leave a link as well.

Ruby guard clause FTW

A common Ruby programming idiom is writing “guard clauses” for immediate returns in lieu of nested if-elsif-elsif-elsif-end or case statements.

The Ruby guard clause idiom

The guard clause idiom construction looks like this:

def even? number
  return true if number % 2 == 0
  false
end

Before going any further, I know Ruby has the even? method for doing exactly this. And I could easily generate a guard clause example using specialized domain knowledge. However, the example provides code everyone can easily follow, and the discussion is supposed to be focused on guard clauses, not implementations of Ruby library methods. Just wanted to clear that up.

The first line of the method is often called a “guard clause” because of the syntactic pattern of returning from the method early, given some condition.

But technically, isn’t a “real” guard clause code for trapping invalid parameters or states on method entry?

In our code above, all the values of the argument number are assumed valid, even a nil argument. There is no “guarding” going on.

What is going on, instead, is control of flow. Plain old programming in fact.

The real beef seems to be related to a zombie meme left over from structured programming, as promoted by Dijkstra: never have more than one return from a function, and that return must be the last line of the function.

Our code above seriously violates Dijkstra’s dictum.

Isn’t that a bad thing?

In my opinion, no, it’s not, at least not for the code as written above. It’s a small method, just a few lines. Back when structured programming was adopted, functions commonly had hundreds of lines. Then it would make a lot of sense to limit each function to a single return statement, and require it to be at the bottom of the function.

But let’s look at the code above again. The return statements are almost equivalent to setting a return value on a conditional and jumping directly to the end of the method to return that value. Like with a goto statement.

And there isn’t anything wrong with that. Used correctly, goto statements are an appropriate and even elegant way to control program flow. The Ruby C source code has gotos; tastefully employed gotos are an accepted convention in Unix socket programming. There’s nothing intrinsically wrong with a goto statement.

The upshot is that a leading return may syntactically resemble a guard clause, without being a guard clause.

And like any other technique, using it correctly is often as much a matter of taste as anything else.
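
To make the contrast with nested conditionals concrete, here is a sketch of the same logic written both ways. The discount rules and method names are hypothetical, invented only for the comparison. The nested version keeps a single exit point in the structured-programming style; the flat version uses leading returns.

```ruby
# Hypothetical example: pick a discount rate for an order total.

# Nested conditionals: one exit point, but the reader tracks every branch.
def discount_nested(total)
  if total.nil?
    rate = 0.0
  else
    if total >= 100
      rate = 0.1
    else
      if total >= 50
        rate = 0.05
      else
        rate = 0.0
      end
    end
  end
  rate
end

# Leading returns: each case is settled and dismissed in one line.
def discount_flat(total)
  return 0.0 if total.nil?
  return 0.1 if total >= 100
  return 0.05 if total >= 50
  0.0
end
```

Both methods return the same values for every input; which one reads better is the matter of taste discussed above.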

Errors and exceptions

Suppose we require a guard clause to actually “guard,” what does that look like in Ruby? Given that argument checking should be performed in this method, here’s one way to do it:

#!/usr/bin/env ruby

require 'rspec'

class Guard
  def self.even? number
    return true if number%2 == 0
    false
  rescue NoMethodError => e
    raise e, "bad argument type"
  end
end

describe Guard do
  it "instantiates correctly" do
    expect(Guard.new).to_not be_nil
  end

  it "returns true for an even number" do
    expect(Guard.even?(2)).to eq true
  end

  it "returns false for an odd number" do
    expect(Guard.even?(1)).to eq false
  end

  it "blows up for nil argument" do
    expect { Guard.even?(nil) }.to raise_error(NoMethodError,
                                               /bad arg/)
  end
end

Whether or not argument checking should be performed in the Guard.even? method is not relevant to this article; it’s a different discussion. We’re assuming, a priori, that we do need to check the argument.

Note also def-rescue-end (DRE) instead of begin-rescue-end (BRE). The begin is implied and removes a level of nesting to increase readability.
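
For comparison, here is a sketch of the two forms side by side. The parsing helpers are hypothetical; the only point is that both behave identically while the second drops a level of nesting.

```ruby
# Hypothetical parsing helpers; both behave identically.

# begin-rescue-end: the explicit block adds one level of nesting.
def parse_with_begin(text)
  begin
    Integer(text)
  rescue ArgumentError
    nil
  end
end

# def-rescue-end: the method body acts as the implicit begin block.
def parse_with_def(text)
  Integer(text)
rescue ArgumentError
  nil
end
```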

The design of the Ruby programming language allows ignoring a type check unless the underlying Ruby interpreter complains. This, in turn, allows us to proceed with computation of the common cases up front, and never worry about the error code unless there is an error. In C or C++ (or Java), this would not be possible; the argument checking would have to be performed on entry to the method. In Ruby that would look like this:

def even? number
  raise NoMethodError, "bad arg" unless number.is_a? Integer
  return true if number % 2 == 0
  false
end

But now we’re checking the type every time the even? method is invoked. Wasteful and unnecessary. Note that this implementation also passes the specs given above.

Here’s a bit more discussion in some follow-on links.

More discussion on guard clauses (and the if statements they replace)

Any further investigation or discussion should start by reading Martin Fowler’s “Replace nested conditional with guard clause.” This is a very short article and code heavy. An experienced programmer can read it at a glance. Fowler’s definition of a guard clause includes all methods with multiple returns on leading conditions. This is more expansive than the definition proposed here, which would restrict guard clauses to code which checks argument validity (type, range, etc.). That said, I use Fowler’s definition when discussing code with programmers.

Here’s the blog post I started to write: “Prefer guard clauses over nested conditionals.” I found this after extending the initial draft of this article to include some test code.

This article provides an example of using a guard clause to guard in the sense discussed above: Single exit point. Rereading, I’ve probably drawn a fair bit of material from the “Single exit point” article, hence, it can also be regarded as a reference for this blog post.

Part of the case for guard clauses comes from programmers exasperated with Alpine if statements. You know, those very long functions with deeply nested conditionals. I find such code difficult to read, but I have worked on a fair bit of legacy code with constructions as vastly ghastly as what’s shown in this article. Possibly not quite as “alpine,” but certainly as convoluted.

The readability of guard clauses depends on the reader. In this article, the author attempts to make a case for guard clauses with a coworker. The coworker finds guard clauses increase the difficulty of understanding the control of flow through a program. The author and I find completely the opposite.

Here is a list of controversial rules for code in an open source Java project. About halfway down, notice the rule for having a single return located at the last line of the method. Obviously, this rule precludes guard clause techniques in favor of exception handling. Also notice the wordiness of Java compared to Ruby. Whether this wordiness is a good or a bad thing is certainly a matter of taste. Personally, I prefer the conciseness of Ruby (and Python).

What drives the guard clause idiom?

Flattening the arrow is an excellent starting point for understanding conditions promoting a refactor from nested conditionals to guard clauses. You may find the examples slightly dated, but of major interest is Atwood’s Item #4: Always use an opportunity to return as soon as possible from a function. This is almost the polar opposite of Dijkstra’s dictum. The notion of “clean code” changes over time; it’s a cultural artifact, and machines don’t care.

Last but far from least, the discussion on Guard clause at (the) wiki is worth investigating in detail, as is the article on “The Arrow Anti Pattern.” Wiki at c2 is, of course, the original wiki. If you don’t know, now you know.

As you can see, guard clauses can be a hot-button topic with programmers adhering to Dijkstra’s Dictum of single exit points. Your opinion, or more likely, your team’s or team leader’s or boss’s opinion will decide the day-to-day use of this idiom, but at least your opinion should be a bit better informed as a result of reading all the way down to here.

Anything you don’t agree with? Leave a comment.

Did I totally blow it somewhere? Please leave a comment.

What did I miss? Yeah, definitely leave a comment!

Indirect creation pattern in Ruby

Factories in Ruby are similar to factories in other programming languages, such as C++. The idea is to have a class (or static) method which produces instances of the desired class. Factories are useful when the specific subclass one wants is unknown at compile time. Hence, using a super class to initialize a subclass depending on parameters allows run time construction of the correct class.

Underpinning the implementation of factories in Ruby is the notion of wrapping the instantiating method in a class method belonging to its parent class. That is, creating an instance requires a level of indirection. You don’t call a method to create an instance directly, you call a method which calls the instantiator.

Examining the notion of “wrapping an instantiating method with a class method” reveals this can be done in any class. Let’s write some running code and test it:

#!/usr/bin/env ruby

require 'rspec'

class Generator
  def self.builder
    return Generator.new
  end

  def write
    "from instance method"
  end
end


describe Generator do
  it "should instantiate itself" do
    g = Generator.builder
    expect(g.write).to eq "from instance method"
  end
end

Recall that the default instantiating method in Ruby is named initialize and invoked with the new method. But initialize is not strictly necessary, as can be seen above.
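
The `Generator` example wraps `new` but always returns the same class. Here is a sketch of the fuller idea described at the top of this post, where a class method on the parent picks the subclass at run time. The `Shape` hierarchy and the `build` method are hypothetical names chosen for illustration.

```ruby
# Hypothetical class hierarchy, used only to illustrate the idea.
class Shape
  # Class-level factory: picks the subclass at run time from a parameter.
  def self.build(kind, size)
    case kind
    when :circle then Circle.new(size)
    when :square then Square.new(size)
    else raise ArgumentError, "unknown shape: #{kind.inspect}"
    end
  end

  attr_reader :size

  def initialize(size)
    @size = size
  end
end

class Circle < Shape
  def area
    Math::PI * size**2
  end
end

class Square < Shape
  def area
    size**2
  end
end

puts Shape.build(:square, 3).area
```

The caller never names `Circle` or `Square` directly, which is the level of indirection the pattern is after.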

More about factory pattern in Ruby

Indirect object creation is part and parcel of developing factory classes, but we have barely scratched the surface: factories are a rich vein of interesting programming techniques, and worthy of the time spent mining out practical nuggets. Here are some other articles I found useful:

Perhaps I’ll write more about indirect creation and factories in the future, and why this pattern helps state management in constructors.

Ruby showdown each versus map

Ever have a hard time remembering return values for, say, constructions like Ruby Array’s “each” versus “map?”

I know I do.

Writing things down helps me remember those things. In software, writing tests helps me remember behavior. For language features, these tests make up a sort of “supplementary user manual,” a manual which demonstrates how the language feature, class method, library call or what have you actually works, regardless of what the man page or Nutshell book says.

Array and Enumerable Factoids

Here’s what the manuals all say about Array and Enumerable:

  • map is a method in Enumerable.
  • each is a method in Array (each is required by Enumerable). Implementation of map is based on each.
  • each is used when side effects are desired, such as printing to a stream, terminal, whatever.
  • map allows for functional programming in Ruby.
  • map returns a new object.
  • each returns the same object.

Let’s write some test code and see what happens. I like to use RSpec for these sorts of chores, but your favorite test API will work just as well.

#!/usr/bin/env ruby

require 'rspec'

describe Array do
  before :each do
    @a = [1,1,2,3,5,8]
  end

  it "should return the same array with each" do
    b = []
    a_ret = @a.each { |e| b << e } # i.e., copy a to b
    expect(@a.object_id).to eq a_ret.object_id
    expect(b).to eq @a
  end

  it "returns a new array with map" do
    a_ret = @a.map { |e| e*e }
    expect(a_ret).to eq [1,1,4,9,25,64]
    expect(@a.object_id).not_to eq a_ret.object_id
    expect(@a).to eq [1,1,2,3,5,8]
  end

  it "modifies the original array with map!" do
    a_ret = @a.map! { |e| e*e }
    expect(a_ret).to eq [1,1,4,9,25,64]
    expect(@a.object_id).to eq a_ret.object_id
  end

  it "returns enumerator if no block given" do
    a_enum = @a.each
    expect(a_enum.class).to eq Enumerator
  end
end

Examining the C source code for Enumerable#map, we see the memory allocation for the new array on line 8:

static VALUE
enum_collect(VALUE obj)
{
    VALUE ary;

    RETURN_SIZED_ENUMERATOR(obj, 0, 0, enum_size);

    ary = rb_ary_new();
    rb_block_call(obj, id_each, 0, 0, collect_i, ary);

    return ary;
}

The important thing here is that the return value of map should be interpreted as the return value from the block execution. That is, the map method doesn’t do any real work; it simply wraps the call to the underlying function which evaluates the block. The newly allocated array returned from `map` is passed to the block evaluation function as the last argument, which doubles as the return value. This is a widely accepted practice in C coding.
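
The same shape can be sketched in plain Ruby. The `Fibs` class and its `my_map` method below are hypothetical; `my_map` is a hand-rolled stand-in that mimics the structure of the C function (allocate an array, iterate with `each`, collect block results), not a copy of the real implementation.

```ruby
# Hypothetical container that defines only `each`; Enumerable supplies the rest.
class Fibs
  include Enumerable

  def initialize(*values)
    @values = values
  end

  # The single method Enumerable requires.
  def each(&block)
    return enum_for(:each) unless block_given?
    @values.each(&block)
    self
  end

  # A hand-rolled map, shaped like the C code: allocate a fresh array,
  # iterate with each, and collect every block result into that array.
  def my_map
    result = []
    each { |e| result << yield(e) }
    result
  end
end

fibs = Fibs.new(1, 1, 2, 3, 5, 8)
p fibs.my_map { |e| e * e }
p fibs.map { |e| e * e }
```

Because `Fibs` includes `Enumerable` and defines `each`, the built-in `map` works on it too and returns the same result as `my_map`.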

Test-driven development: structuring workflow

Test driven development (TDD) is perceived as being much more difficult than writing programs without tests. It’s true that it is more difficult. It’s not true that it’s much more difficult.

It is, however, a skill and it must be learned just as any other must be learned.

Having tools helps. Many modern scripting languages such as Python and Ruby have testing tools built into the language or in standard libraries. With other languages, such as C or C++, some effort must be made to set up an appropriate tool chain. This tool chain usually requires using either pre-written libraries, or writing the necessary test infrastructure internally.

At least as important as having useful tools is creating the right habits for test driving coding. These test-first habits live in your workflow.

Structuring workflow

Learning how to do anything once is not that hard. Mastering the practice of a discipline requires building effective habits. For example, many people have exercise equipment at home, and know how to use it. But they go to the gym instead, where they find context, motivation and inspiration. The habit is at the gym, not at home.

While motivation and inspiration can be elusive, creating an appropriate context is mechanical. It’s as easy (and difficult) as creating an appropriate set of habits. Here are a few habits I’ve found useful for practicing test driven development:

  1. In your editor of choice, always open spec or test files in a known buffer which you can get to with a keystroke or two. I’m currently using vim, so I always know that the file containing tests is in buffer 1: ESC:b1 gets me there in a flash.
  2. Use a repl like irb or pry to check syntax. This is more helpful than it first appears. It takes the place of all those puts and prints. If you’re working with a compiled language, write stupid simple standalone programs to check behavior. This can be as efficient as cat-ing a program into gcc from the command line.
  3. Always commit a test first, or never less than a test in parallel with new code. Never commit untested code. This is not the same as never pushing untested code.
  4. Scope objects and internal API in the test or spec file and build your functionality in the test file first.
  5. More? You tell me, comments are open!

Following these rules helps me create a workflow structure conducive to TDD.

Vending machines Vegas style

I’m skimming Hoare’s Communicating Sequential Processes, a rather technical book on concurrency.

One of the examples discusses a vending machine providing a food sample first, before payment. The vending machine is used as an illustration for Hoare’s theoretical notation.

But think of that vending machine for a moment.

A vending machine which randomly serves free stuff. For example, on average, 10% is served free.

It could work two ways:

  1. Refund the money, “compliments of the house”, or
  2. Just walk up and push a random button, free food pops out.

I have a hunch the sales volume would go up. Way, way up.

Constructors should set state, nothing more

If a constructor has to do a lot of processing to set its initial state, that’s a code smell.

When a constructor is used for the main processing part of a class, such that the code requires an instance of Foo to spend a lot of cycles computing Bar, that’s almost a sure indicator the design needs improvement. It may not need a lot of improvement, but it almost surely needs rethinking.

Constructors should only set state, really. If arguments must be passed, those should be easy to instantiate themselves.

I’m not the only person who believes doing work in constructors is a flaw.

You can google “doing work in constructors” to find like thinking.
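
Here is a minimal sketch of what “only set state” can look like in Ruby. The `WordCount` class is hypothetical; the point is that the constructor stores its argument and the expensive work waits in an ordinary method.

```ruby
# Hypothetical class: the constructor records its inputs and nothing else.
class WordCount
  attr_reader :text

  def initialize(text)
    @text = text # set state only; no parsing or counting here
  end

  # The expensive work lives in an ordinary method, computed on demand
  # and memoized, so tests can build instances cheaply.
  def counts
    @counts ||= text.downcase.scan(/\w+/).each_with_object(Hash.new(0)) do |word, tally|
      tally[word] += 1
    end
  end
end

p WordCount.new("the cat and the hat").counts
```

A test can now construct a `WordCount` without triggering any computation, and exercise `counts` separately.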

Really bad smell

Check this out:

Foo::Foo(Bar b) : b(b) {
  private_m = Baz::some_wackadoodle_preset_structure;
}

I recently had to find a way to test such a constructor, where the constructor depended on some static (global) value of another class, which was initialized elsewhere.

Very painful.

Design rule of thumb: 1 loop per method

Here’s a great little guideline for writing testable code: limit the number of loops in any function or method to 1 at each level of nesting.

Consider the following function with two “top level” loops:

def function():
    for ():
        ....
        ....
        ....
    for ():
        ....
        ....
        ....

Refactor 1

The body of a loop is often a great candidate for a refactoring. Do this instead:


def helper1():
    for():
        ...
        ...
        ...

def helper2():
    for():
        ...
        ...
        ...

def function():
    helper1()
    helper2()

Instead of sweating one nasty integration test for the function with two loops, you now have two unit tests (test_helper1, test_helper2) and two integration tests (test_function1, test_function2).
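
For a concrete version of the same refactor, here is a sketch in Ruby. The helper names and the computation are hypothetical, chosen only to show the shape: each helper owns one loop, and the top-level method just composes them.

```ruby
# Hypothetical helpers: each owns exactly one loop and can be unit tested alone.
def squares(numbers)
  numbers.map { |n| n * n }
end

def total(numbers)
  numbers.reduce(0) { |sum, n| sum + n }
end

# The top-level method only composes the helpers.
def sum_of_squares(numbers)
  total(squares(numbers))
end

puts sum_of_squares([1, 2, 3])
```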

Refactor 2

def function1():
    for ():
        ...
        ...
        ...

def function2():
    for ():
        ...
        ...
        ...

function1()
function2()

Depending on what function1 and function2 accomplish, they might be amenable to unit testing as well.

The is pattern for queries

One way to partition functionality within classes is to separate command (mutate) actions from query (inspect) actions.

Query actions have a number of well-established naming conventions. Let’s examine the “is_whatever?” pattern.

Consider a car, a spider and a word (Ruby syntax):

car.is_vehicle?       # true
spider.is_insect?     # false, spiders are arachnids
word.is_abbreviation? # 4. WTF!?

The convention is that any function or method prefixed with “is” queries for a boolean value. Returning anything other than true or false will induce cognitive dissonance in the reader.

The reader might well be you, next year.
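
Here is a sketch of a `Word` class that keeps the convention. The class, its abbreviation list, and its method bodies are hypothetical; what matters is that the “is” method answers only true or false, and the non-boolean value gets its own clearly named query.

```ruby
# Hypothetical Word class for illustration.
class Word
  KNOWN_ABBREVIATIONS = %w[etc approx dept].freeze

  def initialize(text)
    @text = text
  end

  # Predicate: always answers true or false, never a count or an index.
  def is_abbreviation?
    KNOWN_ABBREVIATIONS.include?(@text.downcase.delete("."))
  end

  # A separate, clearly named query for the non-boolean value.
  def length
    @text.length
  end
end

p Word.new("etc.").is_abbreviation?
p Word.new("word").is_abbreviation?
```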

Naming functions and methods

Function and method naming is not terribly difficult if the following two guidelines are observed:

Query functions return stuff. For example, new, create, get are all functions which ask for something in return.

Command functions change state. For example, set, adjust, compute.

Try very, very hard not to “mix and match”. Functions which both query and command are really hard to test, and worse, they are much harder to understand.

Think of it this way: testing a query function means testing only that the returned result is correct. Testing a command function means testing only the state of the object as a result of the command. If you have a function which commands and queries, you may find your function has weird interactions. Basically, your test code has to test the return, the state, and every possible interaction.

Plus, it’s just semantically confusing otherwise.
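
Here is a small sketch of the separation in Ruby. The `Thermostat` class is hypothetical. Returning `nil` from the command is one possible way to discourage callers from treating it as a query; it is an assumption of this sketch, not a rule.

```ruby
# Hypothetical Thermostat class for illustration.
class Thermostat
  def initialize(target)
    @target = target
  end

  # Query: returns a value and leaves state untouched.
  def target
    @target
  end

  # Command: changes state; the return value is deliberately not the point.
  def adjust(delta)
    @target += delta
    nil
  end
end

thermostat = Thermostat.new(20)
thermostat.adjust(2)
puts thermostat.target
```

Testing the query means checking only the returned value; testing the command means checking only the state afterwards.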

Update 2012/05/19: Command and query are analogous to “inspect” and “mutate” in the C++ world.