# Ruby showdown: each versus map

Writing things down helps me remember those things. In software, writing tests helps me remember behavior. For language features, these tests make up a sort of “supplementary user manual,” a manual which demonstrates how the language feature, class method, library call or what have you actually works, regardless of what the man page or Nutshell book says.

## Array and Enumerable Factoids

Here’s what the manuals all say about Array and Enumerable:

• map is a method in Enumerable.
• each is a method in Array (each is required by Enumerable). Implementation of map is based on each.
• each is used when side effects are desired, such as printing to a stream or a terminal.
• map allows for functional programming in Ruby.
• map returns a new object.
• each returns the same object.

Let’s write some test code and see what happens. I like to use RSpec for these sorts of chores, but your favorite test API will work just as well.

    #!/usr/bin/env ruby

    # rspec/autorun runs the examples when this file is executed directly.
    require 'rspec/autorun'

    describe Array do
      before :each do
        @a = [1, 1, 2, 3, 5, 8]
      end

      it "returns the same array with each" do
        b = []
        a_ret = @a.each { |e| b << e } # i.e., copy a to b
        expect(@a.object_id).to eq a_ret.object_id
        expect(b).to eq @a
      end

      it "returns a new array with map" do
        a_ret = @a.map { |e| e * e }
        expect(a_ret).to eq [1, 1, 4, 9, 25, 64]
        expect(@a.object_id).not_to eq a_ret.object_id
        expect(@a).to eq [1, 1, 2, 3, 5, 8] # the original is untouched
      end

      it "modifies the original array with map!" do
        a_ret = @a.map! { |e| e * e }
        expect(a_ret).to eq [1, 1, 4, 9, 25, 64]
        expect(@a.object_id).to eq a_ret.object_id
      end

      it "returns enumerator if no block given" do
        a_enum = @a.each
        expect(a_enum.class).to eq Enumerator
      end
    end


Examining the C source code for Enumerable#map, we see the memory allocation for the new array on line 8, in the call to rb_ary_new():

    static VALUE
    enum_collect(VALUE obj)
    {
        VALUE ary;

        RETURN_SIZED_ENUMERATOR(obj, 0, 0, enum_size);

        ary = rb_ary_new();
        rb_block_call(obj, id_each, 0, 0, collect_i, ary);

        return ary;
    }

The important thing here is that the return value of map is built from the return values of the block. That is, the map method doesn’t do any real work itself; it simply wraps the call to rb_block_call, which drives each and evaluates the block for every element. The newly allocated array is passed to rb_block_call as the last argument, and the collect_i callback appends each block result to it, so the array doubles as the return value. Passing an output parameter like this is a widely accepted practice in C coding.
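The same structure can be sketched in plain Ruby. This is only an illustration, not how Ruby implements it: NumberBag and my_map are hypothetical names made up here. The class defines nothing but each, gets map from Enumerable, and adds a hand-rolled map that mirrors the C code above.

```ruby
# A minimal sketch. NumberBag and my_map are hypothetical names,
# not part of Ruby.
class NumberBag
  include Enumerable

  def initialize(*numbers)
    @numbers = numbers
  end

  # Enumerable requires only this method.
  def each(&block)
    return enum_for(:each) unless block

    @numbers.each(&block)
    self
  end

  # Mirrors enum_collect: allocate a new array, drive each,
  # and push every block result onto that array.
  def my_map
    return enum_for(:my_map) unless block_given?

    result = []
    each { |element| result << yield(element) }
    result
  end
end

bag = NumberBag.new(1, 1, 2, 3, 5, 8)
squares = bag.map { |e| e * e }      # provided by Enumerable
same    = bag.my_map { |e| e * e }   # hand-rolled equivalent
```

Both calls build a fresh array and leave the receiver untouched, while each hands back the receiver itself.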

# Test-driven development: structuring workflow

Test-driven development (TDD) is perceived as being much more difficult than writing programs without tests. It’s true that it is more difficult. It’s not true that it’s much more difficult.

It is, however, a skill and it must be learned just as any other must be learned.

Having tools helps. Many modern scripting languages such as Python and Ruby have testing tools built into the language or in standard libraries. With other languages, such as C or C++, some effort must be made to set up an appropriate tool chain. This tool chain usually requires either using pre-written libraries or writing the necessary test infrastructure in-house.

At least as important as having useful tools is creating the right habits for test-driven coding. These test-first habits live in your workflow.

## Structuring workflow

Learning how to do anything once is not that hard. Mastering the practice of a discipline requires building effective habits. For example, many people have exercise equipment at home, and know how to use it. But they go to the gym instead, where they find context, motivation and inspiration. The habit is at the gym, not at home.

While motivation and inspiration can be elusive, creating an appropriate context is mechanical. It’s as easy (and difficult) as creating an appropriate set of habits. Here are a few habits I’ve found useful for practicing test-driven development:

1. In your editor of choice, always open spec or test files in a known buffer which you can get to with a keystroke or two. I’m currently using vim, so I always know that the file containing tests is in buffer 1: ESC:b1 gets me there in a flash.
2. Use a REPL like irb or pry to check syntax. This is more helpful than it first appears. It takes the place of all those puts and prints. If you’re working with a compiled language, write stupid simple standalone programs to check behavior. This can be as efficient as cat-ing a program into gcc from the command line.
3. Always commit a test first, or never less than a test in parallel with new code. Never commit untested code. This is not the same as never pushing untested code.
4. Scope objects and internal API in the test or spec file and build your functionality in the test file first.
5. More? You tell me, comments are open!

Following these rules helps me create a workflow structure conducive to TDD.
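Habit 4, for instance, might look like the sketch below. It uses Minitest, which ships with Ruby, rather than RSpec, purely to keep the example free of extra dependencies. The Stack class is a hypothetical example. The class lives in the test file while its API is still taking shape, and moves to a file of its own later.

```ruby
require 'minitest/autorun'

# Hypothetical class under construction. It is scoped inside the test
# file for now, next to the tests that drive its design.
class Stack
  def initialize
    @items = []
  end

  # Command: changes state, returns the stack itself.
  def push(item)
    @items.push(item)
    self
  end

  def pop
    @items.pop
  end

  def empty?
    @items.empty?
  end
end

class StackTest < Minitest::Test
  def test_new_stack_is_empty
    assert Stack.new.empty?
  end

  def test_pop_returns_the_last_item_pushed
    stack = Stack.new
    stack.push(1)
    stack.push(2)
    assert_equal 2, stack.pop
  end
end
```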

# Vending machines Vegas style

I’m skimming Hoare’s Communicating Sequential Processes, a rather technical book on concurrency.

One of the examples discusses a vending machine providing a food sample first, before payment. The vending machine is used as an illustration for Hoare’s theoretical notation.

But think of that vending machine for a moment.

Picture a vending machine which randomly serves free stuff: say, on average, 10% of what it serves is free.

It could work two ways:

1. Refund the money, “compliments of the house”, or
2. Just walk up and push a random button, free food pops out.

I have a hunch the sales volume would go up. Way, way up.

# Constructors should set state, nothing more

If a constructor has to do a lot of processing to set its initial state, that’s a code smell.

When a constructor is used for the main processing part of a class, such that the code requires an instance of Foo to spend a lot of cycles computing Bar, that’s almost a sure indicator the design needs improvement. It may not need a lot of improvement, but it almost surely needs rethinking.

Constructors should only set state, really. If arguments must be passed, those should be easy to instantiate themselves.

I’m not the only person who believes doing work in constructors is a flaw.

You can google “doing work in constructors” to find like-minded thinking.

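Check this out, a constructor which does nothing but set state:

    Foo::Foo(Bar b) : b(b) {
        // nothing to do: the initializer list has already set the state
    }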


I recently had to find a way to test a constructor which did real work: it depended on some static (global) value of another class, which was initialized elsewhere.

Very painful.
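One way out, sketched here in Ruby, is to move the work into a factory method and leave the constructor to store state. The Report class and its from_csv method are hypothetical names chosen for illustration.

```ruby
# A minimal sketch. Report and Report.from_csv are hypothetical names.
class Report
  attr_reader :rows

  # The constructor only stores state.
  def initialize(rows)
    @rows = rows
  end

  # The parsing work lives in a factory method, not in the constructor.
  def self.from_csv(text)
    rows = text.lines.map { |line| line.chomp.split(',') }
    new(rows)
  end

  def row_count
    rows.size
  end
end

# A test can skip parsing entirely and hand over ready-made rows.
direct = Report.new([%w[a b], %w[c d]])
parsed = Report.from_csv("a,b\nc,d\n")
```

Tests for row_count never touch the parser, and tests for the parser only need to inspect rows.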

# Design rule of thumb: 1 loop per method

Here’s a great little guideline for writing testable code: limit the number of loops in any function or method to 1 at each level of nesting.

Consider the following function with two “top level” loops:

    def function():
        for ():
            ...
            ...
            ...
        for ():
            ...
            ...
            ...


## Refactor 1

The body of a loop is often a great candidate for a refactoring. Do this instead:


    def helper1():
        for ():
            ...
            ...
            ...

    def helper2():
        for ():
            ...
            ...
            ...

    def function():
        helper1()
        helper2()


Instead of sweating one nasty integration test for the function with two loops, you now have two unit tests (test_helper1, test_helper2) and two integration tests (test_function1, test_function2).

## Refactor 2

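Alternatively, drop the wrapper, split the function in two, and let the caller invoke each in turn:

    def function1():
        for ():
            ...
            ...
            ...

    def function2():
        for ():
            ...
            ...
            ...

    function1()
    function2()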


Depending on what function1 and function2 accomplish, they might be amenable to unit testing as well.
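Here is a concrete version of the first refactoring, sketched in Ruby. The method names and the order data are hypothetical, made up only to give the loops something to do.

```ruby
# A minimal sketch. All method names and the order data are hypothetical.

# Before: two top-level loops in one method.
def summarize(orders)
  total = 0
  orders.each { |order| total += order[:price] }

  counts = Hash.new(0)
  orders.each { |order| counts[order[:status]] += 1 }

  [total, counts]
end

# After: one loop per method, each testable on its own.
def total_price(orders)
  total = 0
  orders.each { |order| total += order[:price] }
  total
end

def count_by_status(orders)
  counts = Hash.new(0)
  orders.each { |order| counts[order[:status]] += 1 }
  counts
end

def summarize_refactored(orders)
  [total_price(orders), count_by_status(orders)]
end

orders = [
  { price: 5, status: :paid },
  { price: 7, status: :open },
  { price: 3, status: :paid }
]
summary = summarize_refactored(orders)
```

Each helper can now be unit tested with a tiny list of orders, and the test for summarize_refactored only has to check that the two results are combined correctly.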

# The is pattern for queries

One way to partition functionality within classes is to separate command (mutate) actions from query (inspect) actions.

Query actions have a number of well-established naming conventions. Let’s examine the “is_whatever?” pattern.

Consider a car, a spider and a word (Ruby syntax):

    car.is_vehicle?       # true
    spider.is_insect?     # false, spiders are arachnids
    word.is_abbreviation? # 4. WTF!?


The convention is that any function or method prefixed with “is” queries for a boolean value. Returning anything other than true or false will induce cognitive dissonance in the reader.

The reader might well be you, next year.
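A well-behaved version of the word example might look like this sketch. The Word class, its list of abbreviations and its method names are all hypothetical.

```ruby
# A minimal sketch. The Word class and its contents are hypothetical.
class Word
  ABBREVIATIONS = %w[etc. e.g. i.e. approx.].freeze

  def initialize(text)
    @text = text
  end

  # Predicate: always answers true or false, never a count or nil.
  def is_abbreviation?
    ABBREVIATIONS.include?(@text)
  end

  # A non-boolean answer gets its own, differently named query.
  def length
    @text.length
  end
end

word = Word.new("etc.")
word.is_abbreviation? # a boolean
word.length           # a number, asked for by name
```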

# Naming functions and methods

Function and method naming is not terribly difficult if the following two guidelines are observed:

Query functions return stuff. For example, new, create, get are all functions which ask for something in return.

Command functions change state. For example, set, adjust, compute.

Try very, very hard not to “mix and match”. Functions which both query and command are really hard to test, and worse, they are much harder to understand.

Think of it this way: testing a query function means testing only that the returned result is correct. Testing a command function means testing only the state of the object as a result of the command. If you have a function which commands and queries, you may find your function has weird interactions. Basically, your test code has to test the return, the state, and every possible interaction.

Plus, it’s just semantically confusing otherwise.
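The difference shows up quickly in a small sketch. The Thermostat class below is hypothetical: it has one query, one command, and one method which mixes the two.

```ruby
# A minimal sketch. The Thermostat class is hypothetical.
class Thermostat
  def initialize(target)
    @target = target
  end

  # Query: returns a value, changes nothing.
  def target
    @target
  end

  # Command: changes state, returns nothing useful.
  def adjust(delta)
    @target += delta
    nil
  end

  # Mixed: changes state AND returns a value. A test now has to check
  # the return value, the new state, and how the two relate.
  def adjust_and_report(delta)
    @target += delta
    "target is now #{@target}"
  end
end

thermostat = Thermostat.new(20)
thermostat.adjust(2)   # test by inspecting state afterwards
thermostat.target      # test by inspecting the return value
```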

Update 2012/05/19: Query and command are analogous to “inspect” and “mutate” in the C++ world.

# Good reason to learn Java, C/C++

C, C++ and Java may seem like mere annoyances to programmers weaned on scripting languages. But learning one or more of these languages is worthwhile, for many reasons.

One reason is this: Vast amounts of software engineering literature provide examples using these languages.

Currently, I’m reading “Working Effectively with Legacy Code” by Michael C. Feathers.

All the code in the book is (of course) C, C++ or Java. But the principles documented in the book are language independent.

The last example in the book, on extraction refactoring, is something I’ve been doing for years, and didn’t know there was a name for it. That’s pretty cool.

# Learning to test-first

If you didn’t learn it that way, test-first isn’t the easiest way to program.

But it can be learned.

1. Write the code you’re going to write anyway.
2. Test your just-written code thoroughly.

It takes extra time, but only the first time.

Given you’re writing test code anyway, test-first will pay you back the second or third time you write the code.

If you’re not writing test code, move along, nothing to see here.

# Testing: clients will pay now or pay later

A former freelancer sends a message to a programming email list, explaining that his full-time job doesn’t give him enough time to update his former client’s website. The site also runs on an obsolete and soon-to-be-unsupported version of its web application framework (which framework is irrelevant).

The main gotcha: he didn’t write any test code.

Most likely, the client was unable or unwilling to pay for the test code when the site was first created.

Most likely, the client will not be interested in paying for test code now.

The irony here is that had the developer originally written test code, he would have much more confidence updating the site.

The client pays regardless: either someone comes and cleans it up, or the client pays for a new website. At this point, the missing test code increases developer time, as the new developer has to reverse engineer the existing code. Might as well test.