John Firebaugh

Open Source, Ruby, Rubinius, RubySpec, Rails.

New-less JavaScript

JavaScript borrowed the new and delete keywords from its less-dynamic predecessor languages. They feel a bit out of place in a garbage collected language and are a source of confusion for newbies – one of the reasons popular libraries such as jQuery, d3, and Ember.js have adopted APIs that don’t require using new at all. In this post I’ll show you one way to do it, and why you should consider it for your next JavaScript library.

In C++, new and delete are symmetric operators that combine memory management and object lifecycle operations. new allocates an instance and calls the constructor, and delete calls the destructor and deallocates it. In garbage collected languages, delete isn’t necessary; Java doesn’t have it, for example. It’s one of JavaScript’s quirks that it does use delete, and for a purpose (removing a property from an object) that’s not symmetric with new. You’ll sometimes see confused JavaScript programmers trying to delete plain objects.
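
For example (non-strict mode assumed):

var user = { name: "John" };
delete user.name; // removes the `name` property; `user.name` is now undefined
delete user;      // does not free the object; returns false for var-declared bindings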

new can itself be a source of confusion. Looking at instantiation with new, it’s easy to see why: new User() requires both a language keyword and a unique syntactic form. In other scripting languages, neither is needed: instantiation is done via regular function notation, either via a class method as in Ruby (User.new), or by calling the class object as a function as in Python (User()).

What’s worse, in JavaScript, forgetting new when calling a constructor can produce some very strange behavior – leaving variables undefined and polluting the global scope. John Resig gave a great rundown of the issues and proposed a solution, often dubbed the ‘instanceof trick’:

The instanceof trick
function User(first, last) {
  if (!(this instanceof User)) return new User(first, last);
  this.first = first;
  this.last = last;
}

User.prototype.fullName = function() {
  return this.first + ' ' + this.last;
};

Class constructors defined with the instanceof trick can be called with or without new:

  var userViaNew = new User("John", "Firebaugh");
  userViaNew.fullName(); // "John Firebaugh"

  var userDirect = User("John", "Resig");
  userDirect.fullName(); // "John Resig"

In either case, the result is a newly allocated and initialized User.

John Resig goes on to show how to create a reusable function that builds constructors that use the instanceof trick. This way, instead of writing it out manually for every class, you write it once and use a higher-level function to declare classes and their prototype properties:

// The `Class.extend` function defines a new constructor that uses
// the instanceof trick internally, and then calls `initialize`:
var User = Class.extend({
  initialize: function(first, last) {
    this.first = first;
    this.last = last;
  },

  fullName: function() {
    return this.first + ' ' + this.last;
  }
});

var userViaNew = new User("John", "Firebaugh");
var userDirect = User("John", "Resig");
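
For reference, here’s a minimal implementation of such a helper. It’s a sketch rather than Resig’s exact code, and it assumes ES5’s Object.create is available:

var Class = {
  extend: function(properties) {
    function Constructor() {
      var self = this;
      if (!(self instanceof Constructor)) {
        // Called without `new`: allocate the instance ourselves so we
        // can forward an arbitrary argument list to `initialize`.
        self = Object.create(Constructor.prototype);
      }
      self.initialize.apply(self, arguments);
      return self;
    }
    for (var key in properties) {
      Constructor.prototype[key] = properties[key];
    }
    return Constructor;
  }
};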

APIs that use the instanceof trick internally often publicly document only the newless form of instantiation. This minimizes the API surface area, gives new users less to learn, and avoids the awkward aspects of new. Another advantage is that it allows the implementation to choose either the Module pattern or classic prototypal instantiation without changing the public API. For example, we can rewrite the User class to use the Module pattern without changing how it’s used:

function User(first, last) {
  function fullName() {
    return first + ' ' + last;
  }
  return {
    first: first,
    last: last,
    fullName: fullName
  };
}

Similarly, if we had started out with the Module pattern, but discovered that we needed the memory efficiency of prototypal instantiation, we could switch without changing client code.

An alternative that some libraries use is to pair each constructor with a factory function of the same name but with a leading lower-case letter, e.g. User and user:

  function user(first, last) {
    return new User(first, last);
  }

  var userViaNew = new User("John", "Firebaugh");
  var userViaFactory = user("John", "Resig");

This works, but again, it increases the size of the API you need to document and users need to learn, and unlike the instanceof trick, there’s no way to implement it in a reusable form. You’re stuck writing a factory wrapper for every class.

A newless API works great for libraries such as jQuery and d3, and I’ve found it very useful in iD as well. Consider using it for your next library.

How to Securely Bootstrap JSON in a Rails View

A common pattern with client-side MVC applications is to embed the data for a base set of models in the initial page instead of making a separate AJAX request to load them. In a Rails application, this is typically done by interpolating the result of a call to to_json in the view. The Backbone.js docs provide this example:

<script>
  var Accounts = new Backbone.Collection;
  Accounts.reset(<%= @accounts.to_json %>);
  var Projects = new Backbone.Collection;
  Projects.reset(<%= @projects.to_json(:collaborators => true) %>);
</script>

If you try this in a Rails 3 application, you will discover that by default, the interpolated results of to_json are HTML-escaped: &, >, <, and " are replaced with the equivalent HTML entities. Inside the script tag, this is almost certainly not what you want. JSON strings containing &, >, and < should contain those characters literally, and the " character delimits the JSON strings themselves. Escaping them prevents the desired result: a literal JavaScript value embedded in the script.

The common reaction is to disable HTML escaping, either by prepending the call to to_json with the raw helper, or calling html_safe on the result. Here’s the same example using each of these techniques:

DO NOT FOLLOW THIS EXAMPLE
<script>
  var Accounts = new Backbone.Collection;
  Accounts.reset(<%= raw @accounts.to_json %>);
  var Projects = new Backbone.Collection;
  Projects.reset(<%= @projects.to_json(:collaborators => true).html_safe %>);
</script>

Do not follow this example! Used in this way, both raw and html_safe open vectors for a cross-site scripting vulnerability, and it is unfortunate that their use is so widespread and commonly recommended.

To understand the vulnerability, consider what happens if one of the strings in the JSON contains the text </script>. This text is interpolated into the page, and since both raw and html_safe disable HTML-escaping, it is interpolated literally. As a consequence, and despite the fact that it appears within a JavaScript string literal, </script> closes the script element, leaving an opportunity to embed an XSS payload in the subsequent text:

<script>
  var Accounts = new Backbone.Collection;
  Accounts.reset([{name: "</script><script>alert('xss')</script>", ...}]);
  // ...
</script>

The simplest way to escape JSON strings that may contain the </ sequence is to precede the slash with a backslash. Though simple to do, this should be built into Rails. Unfortunately, it is not. The obvious candidate would be json_escape, aliased as j, which one would expect to be the JSON analog of the old Rails 2 h helper for HTML escaping:

<script>
  var Accounts = new Backbone.Collection;
  Accounts.reset(<%=j @accounts.to_json %>);
  // ...
</script>

However, in addition to escaping the JSON in a way that prevents XSS, json_escape also removes double quote (") characters. Yes, that’s right, json_escape is documented to return invalid JSON. This baffling behavior is most likely a mistake in the original implementation. I’ve submitted a pull request to change it, which will hopefully be accepted for Rails 4.

A second attempt might be to try escape_javascript, but this escapes much more than necessary. It could probably be made to work, but would require parsing JSON on the client rather than simply interpolating a literal JavaScript value.

Finally, there’s the option of setting ActiveSupport::JSON::Encoding.escape_html_entities_in_json to true. This works, but since the default was explicitly changed to false in Rails 3, it feels like a workaround at best. If you change the default globally, be sure that any consumers of JSON APIs provided by your application are prepared to handle Unicode escape sequences, because it will result in </script> being escaped as \u003C/script\u003E rather than <\/script>.
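
If you do go this route, it’s a one-line initializer (the file name is illustrative):

# config/initializers/json_encoding.rb
ActiveSupport::JSON::Encoding.escape_html_entities_in_json = true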

My recommendation is to overwrite json_escape with a sensible definition and use that:

config/initializers/json_escape.rb
class ActionView::Base
  def json_escape(s)
    result = s.to_s.gsub('/', '\/')
    s.html_safe? ? result.html_safe : result
  end

  alias j json_escape
end
view.html.erb
<script>
  var Accounts = new Backbone.Collection;
  Accounts.reset(<%=j @accounts.to_json.html_safe %>);
  var Projects = new Backbone.Collection;
  Projects.reset(<%=j @projects.to_json(:collaborators => true).html_safe %>);
</script>

This is simple, sufficient to prevent XSS from bootstrapped JSON, and will hopefully be the built-in behavior of json_escape/j in Rails 4.

Why Ember.js Doesn’t Use Property Descriptors

Like model classes in many other JavaScript MVC frameworks, Ember.Object uses get()/set()-based property accessor functions rather than native JavaScript properties:

MyApp.president = Ember.Object.create({
  name: "Barack Obama"
});

MyApp.president.get('name'); // Not `MyApp.president.name`

Of course, Ember provides computed properties and property bindings, features that plain JavaScript properties don’t support. Fortunately, ECMAScript 5 includes property descriptors, and in particular, the Object.defineProperty() method. Object.defineProperty allows you to specify a function to serve as a getter for the property (for example, to implement a computed property) and a corresponding function to serve as a setter (which can be used to implement properties that notify their observers when they change). So why doesn’t Ember use property descriptors to provide more natural, less intrusive object properties?
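
To make the alternative concrete, here’s a sketch (not Ember’s actual internals) of a computed property built on a property descriptor:

var president = { first: "Barack", last: "Obama" };

Object.defineProperty(president, "fullName", {
  get: function() {
    return this.first + " " + this.last;
  },
  set: function(value) {
    var parts = value.split(" ");
    this.first = parts[0];
    this.last = parts[1];
    // a real implementation would notify observers here
  }
});

president.fullName; // "Barack Obama", with no get() in sight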

Browser compatibility is only part of the answer. Browser support for property descriptors is actually reasonably decent: present on Firefox, Chrome, Safari, and Opera (naturally), and IE >= 9. For applications that can afford to drop IE 8 support, particularly mobile apps, property descriptor-based model objects would work great.

The other part of the answer is that Ember provides a feature that goes beyond what property descriptors can support: unknown properties. This is a feature akin to Ruby’s method_missing, where you can define a handler that’s called upon access to a property you haven’t explicitly defined:

MyApp.president = Ember.Object.create({
  unknownProperty: function (key) {
    if (key === "name") return "Barack Obama";
  }
});

MyApp.president.get('name'); //=> "Barack Obama"

Property descriptors do not allow you to define a function that’s called when a previously undefined property is accessed or assigned. Accesses of undefined properties simply produce undefined, and without a previously installed setter function the object has no way of detecting assignment to a previously undefined property, which it needs to be able to do in order to notify observers of the change. So Ember needs something more than property descriptors in order to support unknown properties.

Good news: that “something more” is on the horizon, in the form of object proxies, a language feature slated for ECMAScript Harmony, the next release of the JavaScript standard. Object proxies allow you to create a virtualized object that wraps a given target object. The key feature for Ember is that get and set are part of the proxy API, permitting the wrapper to intercept and handle all property accesses and assignments, even for unknown properties.
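
Here’s a sketch of the idea using proxy get and set traps (written against the eventual standardized form of the API; the experimental implementations mentioned below differ in details):

var target = {
  unknownProperty: function(key) {
    if (key === "name") return "Barack Obama";
  }
};

var president = new Proxy(target, {
  get: function(t, key) {
    if (key in t) return t[key];
    return t.unknownProperty(key); // intercepts reads of undefined properties
  },
  set: function(t, key, value) {
    // assignments to previously undefined properties land here too,
    // so observers can be notified of the change
    t[key] = value;
    return true;
  }
});

president.name; // "Barack Obama"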

Bad news: it will be a while before support for object proxies is widespread. Currently, support is limited to Firefox and Chrome (after enabling “Experimental JavaScript” in chrome://flags), both of which actually support an older, slightly different proxy specification.

Thanks to Tom Dale for answering this question for me. Any inaccuracies in the above are my own.

Code Archaeology With Git

Have you ever dug through the commit history of an open source project, peeling away layers, sifting for clues, trying to answer the question, “why does this code do what it does”? I call this process code archaeology.

Code archaeology is made difficult by historical debris: reformatting, refactoring, code movement, and other incidental changes. This post takes a look at techniques for separating the interesting commits from the uninteresting ones. We’ll look at existing git tools, a tool provided by another SCM system that I wish had a git equivalent, and a promising feature of git that has yet to arrive.

Blame

git blame or github’s blame view is frequently the first step—but also the first source of frustration: a single reformatting or code-movement commit can take the blame for lines it merely touched.

git blame has a few options that can help with this problem.

  • With -w, blame ignores lines where only whitespace changed.
  • With -M, blame detects moved or copied lines within a file, and blames them on the original commit instead of the commit that moved or copied them.
  • With -C, blame extends this move or copy detection to other files that were modified in the same commit. You can specify this option two or three times to make git look even harder (but more slowly) for moves and copies. See the manpage for details.
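
All together, that looks like this (the path is just an example):

$ git blame -w -M -C -C -C actionpack/lib/sprockets/railtie.rb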

For example, I compared the blame for Rails’s sprockets railtie without any options and with -wCCC. The latter was able to tell that this commit shouldn’t be blamed because it changed only whitespace, and it blamed the multiline comment near the end of the file on the commit where it was originally introduced, rather than a later commit which moved it.

If any githubbers are reading this: how about supporting some power-user query parameters on blame pages? I suggest w=1 for ignoring whitespace (a parameter which is already supported on diff pages); M=1, C=1, C=2, and C=3 for various levels of move and copy detection.

Pickaxe

If you read the git blame manpage, you might have noticed a somewhat cryptic reference to the “pickaxe” interface. Pickaxes are often useful for archaeological purposes, and git’s pickaxe is no exception. It refers to the -S option to git log. The -S option takes a string parameter and searches the commit history for commits that introduce or remove that string. That’s not quite the same thing as searching for commits whose diff contains the string—the change must actually add or delete that string, not simply include a line on which it appears.

For example, I was looking at the same Sprockets railtie I looked at with blame and trying to figure out why Rails.env was included in Sprockets’ environment version on line 24. Blame landed on an uninteresting commit:

$ git blame -L23,25 actionpack/lib/sprockets/railtie.rb
8248052e (Joshua Peek 2011-07-27 15:09:42 -0500 23)       app.assets = Sprockets::Environment.new(app.root.to_s) do |env|
63d3809e (Joshua Peek 2011-08-21 16:42:06 -0500 24)         env.version = ::Rails.env + "-#{config.assets.version}"
8248052e (Joshua Peek 2011-07-27 15:09:42 -0500 25)
$ git log -1 63d3809e
commit 63d3809e31cc9c0ed3b2e30617310407ae614fd4
Author: Joshua Peek <[email protected]>
Date:   Sun Aug 21 16:42:06 2011 -0500

    Fix sprockets warnings

    Fixes #2598

But the pickaxe found the answer right away:

$ git log -S'Rails.env' actionpack/lib/sprockets/railtie.rb
commit ed5c6d254c9ef5d44a11159561fddde7a3033874
Author: Ilya Grigorik <[email protected]>
Date:   Thu Aug 4 23:48:40 2011 -0400

    generate environment dependent asset digests

    If two different environments are configured to use the pipeline, but
    one has an extra step (such as compression) then without taking the
    environment into account you may end up serving wrong assets

git gui blame

git gui blame might be one of the most useful and least known features of the Tcl/Tk-based GUI included with git. You give it the name of a file and it opens an interactive blame viewer with built-in move and copy detection and an easy way to reblame from a parent commit. Check it out:

Screenshot of git gui blame

The first column on the left shows the blame with move and rename detection, and the second shows who moved the line to its current location. In the lines selected in green, we see evidence of the movement of the same comment that we looked at with command-line blame: in the first column, José Valim (JV) originated it in 8f75, and Josh Peek (JP) later moved it in 8428.

The killer feature of git gui blame is found in the context menu: “Blame Parent Commit”. When blame lands on an uninteresting commit, you can use this command to skip over it and reblame from the immediately prior commit. This is so useful that gui blame has become my go-to tool for code archaeology.

Perforce Time-lapse View

I would never choose to use Perforce over git, but I do miss one feature that it provides: the time-lapse view.

The time-lapse view is great for quickly scrubbing through the history of a file, but it’s difficult to keep a particular line of interest in view as you scrub. And because it shows only a linear history, it suffers from Perforce’s branching model; I would frequently land on a huge “integration changelist” (Perforce’s equivalent of a merge commit) and need to go look at the time-lapse on a different branch.

Still, I was often able to unearth interesting commits more quickly than I can with git blame, and I still hope somebody creates a similar tool for git.

Git Line-level History Browser

The 2010 Google Summer of Code included a project for git called the Line-level History Browser, a set of feature additions for the git log command to make it easy to track the history of a line (or set of lines), even through file renames and code movement.

Thomas Rast, co-mentor for the project, explains the purpose of the feature:

For me it replaces a manual iterative process to find out in what ways a function was patched until it came to have its current shape:

git-blame the area, find the most recent commit C
while 1:
  git show C
  if that explains the code: break
  git-blame the area in C^
  find the most recent commit and call it C again

I do this a lot when a particular section of code puzzles me or seems buggy, to see if any commit message provides a reason for it. I think (but I never got good with it) the “blame parent” feature of git gui blame covers a similar use-case.

All of this can now be replaced by a simple git log -L <range> <filename>

And Bo Yang, the mentee, lists details:

Generally, the goal of this project is to:

  1. git log -L to trace multiple ranges from multiple files;
  2. move/copy detect when we reach the end of some lines(where lines are added from scratch).

And now, we have supports in detail:

  1. git log -L can trace multiple ranges from multiple files;
  2. we support the same syntax with git blame -L options;
  3. we integrate the git log -L with --graph options with parent-rewriting to make the history looks better and clear;
  4. move/copy detect is in its half way. We get a nearly workable version of it, and now it is in a phrase of refactor, so in the scope of GSoC, move/copy detect only partly complete.

Eventually, the feature was to support “fuzzy” matching of moves and copies, so that the history could be traced across even more “refactoring”-type commits. Sounds fantastic, right? Why am I not using this every day? Unfortunately, Bo’s work didn’t get merged to git master. It’s not completely defunct; Thomas Rast maintains a WIP version which has seen some recent activity, so I’m cautiously optimistic this feature may yet be released.
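
If the feature does land, usage should look something like this (a sketch based on the blame-style -L range syntax described above; the final interface may differ):

$ git log -L 23,25:actionpack/lib/sprockets/railtie.rb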

Xcode 4.3, Homebrew, and Ruby

Ruby on Mac OS X Lion is going through a bit of a rough patch, installation-wise. With Xcode 4.2, clang became the default compiler and gcc was no longer included. Unfortunately, this has caused a lot of grief for Rubyists on OS X, because for a while, MRI did not officially support compiling with clang. With the release of 1.9.3-p125, that situation has changed–clang is now officially supported–but there are still some gotchas. This post details my toolchain and process for running MRI 1.9.3 and 1.8.7 on Lion with Xcode 4.3.

If you want a TL;DR: install the Xcode 4.3 command line tools. Then,

$ brew update
$ brew install autoconf automake
$ brew install https://raw.github.com/Homebrew/homebrew-dupes/master/apple-gcc42.rb
$ rvm get head
$ rvm install 1.8.7
$ rvm install 1.9.3-head

Read on for a detailed rationale.

Xcode

I use Xcode 4.3 and have installed the Xcode command line tools. I’ve uninstalled all previous versions of Xcode. If you don’t use Xcode itself, save yourself a multi-gigabyte download and install just the command line tools, which are now available separately. Thanks to Kenneth Reitz for his work making this happen.

Homebrew

Homebrew now has good support for Xcode 4.3. Just make sure to brew update.

In order to build MRI, you’ll need to install some specific formulas. First of all, autoconf and automake:

$ brew install autoconf automake

You need these because Xcode 4.3 no longer includes autotools; if you have installed Xcode 4.3 and uninstalled the previous versions, you will no longer have /usr/bin/autoconf. You don’t usually need the autotools for installing homebrew formulas, since the downloaded packages should come with configure pregenerated, but you do need them in order to install head versions of MRI as described below.

Second, install gcc–the real version–from homebrew-dupes:

$ brew install https://raw.github.com/Homebrew/homebrew-dupes/master/apple-gcc42.rb

The command line tools provide /usr/bin/gcc, but it’s a modified version based on LLVM and if you try to use it to compile 1.8.7, you’ll get the following crash when trying to install gems:

$ gem install bundler
/Users/john/.rvm/rubies/ruby-1.8.7-p358/lib/ruby/1.8/timeout.rb:60: [BUG] Segmentation fault
ruby 1.8.7 (2012-02-08 patchlevel 358) [i686-darwin11.3.0]

Kenneth Reitz’s osx-gcc-installer is another popular way of getting GCC, but you don’t want to install it on top of the Xcode 4.3 command line tools, because it will overwrite working versions of llvm-gcc and clang. The homebrew-dupes apple-gcc42 formula gives you just Apple’s GCC 4.2, installed at /usr/local/bin/gcc-4.2.

RVM

Install RVM and run rvm get head. The latest RVM has the smarts to use the correct compilers to build both 1.9.3 and 1.8.7 – clang for 1.9.3 and gcc-4.2 for 1.8.7. I tend to install both so I can test my gems on both versions.

$ rvm install 1.9.3
$ rvm install 1.8.7

You shouldn’t see any errors or warnings from these commands, and you shouldn’t need to specify --with-gcc=clang or --with-gcc=gcc-4.2. If you see something like Building 'ruby-1.8.7-p358' using clang - but it's not (fully) supported, expect errors, you’ve done something wrong. Go back and make sure your command line tools are correctly installed and you’ve installed the apple-gcc42 formula from homebrew-dupes.

You should now have working copies of 1.9.3 and 1.8.7. Hooray!

Still, you might run into one more problem. If you try to debug on 1.9.3, you’ll get this error:

/Users/john/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:
  in `require': dlopen(/Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug-base19-0.11.25/lib/ruby_debug.bundle, 9):
  Symbol not found: _ruby_current_thread (LoadError)
  Referenced from: /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug-base19-0.11.25/lib/ruby_debug.bundle
  Expected in: flat namespace
 in /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug-base19-0.11.25/lib/ruby_debug.bundle - /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug-base19-0.11.25/lib/ruby_debug.bundle
  from /Users/john/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
  from /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug-base19-0.11.25/lib/ruby-debug-base.rb:1:in `<top (required)>'
  from /Users/john/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
  from /Users/john/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/site_ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
  from /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug19-0.11.6/cli/ruby-debug.rb:5:in `<top (required)>'
  from /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug19-0.11.6/bin/rdebug:108:in `require_relative'
  from /Users/john/.rvm/gems/ruby-1.9.3-p125/gems/ruby-debug19-0.11.6/bin/rdebug:108:in `<top (required)>'
  from /Users/john/.rvm/gems/ruby-1.9.3-p125/bin/rdebug:19:in `load'
  from /Users/john/.rvm/gems/ruby-1.9.3-p125/bin/rdebug:19:in `<main>'

This is caused by a clang incompatibility that didn’t get fixed until after the 1.9.3-p125 release. Use the head version of 1.9.3 instead: rvm install 1.9.3-head.

Phew! Now you’re really bleeding edge.

Visualizing SFpark Demand-Responsive Meter Rate Adjustments

On July 11th 2011, SFpark announced the first set of meter rate adjustments. Meter operational hours were divided into six distinct rate periods, and the hourly price of metered parking in the project’s seven pilot areas was adjusted on a block-to-block basis in response to parking demand during each period:

  • +25¢ in periods of 80% or more occupancy
  • No change in periods of 60-80% occupancy
  • −25¢ in periods of 30-60% occupancy
  • −50¢ in periods of less than 30% occupancy

I created this visualization using the d3.js framework and data provided by SFpark. Click for full size.

SFpark meter rate adjustment visualization

Kernel Density Estimation With Protovis

A kernel density estimate provides a means of estimating and visualizing the probability distribution function of a random variable based on a random sample. In contrast to a histogram, a kernel density estimate provides a smooth estimate, via the effect of a smoothing parameter called the bandwidth, here denoted by h. With the correct choice of bandwidth, important features of the distribution can be seen; an incorrect choice will result in undersmoothing or oversmoothing and obscure those features.
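
Formally, given a sample $x_1, \ldots, x_n$ and a kernel function $K$ (commonly the standard normal density), the estimate is

$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$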

Here we see a histogram and three kernel density estimates for a sample of waiting times in minutes between eruptions of Old Faithful Geyser in Yellowstone National Park, taken from R’s faithful dataset. The data follow a bimodal distribution; short eruptions are followed by a wait time averaging about 55 minutes, and long eruptions by a wait time averaging about 80 minutes. In recent years, wait times have been increasing, possibly due to the effects of earthquakes on the geyser’s geohydrology.

Making Sense of Constant Lookup in Ruby

In Ruby 1.8, constant lookup is mostly lexically scoped, even when using class_eval or instance_eval. That is, constant lookup always searches the chain of lexically enclosing classes or modules. The first lexically enclosing class or module is not necessarily the same as self.class:
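
Here’s a sketch of a script (call it constants.rb; the names are illustrative) that exercises the four cases discussed below:

X = "::X"

class Foo
  X = "Foo::X"

  def method_in_foo
    X
  end
end

foo = Foo.new

def foo.singleton_method
  X
end

class_eval_result    = Foo.class_eval { X }
instance_eval_result = foo.instance_eval { X }

puts "method:        #{foo.method_in_foo}"
puts "singleton:     #{foo.singleton_method}"
puts "class_eval:    #{class_eval_result}"
puts "instance_eval: #{instance_eval_result}"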

Here’s the output on 1.8.7 (as reproduced with the sketch above):
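
method:        Foo::X
singleton:     ::X
class_eval:    ::X
instance_eval: ::X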

Here we can see that within the lexical block defining Foo#foo, which is enclosed by class Foo, X refers to Foo::X, while in the lexical blocks used for the singleton method, class_eval, and instance_eval, which are not in class Foo’s scope, X refers to ::X, the global constant.

However, in 1.9, the situation changes, and moreover, the behavior is different between 1.9.1 and 1.9.2. Here’s the result of running rvm 1.9.1,1.9.2 constants.rb against the same sketch:
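
ruby-1.9.1:

method:        Foo::X
singleton:     ::X
class_eval:    Foo::X
instance_eval: Foo::X

ruby-1.9.2:

method:        Foo::X
singleton:     Foo::X
class_eval:    ::X
instance_eval: Foo::X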

So, in 1.9.1, constant lookup in class_eval and instance_eval proceeds from the receiver, rather than the lexically enclosing scope. Particularly for class_eval, this turned out to be a problematic change, breaking existing libraries that depended on the 1.8.7 behavior and making it hard to build DSLs that behaved in predictable ways. Eventually, it was decided to revert to the 1.8 behavior, and this was supposedly implemented:

> [Matz] would like to revert all of instance_eval, instance_exec,
> class_eval, and class_exec to the behavior of 1.8 (including class
> variables). [...]

I have just commited it to the SVN trunk.

I say “supposedly” only because as you can see, the 1.9.2 behavior still differs from 1.8.7 in the case of instance_eval. Was this an oversight, or was the revert later unreverted for instance_eval? If so, what was the rationale? I searched the mailing list and subsequent commits, but couldn’t find an explanation. If anyone can shed light on the matter, I would appreciate it.

As you can see, 1.9.2 also changed the behavior for singleton methods: the receiver’s scope is now searched before the lexically enclosing scope. This change makes sense to me, and I haven’t heard of it causing any problems.

Note that these rules apply to constant definition as well as lookup. In 1.8 and 1.9.2, a constant defined in a class_evaluated block will be defined in the enclosing lexical scope, rather than the scope of the receiver. This is one of the things that makes Foo = Class.new { … } not quite the same as class Foo; …; end:
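
A sketch of the kind of example in question:

Baz = Class.new do
  Quux = "where am I?"
end

Quux       # => "where am I?" on 1.8.7 and 1.9.2
Baz::Quux  # => NameError on 1.8.7 and 1.9.2; defined only on 1.9.1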

The block passed to Class.new is effectively class_evaluated, so in this example, the constant Quux ends up defined at the top level. (And once again 1.9.1 is the exception: it defines Baz::Quux instead.) This behavior can cause problems if you are in the habit of defining test classes in RSpec describe blocks:
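
For example (a sketch; the group names and class bodies are illustrative):

describe "one feature" do
  class TestClass < Struct.new(:a); end
end

describe "another feature" do
  class TestClass < Struct.new(:b); end # superclass mismatch!
end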

Here TestClass winds up in the global scope, not the scope of the RSpec example group that describe creates. If you have multiple specs that define test classes with the same name, you may get collisions unless you place each describe within a uniquely-named module or diligently remove the constants in an after block. In the above example, you’ll get the error “superclass mismatch for class TestClass”.

If you need to ensure a particular scoping is used (for example, if you need to support 1.9.1 as well as 1.8.7/1.9.2), you can always be explicit about it by prefixing constants with :: (for global lookup), self:: (for receiver scope), or the fully qualified desired scope:
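
For example, inside a class_eval block (a sketch):

Foo.class_eval do
  ::X      # the top-level constant, on any version
  self::X  # Foo::X, since self here is Foo
  Foo::X   # fully qualified
end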

Selenium on Ruby

The state of Selenium on Ruby is a bit confusing. Among the top Google results for “selenium ruby” are several links that are badly out of date, and it’s not clear which of the many gems are the ones you would want to use. This post aims to clear up some of the confusion.

First of all, you should be aware that there are two Selenium APIs: the original 1.0 API, called Selenium-RC, and a newer 2.0 API called Selenium WebDriver. Internally, the two have quite different architectures. In a nutshell, Selenium-RC is based around a Java “Remote Control” server process that launches the browser under test and manages communication between the client process (the Ruby interpreter, in our case) and the browser. The browser is controlled by injecting the “Selenium Core” JavaScript framework into the browser’s built-in JavaScript interpreter. In contrast, WebDriver requires no external process; the browser is launched directly, and controlled via means that vary from browser to browser. For example, WebDriver controls Firefox via a custom Firefox extension.

Ruby has good support for both RC and WebDriver. The selenium-client gem (docs, source) provides bindings for the RC API, and the selenium-webdriver gem (wiki, docs, source) provides bindings for the WebDriver API. You will likely want to use one of these two gems, but which one? If you are using Selenium via a higher-level Ruby library such as Webrat or Capybara, the choice will be made for you: Webrat uses selenium-client, Capybara uses selenium-webdriver. If you want to access a Selenium API directly, I would generally recommend selenium-webdriver. The WebDriver API provides several advantages, including multi-browser testing, page navigation, drag-and-drop, and multi-frame and multi-window support. It recently released its first beta and is where the future of Selenium and Selenium on Ruby lies. It is, however, still beta software and sometimes changes its API in aggravating ways. If stability is of paramount concern, stick with selenium-client.
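
To give a flavor of the WebDriver bindings, here’s a minimal selenium-webdriver session (a sketch; it assumes Firefox is installed):

require "selenium-webdriver"

driver = Selenium::WebDriver.for :firefox  # launches the browser directly
driver.navigate.to "http://example.com"
puts driver.title
driver.quit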

You may find references to some other Selenium-related Ruby projects. The Selenium gem (with a capital “S”) and its associated website selenium.rubyforge.org paved the way for Selenium on Ruby, but today it is obsolete, as is the selenium-rails gem which depends on it. Unfortunately they are still prominently featured in search results and on the outdated “Selenium on Ruby” page, which doesn’t even mention selenium-webdriver.

The Selenium RC API relies on an external Java-based server process, and though the selenium-client gem provides rake tasks to start and stop an RC server, it does not provide the actual jar file necessary to run the service. You can either download and install it yourself, or install the selenium-rc gem, which bundles it. You’ll sometimes see a gem that depends on selenium-client also depending on selenium-rc solely for the jar, as the jasmine gem does. The selenium-rc gem has some Ruby code in it too, but it more or less duplicates functionality that’s already part of selenium-client.

Finally, there are the selenium and selenium_remote_control gems, which provide functionality similar to selenium-client and selenium-rc. They don’t seem widely used, and at first glance I don’t see any reason to prefer them to the more popular gems. Recent releases of the selenium-webdriver gem include the selenium-client code as well, and personally, I hope that selenium-webdriver can usurp the “selenium” gem name and become the One True Selenium gem for Ruby.