Wednesday, 12 September 2012

Problems with at_exit{}, exit(), and RSpec

I had an interesting problem the other day, working on a Ruby project of mine. I ran my tests: (note: I have "rak" aliased to "bundle exec rake")
rak test
which internally expands to the equivelent of:
rak test:spec test:int
which runs my specs and integration tests in that order.

Then an odd thing happened. My specs failed but then the integration tests ran anyway and scrolled the spec failure off the screen to report happy success. After some digging around I discovered the following:

  • I'd had this test failure for a few commits without noticing.
  • My CI builds on Travis CI were all reporting success (although the failure message was there in the build logs should one manually check.)
  • Rake itself and RSpec's rake task were both fine.
  • Running my tests directly with the rspec CLI, I was getting an exit code of 0 on both success and failure.

The Problem

RSpec was returning an error/exit/status code of 0 despite test failure. It should be non-zero so that external processes like Travis CI and Rake can determine that something's gone wrong and react accordingly.

I'm going to cut a tedious story down to the result here. After investigation I learned that rspec worked as expected again when I avoided this piece of code in my tests: What this piece of code does is:

  1. create a temporary directory the first time it's called
  2. reuse that temporary directory on subsequent calls
  3. remove the directory at the end of the process's lifecycle

Why does that affect RSpec returning non-zero on failure? Because RSpec itself doesn't run immediately; it wants to wait until all of your specs have been loaded first. The way it accomplishes that is it registers itself to run via an at_exit block and then calls exit when it's finished with your specs. Still sounds like there's no problem right? Well this is what happens from that point on...

  1. RSpec finishes running tests and calls exit with 0 for success or 1 for failure.
  2. Accordingly, Ruby creates an instance of SystemExit and plonks it into the $! global variable.
  3. Now that the process is shutting down, my at_exit in the snippet above starts running.
  4. My cleanup code (correctly, validly and legally) runs Ruby's FileUtils.remove_entry_secure to remove the temporary directly created during tests. This isn't a problem in itself.
  5. Here's the gotcha: FileUtils.remove_entry_secure removes the directory and sets $! to nil to indicate that no exception occurred.
  6. Ruby ends the process and sets the exit code to the result of $!.status which was lost in the previous step.

The Solution

That was the problem. Now what's the solution?
Simple. Just take care to preserve the exit status in your at_exit so that it ends with what it started with. Here is a helper method that I created: (You can also view this file directly on Github in a utility library of mine.)

Then the solution becomes simply use at_exit_preserving_exit_status instead of at_exit in first snippet, and everything works again! Happy days!

TL;DR: Conclusion

Be careful that you don't corrupt the value of $! when using at_exit. If you're not careful (or don't use a handy, safe function like presented above), then you can corrupt the exit status of RSpec in particular, and other libraries that work in a similar fashion.

5 comments:

  1. Out of curiosity, why preserve only SystemExit instead of any exception?

    ReplyDelete
    Replies
    1. Originally I wrote a fn that preserved any kind of status. Then I went and changed it about half an hour later, thinking "ooh good catch! There are scenarios where that would produce an unexpected result". I was slightly proud of myself for realising and anticipating that scenario.... now I can't for the life of me remember what I was thinking. I'll think a bit more over the weekend and comment again if it comes back to me.

      Delete
  2. Hi David,

    I was trying to see in the FileUtils.remove_entry_secure, where it sets $! to nil.

    http://www.ruby-doc.org/stdlib-1.9.3/libdoc/fileutils/rdoc/FileUtils.html#method-c-remove_entry_secure

    I'm unable to set $! to nil myself, so was curious where this code lives.

    $ ruby -e '$! = nil'
    -e:1:in `': $! is a read-only variable (NameError)

    Cheers,
    Phil

    ReplyDelete
  3. Hi mate. Yeah I haven't looked to see where (or how) the code sets $!. If it's not in the Ruby source it's probably in the C source. It's definitely happening somewhere in there though, have a look at this:

        require 'fileutils'
        at_exit{ p $!; FileUtils.remove_entry_secure '/tmp/blaa'; p $! }
        Dir.mkdir '/tmp/blaa'; exit 7

    When run produces this:
        #<SystemExit: exit>
        nil

    ReplyDelete
  4. Because $! is thread-local magic variable, this works and preserves the original exception object:

    module Kernel
    def at_exit_preserving_exit_status
    at_exit { Thread.new { yield }.join }
    end
    end

    ReplyDelete