Christoph Schiessl's Blog

Garbage Collectable Symbols Finally Arrive in MRI Ruby 2.2.0

MRI Ruby 2.2.0 was released a couple of days ago. I want to use this opportunity to talk about one of the new features shipping with 2.2.0 and explain why it matters: Garbage collectable Symbols.


There was no garbage collection for Symbols before 2.2.0, meaning that every Symbol object you created was kept in memory permanently. Inevitably, this led to problems, because programmers had to think about memory management whenever they worked with Symbols in order to avoid leaks.

The Ruby community has discussed this issue for a long time, but nobody was able to deliver a practical solution until now. With the release of Ruby 2.2.0, the problem has finally been resolved. To understand how the new garbage collection mechanism for Symbols works, we have to distinguish between hard-coded and dynamically created Symbols.

Now, the hard-coded variety never posed any problems, simply because every Ruby program contains only a manageable number of hard-coded Symbols. In fact, hard-coded Symbols are still not garbage collectable in 2.2.0, which is easy to demonstrate:

~  irb
2.2.0 :001 > GC.start
 => nil
2.2.0 :002 > Symbol.all_symbols.size
 => 3312
2.2.0 :003 > :foobar
 => :foobar
2.2.0 :004 > Symbol.all_symbols.size
 => 3313
2.2.0 :005 > GC.start
 => nil
2.2.0 :006 > Symbol.all_symbols.size
 => 3313

Evaluating the Symbol literal :foobar (prompt 003) increased the total number of Symbols (measured with Symbol.all_symbols) by 1. However, the count did not decrease after garbage collection. We can therefore conclude that the garbage collector doesn't reclaim hard-coded Symbols.
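The same experiment can also be run as a plain script instead of in irb. The sketch below is an illustration and is not taken from the original experiment. A script is parsed in full before it runs, so the Symbol literal is hidden inside eval. That way it is only created at that point, just as typing :foobar into irb creates it. The name hardcoded_demo_symbol and the helper symbol_exists? are made up for this example. On 2.2.0 and later, the Symbol should still be present after GC.start.

```ruby
# Hypothetical helper: true if a Symbol with this name is in the symbol table.
# Names are compared as Strings, so the check itself never creates a Symbol.
def symbol_exists?(name)
  Symbol.all_symbols.any? { |sym| sym.to_s == name }
end

name = "hardcoded_demo_symbol" # hypothetical name, used only for this demo

before = Symbol.all_symbols.size
eval(":#{name}")               # parses a Symbol literal, like typing :foobar in irb
created = Symbol.all_symbols.size

GC.start                       # full garbage collection
after = Symbol.all_symbols.size

puts "before: #{before}, after literal: #{created}, after GC: #{after}"
puts "still present after GC: #{symbol_exists?(name)}"
```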

On the other hand, there are dynamically created Symbols. That's the variety we have to worry about. Most commonly, these Symbols are created by converting Strings with String#to_sym, either directly or indirectly. To illustrate, we can perform the same experiment again, but this time create the new Symbol with String#to_sym.

~  irb
2.2.0 :001 > GC.start
 => nil
2.2.0 :002 > Symbol.all_symbols.size
 => 3312
2.2.0 :003 > "foobar".to_sym
 => :foobar
2.2.0 :004 > Symbol.all_symbols.size
 => 3313
2.2.0 :005 > GC.start
 => nil
2.2.0 :006 > Symbol.all_symbols.size
 => 3312

As you can see, our dynamically created Symbol has been garbage collected: the total number of Symbols (measured at prompt 006) dropped by 1, back to 3312. This wouldn't have happened in Ruby 2.1.5.
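A single Symbol is easy to miss, so here is a sketch that scales the experiment up to many Symbols. It is not part of the original measurements and assumes MRI 2.2.0 or later. It relies on ObjectSpace.count_objects, which is MRI-specific. The sketch assumes that its :T_SYMBOL entry counts only dynamically created Symbols, because those live on the heap, while Symbols from literals do not. PREFIX and COUNT are arbitrary values chosen for this demo. GC is disabled while the Symbols are created, so the peak reflects all of them. The count after GC.start should then drop well below that peak.

```ruby
# Hypothetical prefix, so the demo Symbols cannot clash with existing ones.
PREFIX = "dynamic_demo_"
COUNT  = 10_000

# Number of heap-allocated (dynamically created) Symbols; MRI only.
def dynamic_symbol_count
  ObjectSpace.count_objects.fetch(:T_SYMBOL, 0)
end

GC.disable                       # no collection while the Symbols are created
before = dynamic_symbol_count
COUNT.times { |i| "#{PREFIX}#{i}".to_sym } # each new Symbol is garbage right away
peak = dynamic_symbol_count
GC.enable

GC.start                         # full collection with immediate sweep (the defaults)
after = dynamic_symbol_count

puts "dynamic Symbols: before=#{before} peak=#{peak} after GC=#{after}"
```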

Potential ‘Denial of Service’ attack vector

Both previous examples are more or less benign, because only a single Symbol object is involved. Things get more interesting when a long-running process creates Symbols from user-supplied data.

One place where you may encounter this in the wild is the handling of HTTP parameters. For instance, the following Rack application splits the request's query string on commas and calls #to_sym on each of the resulting substrings.

# config.ru
run Proc.new { |env|
  params = env['QUERY_STRING'].split(',') # e.g. "a,b,c" => ["a", "b", "c"]
  params.each(&:to_sym) # symbolize params (one new Symbol per unseen name)
  [200, {}, []]
}

For applications like this one, the improvements in 2.2.0 should make a big difference. In theory, the Rack application above is perfectly fine in Ruby 2.2.0 but leaks memory in 2.1.5. As established above, Symbols created with #to_sym are garbage collectable in 2.2.0 but not in 2.1.5.
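On Rubies without Symbol garbage collection, a common way to avoid this leak is to never call #to_sym on untrusted input. Instead, incoming strings are mapped onto a fixed set of known Symbols. The sketch below applies that idea to the same comma-separated query string. The parameter names in ALLOWED_PARAMS and the method known_params are hypothetical.

```ruby
# Hypothetical whitelist of the parameter names the application understands.
# These Symbols come from literals, so their number is fixed by the source code.
ALLOWED_PARAMS = {
  "page"     => :page,
  "per_page" => :per_page,
  "sort"     => :sort
}.freeze

# Splits the query string like the Rack example, but looks each part up in
# ALLOWED_PARAMS instead of calling #to_sym, so unknown names never become Symbols.
def known_params(query_string)
  query_string.split(",").map { |part| ALLOWED_PARAMS[part] }.compact
end

p known_params("page,sort,evil") # prints [:page, :sort]
```

Unknown names are simply dropped, which keeps the number of Symbols bounded regardless of what clients send.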

We can start the Rack application and monitor its memory consumption with ps while flooding it with HTTP requests. Doing this once with 2.1.5 and once more with 2.2.0, and then plotting the resulting data, gives us a better understanding of how the two versions behave.
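The post doesn't say which tool generated the requests, so the following is only one possible way to set up such a measurement. It makes several assumptions:

* The Rack application runs under rackup on its default port, 9292.
* The server's process ID is passed in a hypothetical SERVER_PID environment variable.
* Every request carries parameter names that have not been seen before. query_for is a hypothetical helper that builds them.

Memory is sampled with ps, as in the post. ps reports RSS in kilobytes.

```ruby
require "net/http"

# Resident set size of a process in megabytes, read from ps (which reports kilobytes).
def rss_megabytes(pid)
  kilobytes = IO.popen(["ps", "-o", "rss=", "-p", pid.to_s], &:read).to_i
  kilobytes / 1024.0
end

# Hypothetical helper: a comma-separated query string with names unique to
# request number i, so every #to_sym call in the server creates a new Symbol.
def query_for(i, params_per_request = 10)
  Array.new(params_per_request) { |j| "p#{i}_#{j}" }.join(",")
end

# Hypothetical driver: only runs when SERVER_PID is set (an assumption).
if ENV["SERVER_PID"]
  pid = Integer(ENV["SERVER_PID"])
  Net::HTTP.start("localhost", 9292) do |http|
    500_000.times do |i|
      http.get("/?#{query_for(i)}")
      # Print "request number <TAB> RSS in MB" every 10,000 requests for plotting.
      puts "#{i}\t#{rss_megabytes(pid).round(1)}" if (i % 10_000).zero?
    end
  end
end
```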

Here’s the memory consumption (megabytes) graph for 500,000 requests:

[Graph: memory consumption in megabytes over 500,000 requests, 2.1.5 vs. 2.2.0]

Well, the 2.1.5 graph is pretty much what I expected. However, the 2.2.0 graph is almost identical, and I'm still puzzled by that.

Conclusion

In theory, memory management has been vastly improved in 2.2.0 with the addition of garbage collectable Symbols. In practice, however, long-running processes still suffer from a significant build-up in memory consumption over time.

Overall, my benchmark seems to indicate that Ruby’s memory management is far from perfect. A lot of work remains to be done…

Is my conclusion wrong? Do you have different experiences with 2.2.0?


Notes
Software: MRI Ruby 2.1.5-p273, MRI Ruby 2.2.0-p0, Rack 1.6.0, and ps.
