Christoph Schiessl's Blog

Implementing Ruby’s Array#flatten in Haskell

Ruby’s Array#flatten is on occasion a very handy method. It reduces n-dimensional arrays to 1-dimensional arrays, containing all the elements of the top-level-array and its sub-arrays. Therefore, it is said to be “flattening” the n-dimensional array.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
~  irb
2.1.4 :001 > [ ].flatten
 => []
2.1.4 :002 > [ [] ].flatten
 => []
2.1.4 :003 > [ 1, 2, 3 ].flatten
 => [1, 2, 3]
2.1.4 :004 > [ 1, 2, [3] ].flatten
 => [1, 2, 3]
2.1.4 :005 > [ [1], 2, [3, 4] ].flatten
 => [1, 2, 3, 4]
2.1.4 :006 > [ 1, [2], [[3, 4], 5] ].flatten
 => [1, 2, 3, 4, 5]
2.1.4 :007 > [ 1, 2, [2, 3] ].flatten # retains duplicates
 => [1, 2, 2, 3]
2.1.4 :008 > [ 2, 1, [4, 3] ].flatten # retains order
 => [2, 1, 4, 3]

As you can see, #flatten is conceptually quite simple. The same functionality can be implemented in Haskell. But, as usual, the solution requires a little bit of abstract thinking.

Naive Approach

Let’s start with 1-dimensional lists and work our up to n dimensions. The first dimension is trivial: All we need is the identity function.

1
2
flatten1 :: [a] -> [a]
flatten1 = id -- flattening a 1-dim list is a no-op

For the second dimension, we could do something like the following:

1
2
3
flatten2 :: [[a]] -> [a]
flatten2 []     = []
flatten2 (x:xs) = (flatten1 x) ++ (flatten2 xs)

This isn’t really what we want, because we’ve told the compiler that the argument has exactly 2 dimensions. For that reason, we have to define another function to support three dimensions:

1
2
3
flatten3 :: [[[a]]] -> [a]
flatten3 []     = []
flatten3 (x:xs) = (flatten2 x) ++ (flatten3 xs)

By now, you can probably see the pattern. We have to define yet another and another and another function for each additional dimension. Thus, the naive approach doesn’t get us very far…

Generic Approach

An instance of Array in Ruby doesn’t restrict the types of its elements in any way (i.e. it’s just a bunch of references to instances of Object). Therefore, the elements of an Array instance can…

  1. … have different types. For example, [1, "abc"] is completely valid in Ruby (the first element is an instance Fixnum and the second element is an instance of String).
  2. … reference other Array instances (all classes in Ruby, including Array, are sub-classes of Object). Consequently, Arrays can be arbitrarily nested.

Haskell is statically-typed, whereas Ruby is dynamically-typed. Due to restrictions imposed by the static type system, there are certain things that cannot be done in Haskell.

Okay, that’s enough background information for now. Let’s flatten some stuff :)


Regarding (1): There is no counterpart for Ruby Arrays with mixed elements in Haskell. All elements in a list, must be values of the same type. That being said, we could define a custom data type to circumvent this issue. However, that only works for types we know about at compile-time.

Regarding (2): Unless we know n at compile-time, the concept of n-dimensional lists doesn’t make any sense in Haskell. Instead, we have to tell the compiler more explicitly what we really want. Namely, a n-ary tree. Fortunately, Haskell makes it very easy for us define such trees with algebraic data types:

1
2
data Tree a = Node [Tree a]
            | Leaf a

What does this mean? Well, Tree a is a algebraic data type with two constructors. The first one, Node, encapsulates a list [Tree a]. The second one, Leaf, encapsulates a value of type a.

Next, we are going to illustrate our new data type by translating the Ruby array foo = [1, 2, [3, 4], []] to Haskell:

1
2
3
4
5
foo :: Tree Integer
foo = Node [ Leaf 1
           , Leaf 2
           , Node [ Leaf 3, Leaf 4 ]
           , Node [ ] ]

The last thing we still need, is the implementation of flatten itself. However, that’s the easy part: The function simply has to recursively traverse the given tree and collect the all values from its Leafs.

1
2
3
flatten :: Tree a -> [a]
flatten ( Leaf v  ) = [v]
flatten ( Node ts ) = concat $ fmap flatten ts

We can use the tree we’ve defined before (i.e. foo) to test the function in GHCi:

1
2
3
4
5
6
7
8
~  ghci
Prelude> :l Flatten.hs
[1 of 1] Compiling Flatten          ( Flatten.hs, interpreted )
Ok, modules loaded: Flatten.
*Flatten> let foo = Node [ Leaf 1, Leaf 2, Node [ Leaf 3, Leaf 4 ], Node [] ]
*Flatten> flatten foo
[1,2,3,4]
*Flatten>

[1,2,3,4] is of course exactly the result we’ve expected.

Conclusion

If you are not familiar with Haskell, you might think now that this is a lot of overkill. Personally, I’m a big proponent for Haskell, but I agree that #flatten is easier to implement in Ruby.

However, I would argue that this a consequence of Haskell’s static type system and not its functional nature. If you disagree, I encourage you to implement #flatten in a statically-typed and object-oriented language (e.g. Java) – you’ll see what I mean.

What’s the benefit of functional programing and static typing? Safety and by extension confidence in your code. Dynamic typing is convenient, that’s for sure. But, saying that the price we are paying for that convenience is high, would be an understatement. There’s approximately one gazillion ways in which your Haskell program can blow up. With Ruby and dynamic typing you get two gazillion…


While writing this post, I’ve actively tried to break Ruby’s #flatten to substantiate my point. This is what I came up with after a couple of minutes:

1
2
3
4
5
6
7
8
9
10
~  irb
2.1.4 :001 > a = [1, 2, 3]
 => [1, 2, 3]
2.1.4 :002 > a << a
 => [1, 2, 3, [...]]
2.1.4 :003 > a.flatten
ArgumentError: tried to flatten recursive array
  from (irb):3:in `flatten'
  from (irb)34
  from /Users/cs/.rvm/rubies/ruby-2.1.4/bin/irb:11:in `<main>'

It never ceases to amaze me how easy it is to break Ruby programs.


Notes
Software: MRI Ruby 2.1.4-p265, Glasgow Haskell Compiler 7.8.3.

comments powered by Disqus