web developer & system programmer

coder . cl

ramblings and thoughts on programming...


how important is to optimize haskell code?

published: 02-05-2012 / updated: 03-05-2012
posted in: development, haskell, programming, tips
by Daniel Molina Wegener

Programming in Haskell is cool. You have various functions and abstractions to work with. But it has several considerations to have in mind once you are programming in this language. The main one about its performance and memory usage — in my opinion — is where to place lazy evaluated expressions and where to place strictly evaluated expressions. There are various articles with very good recommendations, but I want to show you a practical example, where you can see how effectively is used the lazy evaluation and released in a good execution point of the example program.

As you know I am currently solving the Project Euler problem set, just as toy project, to practice some Haskell code with some not so complex problems. And yesterday I have implemented the solution for the problem 104, which is trying to find first pandigital digits from 1 to 9 on the last and first digits of a Fibonacci number. So, the Fibonacci should have more than 9 digits to solve this problem. You can see the code of the solution on this link.

If you observe the code, you will see that it is respecting the Haskell tail call convention, where you must use closures to ensure that the compiled code will be using guarded recursions. Otherwise the functions will become standard recursive functions, rather being compiled as guarded recursions — which are similar to tail calls.


-- | Calculates the Fibonacci Term N
fib :: Int                      -- ^ Term to calculate.
       -> Integer               -- ^ Fibonacci Term as Result.
fib n = snd . foldl' fib' (1, 0) . dropWhile not $
        [testBit n k | k < - let s = bitSize n in [s - 1, s - 2 .. 0]]
  where
    fib' (f, g) p
      | p         = (f * (f + 2 * g), ss)
      | otherwise = (ss, g * (2 * f - g))
      where ss = f * f + g * g

Also it defines two functions to check if the number meets the pandigital requirement, where isValidDig is not called immediately from the checkRange recursive function, instead of calling that function, it calls the isValidPan function, where it is called with the constant expression [1..9], avoiding consecutive List instantiation, and using `seq` to avoid lazy evaluations and further stack overloading.


-- | Checks if the given number x have valid last n digits
-- and last n digits
isValidDig :: Integer           -- ^ Number to Check.
              -> [Integer]      -- ^ Sequence to Check.
              -> Bool           -- ^ Is valid or not.
isValidDig x xs
  | length (digits 10 x) < length xs = False
  | otherwise = let ds = digits 10 x
                    l = length xs
                    s = sort xs
                    r = sort $ take l ds
                    t = sort $ take l $ reverse ds
                in r == s && t == s

-- | Checks sequentially if the given range covers the
-- problem of pandigital sequence of digits using the reqDigs
-- sequence.
isValidPan :: Int               -- ^ Number to check.
              -> Bool           -- ^ True when is valid.
isValidPan x = r `seq` isValidDig r [1..9]
               where r = fib x

This allows the usage of a small stack space once it is called, due to strict evaluation, where all List instances are not kept in memory, and similar structures are only evaluated once, without keeping that memory in use, and being released once those non recursive functions are finished. Finally we can use isValidPan on a guard, without the risk of overloading the stack.


-- | Checks sequentially if the given range covers the
-- problem of pandigital problem.
checkRange :: Int                   -- ^ Starting number.
              -> Int                -- ^ End number
              -> Int                -- ^ Returning Number (-1 on failure).
checkRange m n = sCheckRange m n m
  where sCheckRange x y z
          | z >= y = -1
          | isValidPan z = z
          | otherwise = let r = z + 1
                        in r `seq` sCheckRange x y r

The closure used on the checkRange function makes the compiler to start using the tail call as it is required by this kind of functions where you must keep very well controlled the evaluation of the program expressions. Any mistake on the evaluation of the program will lead you to use more memory that it is really required, leading to slowness and delayed evaluations where your code could crash with stack overflows.

Problem 104 Example Memory Usage

Problem 104 Example Memory Usage

As you see on the cost centre chart above, the program is using almost a constant amount of memory along its execution, evaluating Fibonacci terms from 1000 to 9000, in 12 seconds approximately. So, the keys of a good memory usage in Haskell is to use evaluation expressions and guarded recursions properly.

Another example is the problem #255 in the Project Euler. This code is using Haskell tail calls to ensure the optimal memory allocation. For example the function used to calculate the rounded square root using the Heron Operation as follows.

-- | Applies the Heron Operation recursively until it returns
-- the round square root and the number of iterations as tuple.
heronOpRec :: Integer                 -- ^ Heron Operation to Apply
           -> (Integer, Integer)      -- ^ Pair (RSR, Iterations)
heronOpRec a = heronOp a a (digitBase $ numDigits a) 0
  where heronOp o n m a
          | o == 0 = (0, 0)
          | o == 1 = (1, 1)
          | o == 2 = (1, 1)
          | o == 3 = (2, 1)
          | n == m = (m, a)
          | otherwise = let x = fromIntegral m
                            y = fromIntegral o / x
                            s = round ((x + y) / 2)
                            t = a + 1
                        in heronOp o m s t

And the application of this kind of recursive calls can be seen in the following chart.

Problem 255 Example Memory Usage

Problem 255 Example Memory Usage

If you compare both charts and see how both problem solutions are programmed, you will see that they keep a very low limit on memory usage. The problem #255 is using numbers purely, and where problem #104 solution is using lists to hold the digits of the number to be checked as pandigital, that is why it has some peaks on the isValidDig function, but once the program exits that function that memory is deallocated by the garbage collector.


one comment to “how important is to optimize haskell code?”

  1. I’m going to retweet this in my blog. BTW, Didn’t know that Heron Operation. Good post.

post a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>