As you I am a Haskell programmer. I have started a FOSS project called Apache Log Reviser, or logrev as short name. I have been playing with code optimizations in Haskell on that project. So, I want to share some experience and considerations regarding its memory usage, mainly on what is related to lazy evaluations and non-strict bindings. Where lazy evaluations should be used every time you need to read or write data with delayed behaviour, for example reading large lists or even lazy lists, and non-strict bindings should be used every time you need to request data to be placed on memory for immediate reading or writing.
I have replaced the main file processing loop — the one that is collecting data from log lines — forcing lazy evaluations and non-strict bindings where they are required. The resulting code after the optimization is as follows.
foldLogLines :: LogRevStatsMap -> LogRevOptions -> [String] -> LogRevStatsMap foldLogLines ms _ [] = ms foldLogLines ms o ~ls = fll ms ls where fll rs [] = rs fll rs (x : ~xs) = let lm = parseLogLine x ns = lm `seq` o `seq` procLogMachine o rs lm in seq ns $ fll ns xs
Again I am using seq for non-strict bindings but now I made the lines argument pattern entirely lazy, including the constructs inside the closure which brings even more laziness to the folding functions. The original optimization was just an implementation of a left folding functions, now it works using the same left folding function using a closure which uses lazy reads of the given list of strings — or log lines. As you remember the first chart obtained was using about ~33MB of memory, as follows.
With the second optimization, the resulting memory usage has been reduced to ~13MB. Despite the readFile and lines functions are lazy, we must implement function with the same lazy behaviour, and we must use non-strict bindings only where we need immediate access to the data. Also, you should know that let and where statements have non-strict bindings for their variables, so they are placed immediately on the heap, unless you declare them lazy explicitly. The resulting profiling chart of the second optimization is as follows.
Finally I have concluded that we must not underestimate the power of lazy evaluations and Haskell patterns. Once they are used properly, you gain very good performance and optimal memory usage. Enjoy Haskell as programming language.