web developer & system programmer

coder . cl

ramblings and thoughts on programming...

a good start on logrev

published: 17-03-2012 / updated: 17-03-2012
posted in: development, haskell, logrev, programming, projects
by Daniel Molina Wegener

As you know, ApacheLogRev (logrev for short) is my first Haskell FOSS project — well, I have other FOSS projects written in C and Python — and I want to share some experiences from developing features that it should have in the future, since it is still under development and not finished yet. Basically, it uses the Parsec combinator-based parser library. One of the main advantages of this parser builder is that it generates parsers at run time, rather than producing fixed parsers from BNF grammars, which are limited to static context-free processing and cannot be assembled dynamically.

Parsec allows you to create a parser dynamically at run time, rather than compiling a parser and grammar specification at compile time, as is done with flex and bison. Because a combinator-built parser is an ordinary value, it seems you could write a compiler generated at run time, instead of having a compiler built as a static component that cannot be modified once compiled. (There are exceptions not covered by flex and bison, such as any C++ compiler, which must understand a subset of syntax-sugar features like templates and operators.)
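To illustrate this point — a minimal sketch, not logrev's actual code, assuming only the parsec package — a Parsec parser can be parameterized by a value that is only known at run time, such as a field separator read from user input or a configuration file, something a flex/bison grammar fixed at compile time cannot do:

```haskell
module Main where

import Text.Parsec
import Text.Parsec.String (Parser)

-- A parser built at run time from a value: the field separator
-- is an ordinary argument, not part of a static grammar.
fieldsWith :: Char -> Parser [String]
fieldsWith sep = many1 (noneOf [sep]) `sepBy` char sep

main :: IO ()
main = do
  -- The separator could come from a config file read at run time.
  print (parse (fieldsWith ' ') "" "127.0.0.1 - frank")
  print (parse (fieldsWith ',') "" "a,b,c")
```

The same `fieldsWith` combinator yields a different parser for each separator, which is the dynamic quality described above.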

I have created a DSL called Log Reviser Specification, or LSR for short, which allows you to write log processing statements: defining variables, defining processing functions, and specifying reports based on the collected data. You can see the LSR parser and a sample LSR file in the GitHub repository.

λ λ λ

The problem of memory leaks — which can appear in any language, without exception — came to my mind once the program was unable to parse a huge file without crashing. That every language can suffer memory leaks is as real as the fact that you cannot abuse recursion in procedural and object-oriented languages. So I started profiling the application. In the initial revision, the LogLine data type was being instantiated many times without being released; many instances were kept in memory due to the lazy evaluation behaviour that is natural in Haskell. Tracing its memory usage, I got the following chart.
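As a minimal illustration of the kind of leak lazy evaluation can produce — this is a sketch, not logrev itself — a plain lazy left fold builds a long chain of unevaluated thunks, while a strict fold forces the accumulator at each step and stays in constant space:

```haskell
module Main where

import Data.List (foldl')

-- Lazy left fold: builds the thunk chain ((0 + 1) + 2) + ...
-- in memory before anything is evaluated, which on a large
-- input can exhaust the stack or heap.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: foldl' forces the accumulator (via seq)
-- at every step, so at most one Int result is live at a time.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

main :: IO ()
main = print (strictSum [1 .. 10000000])
```

The LogLine buildup described above is the same phenomenon at a larger scale: results referenced but never forced, so nothing can be collected.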

[Chart: ApacheLogRev with memory leaks — heap profile]

The program execution kept a peak of LogLine instances in memory that were never released, due to Haskell's lazy evaluation. So now I am using seq and deepseq to reduce the amount of lazy evaluation done in Haskell.

instance NFData LogLine where
  rnf a = a `seq` ()  -- evaluate the LogLine to WHNF

parseLogLine :: String -> Maybe LogLine
parseLogLine s = let
  r = parse logLine "[Invalid]" s
  in case r of
          Left  _   -> Nothing
          -- force the parsed LogLine before wrapping it in Just,
          -- so no thunk referencing the input line is retained
          Right itm -> itm `deepseq` Just itm

foldLogLines :: [LogRevStatsAction]
                -> LogRevOptions
                -> [String]
                -> [LogRevStatsAction]
foldLogLines ms _ [] = ms
-- tail-recursive strict fold: seq forces each line and each
-- partial result before the recursive call, so thunks cannot
-- accumulate across iterations
foldLogLines ms o (x : xs) = let
  lm :: Maybe LogLine
  ns :: [LogRevStatsAction]
  lm = x `seq` parseLogLine x
  ns = lm `seq` o `seq` procLogMachine o ms lm
  in foldLogLines ns o xs

I have replaced the initial folding function — which used fmap and hGetLine — with the lazy hGetContents for reading the file descriptor, plus the strict foldLogLines for result processing. So file reading is currently lazy, but line processing uses strict evaluation, which allows processing large files with very good results thanks to the strict left fold over the lazily read input. We can appreciate the difference in the following chart.
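The reading strategy above can be sketched as follows — this is not logrev's actual processing function, and the log path and the line-counting stand-in are assumptions for illustration — with hGetContents streaming the file lazily while a strict left fold forces each line's result immediately:

```haskell
module Main where

import System.IO (withFile, hGetContents, IOMode (ReadMode))
import Data.List (foldl')

-- Stand-in for per-line log processing: count lines and bytes.
-- seq on both counters keeps the accumulator fully evaluated.
countLine :: (Int, Int) -> String -> (Int, Int)
countLine (nl, nc) l = let n = nl + 1
                           c = nc + length l
                       in n `seq` c `seq` (n, c)

main :: IO ()
main =
  -- example path; any large text file works
  withFile "/var/log/apache2/access.log" ReadMode $ \h -> do
    s <- hGetContents h  -- lazy read: lines are fetched on demand
    print (foldl' countLine (0, 0) (lines s))
```

Because the fold consumes each line strictly and never looks back, already-processed portions of the lazily read string can be garbage collected as the fold advances.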

[Chart: ApacheLogRev without memory leaks — heap profile]

λ λ λ

So the garbage collector is really doing its work, without references to instances linked to past results being kept in memory, thanks to the tail-call implementation in GHC: we get an initial peak of memory usage that shrinks as execution proceeds. Also, if you have hit Java's OutOfMemoryError, .NET's OutOfMemoryException, PHP's «Allowed Memory Size Exhausted» fatal error, or similar errors in your favorite language, and your solution is to increase the memory limits, you are doing it wrong. If you have memory leaks in your code, you must fix the leaks rather than raise the limits; reaching for the limits just tells me that you have no idea about memory management — even though many languages now use garbage collectors.
