coder . cl

the logrev design

published: 04-03-2012 / updated: 04-03-2012
posted in: development, logrev, programming, projects
by Daniel Molina Wegener

On Friday, March 2nd, I released the initial code for a new FOSS project. The current code is hosted on GitHub. LogRev is a log reviser tool: it extracts statistics from log files, currently only Apache access logs. The initial design supports only a few grouping queries, but the way it was coded will allow some interesting features, which I explain in this article. I hope you enjoy the design of this tool.

The first idea is to have a dynamic parser: a configurable tokenizer and information extractor that lets you run queries against any application log. Currently LogRev uses Parsec as a static combinator parser for one kind of log entry (Apache access logs, as explained in the previous paragraph). Because parser combinators are ordinary values that can be composed at run time, the same approach will allow tokenizers to be placed dynamically, so that various kinds of entries can be parsed. This is why the project is implemented in Haskell rather than another language, even though Parsec has been ported to other languages such as C++.
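
To make the idea of dynamically placed tokenizers concrete, here is a minimal sketch. It is not LogRev's code: the names FieldKind, tokenizer, entryParser and commonLog are hypothetical, and the only library calls used are standard Parsec combinators. The point is that the layout of an entry is plain data, so it could be read from a configuration instead of being fixed at compile time.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical field kinds; a real DSL would probably define more.
data FieldKind = Plain | Bracketed | Quoted
  deriving (Show, Eq)

-- Run a parser, then skip the blanks that separate fields.
padded :: Parser a -> Parser a
padded p = p <* many (char ' ')

-- One small combinator per field kind.
tokenizer :: FieldKind -> Parser String
tokenizer Plain     = padded (many1 (noneOf " "))
tokenizer Bracketed = padded (between (char '[') (char ']') (many (noneOf "]")))
tokenizer Quoted    = padded (between (char '"') (char '"') (many (noneOf "\"")))

-- The layout is a run-time value; mapM sequences the tokenizers
-- in the order the layout lists them.
entryParser :: [FieldKind] -> Parser [String]
entryParser layout = mapM tokenizer layout <* eof

-- Common Log Format: host ident user [time] "request" status bytes
commonLog :: [FieldKind]
commonLog = [Plain, Plain, Plain, Bracketed, Quoted, Plain, Plain]

parseEntry :: [FieldKind] -> String -> Either ParseError [String]
parseEntry layout = parse (entryParser layout) "log"

main :: IO ()
main = print (parseEntry commonLog sample)
  where
    sample = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] \"GET /index.html HTTP/1.0\" 200 2326"
```

Supporting another log format in this sketch only means supplying a different list of field kinds; no new parser has to be written.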

So we have a dynamic parser that will be configured from a DSL, whose specification I am very close to finishing. The parser is followed by an Action Stack: a dynamically placed sequence of combinators, executed in order, that extract the statistics you want. The stack can hold both predefined data collector combinators and pluggable, modular data collectors. The data collectors will be specified in the DSL for this tool, which I will probably call LRS, or Log Revision Specification.
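
One possible way to picture the action stack is as a list of functions folded over the parsed entries. The following is a sketch under that assumption, not the actual LogRev implementation: Entry, Action, collectBy and runStack are illustrative names, and the statistics are kept in nested maps from the containers package.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Hypothetical parsed entry; the field names are illustrative.
data Entry = Entry
  { status  :: Int
  , country :: String
  , bytes   :: Int
  }

-- Per-key totals: (number of requests, bytes transferred).
type Totals = Map.Map String (Int, Int)

-- All reports, keyed by report name.
type Stats = Map.Map String Totals

-- An action reads one entry and updates the accumulated statistics.
type Action = Entry -> Stats -> Stats

-- Build a grouping collector from a report name and a key extractor.
collectBy :: String -> (Entry -> String) -> Action
collectBy report key e =
  Map.insertWith (Map.unionWith add) report (Map.singleton (key e) (1, bytes e))
  where
    add (c1, b1) (c2, b2) = (c1 + c2, b1 + b2)

-- The stack is a plain list, so it can be assembled at run time.
-- Every action is applied, in order, to every entry.
runStack :: [Action] -> [Entry] -> Stats
runStack stack = foldl' step Map.empty
  where
    step acc e = foldl' (\s action -> action e s) acc stack

main :: IO ()
main = print (runStack stack entries)
  where
    stack   = [collectBy "status" (show . status), collectBy "country" country]
    entries = [Entry 200 "CHL" 100, Entry 404 "CHL" 50, Entry 200 "USA" 25]
```

A pluggable collector in this picture is just another value of type Action appended to the list.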

Finally there is the output method, which currently supports graphic charts and plain text. In the future it will support other kinds of output, also specified in LRS.

Figure: Simple LogRev Design

Server logs are often really huge, and many times we need to process large amounts of data, so I decided to use Haskell as the main language. Its type system and abstractions support combinators very well, and Haskell compiles well to native code, making the tool fast enough to process 5000 log lines in about 1 second while generating the plain text output for two reports, the status report and the country report.

08:00 [dmw@www:3 logrev-sample]$ wc -l main.log 
5000 main.log
08:00 [dmw@www:3 logrev-sample]$ logrev --input=./main.log --output=report
Processing: ./main.log

Status:
       200:       4834    9426271      96.68      96.10
       206:          3     107131       0.06       1.09
       301:          4        733       0.08       0.01
       302:         18       7568       0.36       0.08
       403:         68      11391       1.36       0.12
       404:         73     252012       1.46       2.57

Country:
       AUS:          5      32936       0.10       0.34
       CHL:       4516    6933671      90.32      70.69
       CHN:         90     910020       1.80       9.28
       DEU:          4      28591       0.08       0.29
       ESP:        203     174777       4.06       1.78
       RUS:         45     477554       0.90       4.87
       UKR:          4        518       0.08       0.01
       USA:        133    1217649       2.66      12.41


real    0m1.104s
user    0m0.964s
sys     0m0.092s
08:01 [dmw@www:3 logrev-sample]$ ls -l *.png
-rw-rw-r-- 1 dmw dmw 13085 2012-03-05 08:01 report_country.png
-rw-rw-r-- 1 dmw dmw 12031 2012-03-05 08:01 report_status.png
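
The report columns are not labeled, so the following reading is an assumption: key, number of requests, bytes transferred, percentage of requests, and what looks like the percentage of bytes. The third numeric column can be checked from the counts alone, for example 4834 / 5000 = 96.68 for status 200. The sketch below recomputes only that column from the status rows shown above; percent and statusRows are illustrative names, not part of LogRev, and the byte percentage is left out because its denominator cannot be confirmed from the listing.

```haskell
import Text.Printf (printf)

-- (key, requests, bytes), copied from the status report above.
statusRows :: [(String, Int, Int)]
statusRows =
  [ ("200", 4834, 9426271)
  , ("206",    3,  107131)
  , ("301",    4,     733)
  , ("302",   18,    7568)
  , ("403",   68,   11391)
  , ("404",   73,  252012)
  ]

-- Share of a part in a whole, as a percentage.
percent :: Int -> Int -> Double
percent part whole = 100 * fromIntegral part / fromIntegral whole

main :: IO ()
main = mapM_ row statusRows
  where
    -- Total number of requests across all status codes.
    totalRequests = sum [c | (_, c, _) <- statusRows]
    row (k, c, b) =
      printf "%10s: %10d %10d %10.2f\n" k c b (percent c totalRequests)
```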

Also, the graphic chart output is very nice, thanks to the Haskell Chart package on Hackage. Here is some sample output.

Figure: Country Sample Chart

Figure: Status Sample Chart

one comment to “the logrev design”

  1. At the very bottom of the linked page is a python worker syslog server that can be hacked to do some awesome things… I am using it for a largish OSS project. I ended up rewriting a syslog server and fabric agent for my evil purposes. For live graphs and charts processing.org is always nice but there is now http://code.google.com/p/pyprocessing/ that is looking very cool.

    http://wiki.loggly.com/pythonlogging
