The challenge of this month at programando.org, a programming blog written by Eduardo Díaz, is presenting a challenge related to statistics, mainly the calculation of the standard deviation and the variance. I usually place some programming challenge responses here in my blog. From now I will place the site in the post title and the challenge title, so you can track the results from and other competitors. Well, as usual, I am practising with Haskell, because I really want to reach very professional skills with it.
The challenge about the standard deviation and the variance is «calculate the variance as fast and efficiently as possible». An extra challenge is to implement an algorithm with the «precisest result». Well, I have made a small search about the problem. Seems that it is widely covered, with many algorithmic implementations. What have meet my attention is the fact that one of the most numerically stable algorithms is one made by Donald Knuth in his well known book «The Art of Computer Programming: Semi-numerical Algorithms (Vol. II)», on the section 4.2.2, where he treats the problems — as many people I am still thinking that it is a problem — that comes with floating point arithmetic.
So, here is my Haskell solution, a fully lazy read recursive function using patterns. I think that it is small and efficient.
olVar :: [Float] -> Float olVar [] = 0.0 olVar xs = lolVar xs 0 0 0 where lolVar [] n' m' r' = (r' / (n' - 1)) :: Float lolVar (t:ts) n m r = let n' = n + 1.0 d' = t - m m' = m + d' / n' r' = r + d' * (t - m') in lolVar ts n' m' r'
Some considerations about it, which are simpler ones, I have tested it using a 100.000 random numbers list generated with my HiFn 7955 and I am using Float rather than Double, because the number list is small, you can change the implementation to Double to gain more precision. Also, I cannot stop thinking that this algorithm is one of the most numerically stable ones from its creation some decades ago and it is O(n), as many average related algorithms.
Hi Daniel,
thank you for your response, I don’t know if you have read this document «What Every Computer Scientist Should Know About Floating-Point Arithmetic» I think is fundamental about numeric programming.
I knew about that article, but I did not notice how deep was it treating the topic. I just had fast read on the article and looks really well, today I will read it more deeply (also seeking some references), because I think that I will need to use some topics covered by it.