coder . cl » python

how I won my latest project

Daniel Molina Wegener — Wed, 19 Sep 2012 18:46:18 +0000

As a freelancer I am constantly looking for new projects. Good projects are sometimes hard to find, mainly because not every customer is technical. That makes some aspects of a project hard to communicate, and it means programmers are often evaluated on their accreditations rather than on a technical assessment. I won one of my latest projects by solving a programming problem in the middle of the interview: I shared my screen over Skype so the customer could review the source code and watch me solve the problem. That was great.

I was very glad to see a technical customer reviewing my code and making live suggestions about the solution. The problem was not hard and was related to parsing. Given a string, you must check its bracket pairs, {}, () and [], which can appear anywhere in the string; the string is well formed only if every opening bracket is closed by the matching bracket in the right order. For example, “a[a[b(c)]]{d}” is a correct string and “a[a][b(c){d}” is not. How do you parse that kind of string? Regular expressions were not an option, so the solution is an O(n) algorithm that uses a stack and a dictionary to check each pair. The language of choice was Python, and it was great to solve the problem in less than 20 minutes.


#!/usr/bin/env python
# -*- coding: utf-8; -*-


def par_match(str_):
    # maps each opening bracket to the closing bracket it expects
    test_m = {'{': '}', '(': ')', '[': ']'}
    closers = set(test_m.values())
    stk = list()
    for ch in str_:
        if ch in test_m:
            # remember which closing bracket must come next
            stk.append(test_m[ch])
        elif ch in closers:
            # a closing bracket must match the top of the stack
            if not stk or ch != stk[-1]:
                return False
            stk.pop()
    # anything left on the stack was opened but never closed
    return len(stk) == 0


if __name__ == "__main__":
    print(par_match("{a[a(a)a]a}"))
    print(par_match("{{()[]}}"))
    print(par_match("}}}"))
    print(par_match("{}()]"))
    print(par_match("[{}()"))
    print(par_match("< {<{}>}>"))

The simplicity of the algorithm is very nice. It is pretty clear how the stack holds the closing bracket expected for each bracket that is still open, so the top of the stack always tells which bracket must close next. Each character is checked as it is read: a closing bracket that does not match the top of the stack (or that arrives when the stack is empty) is an error, and a final check that the stack is empty catches any bracket that was opened but never closed. The customer was pleased with the solution and hired me. What I found most interesting was getting hired through a technical interview over Skype. We talked about the solution and about different ways to approach the problem, and got to know each other as customer and freelancer.
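The same stack idea can be extended to say where the string goes wrong, which is handy when the input is long. The following is only a sketch under that assumption; the name first_mismatch and its return convention are mine, not part of the interview solution.

```python
PAIRS = {'{': '}', '(': ')', '[': ']'}
CLOSERS = set(PAIRS.values())


def first_mismatch(text):
    """Return the index of the first bracket error, or None if balanced."""
    stack = []  # (expected closing bracket, index of its opener)
    for pos, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((PAIRS[ch], pos))
        elif ch in CLOSERS:
            if not stack or stack[-1][0] != ch:
                return pos          # wrong or unexpected closing bracket
            stack.pop()
    if stack:
        return stack[-1][1]         # innermost bracket that was never closed
    return None
```

Storing the opener position next to the expected closer costs nothing in complexity: the scan is still a single O(n) pass.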

It is very nice to work with people who understand your perspective on programming and your passion for your work. Even without meeting in person at a Starbucks or Starlight coffee shop, it was nice to meet over Skype, have a coffee made in my moka pot and talk about programming with a customer who takes programming seriously and recognizes that not just anyone can do it. He also told me that there are some very good technicians with deep technical knowledge who are not so good at solving programming problems. It seems my customer has accepted that not everyone is cut out to program, but that being good at programming is a matter of interest and practice, not of rote learning in a classroom.

© Daniel Molina Wegener for coder . cl, 2012. | Permalink | No comment | Add to del.icio.us
Post tags:

Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

simple parallel programming example

Daniel Molina Wegener — Sat, 25 Aug 2012 13:50:20 +0000

There are several parallel programming models. One of the most widely used is the threading model, where you create threads, or lightweight processes, that share memory and resources such as open files. A thread is an independent flow of execution that shares memory with its creator inside the parent process. Imagine that you need to process several files containing call durations, and that you need to sum and join that data. Processing the files one after another is probably slower than processing them all at once in parallel. Here is an example written in Python that can help you understand a threaded solution to this problem.

So, with a set of files, you can have one or more functions processing them in parallel, without blocking each other and doing the same task independently. You will probably need to write to a shared data structure such as a dictionary, and you can use locks to avoid race conditions. In the following example the function pproc_file() runs concurrently in several threads; it reads and updates the GLOBAL_HASH dictionary and uses the global lock MAIN_LOCK to guard access to it.


import threading as thr
import os
import sys
import csv


GLOBAL_HASH = dict()
MAIN_LOCK = thr.RLock()


def pproc_file(fname):
    """ Process the given File  """

    global GLOBAL_HASH
    global MAIN_LOCK

    if not os.access(fname, os.R_OK):
        raise Exception(u"Cannot Open %s" % (fname))

    try:
        with open(fname) as fdi:
            reader = csv.reader(fdi)

            # skip header
            next(reader)

            for cdata in reader:
                if not cdata:
                    continue
                ditm = cdata[1]

                # the read and the write must happen under the same
                # lock, otherwise two threads can overwrite each
                # other's partial sums
                with MAIN_LOCK:
                    nsum = GLOBAL_HASH.get(ditm, 0) + int(cdata[2])
                    GLOBAL_HASH[ditm] = nsum
    except Exception as exc:
        print(repr(exc))

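You can then hand one file to each thread running this function. The threads work almost independently, so the files are not processed strictly one after another, and the overall job can finish sooner. In CPython the gain comes mostly from overlapping the time spent waiting on file I/O, since the global interpreter lock lets only one thread run Python bytecode at a time.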


def main():
    """ Main Program """
    pfiles = ['phone-0.csv',
              'phone-1.csv',
              'phone-2.csv',
              'phone-3.csv']

    thrl = list()
    for myf in pfiles:
        sthr = thr.Thread(target=pproc_file, args=[myf])
        sthr.start()
        thrl.append(sthr)
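    # wait for every worker; a plain loop is used because map() is
    # lazy on Python 3 and would never call join()
    for sthr in thrl:
        sthr.join()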

Finally, the main function launches one thread per file and then the main thread calls the join() method of each thread, which makes it block until that thread has finished. Without the join() calls the main thread would carry on immediately, and any code that used GLOBAL_HASH afterwards could see partial results, so it is better to wait until all files have been read entirely. On Linux, POSIX threads are created with the clone() system call, the same primitive that is behind fork(), but with flags that make the new task share the memory space, open files and other resources of its creator instead of getting a copy. So internally a thread is scheduled like a process that shares its memory space with its creator.
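As an alternative to a shared dictionary and a lock, each worker can return its own partial totals and the main thread can merge them. The following is a minimal sketch of that variant using concurrent.futures from the standard library; the in-memory sample data and the column names in the header are assumptions made for illustration, only the column positions follow the example above.

```python
import csv
import io
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sum_calls(source):
    """Return per-phone totals for one CSV source (header: id, phone, seconds)."""
    totals = Counter()
    reader = csv.reader(source)
    next(reader)  # skip header
    for row in reader:
        if row:
            totals[row[1]] += int(row[2])
    return totals


def sum_all(sources):
    """Process every source in a worker thread and merge the partial results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # map() yields the partial results in input order
        for partial in pool.map(sum_calls, sources):
            merged.update(partial)
    return merged


# two hypothetical in-memory "files" standing in for phone-N.csv
FILES = ["id,phone,seconds\n1,555-0100,30\n2,555-0101,10\n",
         "id,phone,seconds\n3,555-0100,5\n"]
totals = sum_all([io.StringIO(text) for text in FILES])
```

Because only the main thread touches the merged counter, no lock is needed; the price is one temporary Counter per file.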


typing is not a problem

Daniel Molina Wegener — Sun, 10 Jun 2012 17:05:16 +0000

Typing is not a problem. Many programming languages use dynamic typing, but dynamic typing is not as good as you might think. It lets you get away with a lot, for instance evaluating expressions like (int)"1" + 1 = 2 without type errors. The problem with that kind of flexibility is that there is no standard for how values are converted or evaluated as booleans, which leads to logic errors and, obviously, to program failures. Even some statically typed languages allow this kind of thing: C++98 and later versions allow a wide variety of type casts, where any mistake in the cast values can produce logic errors and failures.

I see static typing, and especially strong static typing, as a good feature. It avoids many programming errors. Consider a common example. We have a legacy system built on PHP and we are migrating it to Python, or even Node. You can run into logic errors during the migration, so it is not as easy as you might think. The application being migrated consumes a typed service, and one element received from the service is a string. PHP converts that string to a numeric type automatically, but Python and Node do not. Please run the following examples and see the result.

Then we migrate that logic to Python, using literals, and we get an exception from a dynamic type system that is not as dynamic as you might think. Just think about what happens at the level of logic evaluation, and how many logic errors you can end up with because of such differences.

# Python code...
# a comes from the service as a string
a = "1.0"
# another element comes as an integer
b = 1
# raises TypeError: a str and an int cannot be concatenated
c = a + b
# never reached, because of the exception
print(c)

Node is not so different. Rather than the sum of 1.0 and 1, we get a string concatenation, and no further numeric operations are possible on that kind of string. The differences between dynamic type systems are huge. Just try a boolean evaluation of an empty list or array in different dynamically typed languages: some evaluate an empty list as true and others as false. Just take a look at the messy jungle of boolean evaluations in PHP. At least Python has a formal definition of its boolean and expression evaluation in the Expressions chapter of its Language Reference (Chapter 5 in the Python 2 edition).

// Node.js / JavaScript code..
// a comes from the service as a string
var a = "1.0";
// another element comes as an integer
var b = 1;
// string concatenation rather than addition
var c = a + b;
// the result is the string "1.01"
console.log(c);

So the messy world of expression evaluation in dynamic languages is not as fun as you might think. There are huge differences between languages and there is no standard way of evaluating expressions, which leads to type safety errors and logic errors. I still prefer strong static typing for building applications.
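For the migration case described above, the safe route in Python is to make the conversion explicit, and to know its truth rules instead of guessing them. This is a small sketch; the function name add_service_values is hypothetical and only illustrates the idea.

```python
def add_service_values(a, b):
    """Add a numeric string from the service to an integer, explicitly."""
    return float(a) + b


total = add_service_values("1.0", 1)

# Python treats empty containers, the empty string, zero and None as false
falsy = [bool(x) for x in ([], {}, "", 0, None)]

# ...but any non-empty string is true, even "0" (which PHP treats as false)
truthy = [bool(x) for x in ("0", [0], " ")]
```

The explicit float() call fails loudly with a ValueError on input such as "abc", which is exactly the kind of error you want to see during a migration rather than in production.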

You will also find some interesting facts here. If you are a category theorist, you will probably find that with the JavaScript approach to types you cannot apply some abstractions like Monads, so jQuery is not a Monad, because it is bound directly to one type and JavaScript cannot bind types. You cannot even apply some typed lambda calculus combinators in JavaScript: despite its dynamic type system, some β-reductions are not possible without explicit type conversions, because you cannot get the desired conversion implicitly, and that forces the use of K combinators and similar ones.

But even with static typing you can get logic errors. Just count the errors that your favorite financial company has on its web system, built on top of its cool Java or .NET technologies. That is why I prefer languages like Haskell: its strong static type system is really type-safe and brings a better final result. Programming in Haskell is slower than programming in JavaScript, but you get real type safety.

-- Haskell code...
module Main (main) where

main :: IO ()
main = let a = "1.0"
           b = 1
           c = fromIntegral b + (read a :: Float)
           -- the result is the desired 2.0 as Float
           in print c


charset detection with python

Daniel Molina Wegener — Sat, 26 May 2012 16:21:03 +0000

How many times have you needed to detect the charset of a given file? This task is quite easy with Python. I am currently working on an application that has to parse CSV files coming from both Linux and Windows systems. The problem, as usual, is the Windows files: instead of UTF-8 or another Unicode encoding, they are exported as Windows-1250 and similar encodings. This is a big problem when you try to import the data into tables with a Unicode collation, like those using UTF-8. In Python, the chardet module does all the magic.

The following code opens the file passed as argument to the script, reads its contents, and passes them to the detect() function of the chardet module. The result is a dictionary with a confidence level and the detected encoding: {'confidence': 1.0, 'encoding': 'UTF-8'}. The confidence is a value between 0 and 1, and the encoding key holds the name of the detected encoding.


#!/usr/bin/env python
#
# -*- coding: utf-8; -*-

import sys
import chardet

# read bytes, not text: detect() works on the raw, undecoded content
file_handle = open(sys.argv[1], 'rb')
content = file_handle.read()
file_handle.close()
result = chardet.detect(content)
print(repr(result))

You do not need to read the entire file; a portion is enough. But if the file uses an encoding such as UTF-32, which uses 4 bytes per character, make sure you read a multiple of 4 bytes so that no character is cut in half, something like read(512). Buffers that start with a BOM are also easier to detect. Finally, to use the Python csv module with this feature, mainly for Windows-1250 and similarly encoded files, you should try creating a UTF8Recoder, like this one.
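Once the encoding is known, for example from the encoding key returned by detect(), the raw bytes can be decoded while they are fed to the csv module. The sketch below assumes Python 3 and uses only the standard library; decode_csv and the sample data are illustrative names, not part of chardet or of the recoder mentioned above.

```python
import csv
import io


def decode_csv(raw, encoding):
    """Decode raw CSV bytes with the given encoding and return the rows."""
    # newline="" lets the csv module handle line endings itself
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text))


# bytes as a Windows export might produce them (assumed sample data)
sample = "id,name\n1,Łódź\n".encode("windows-1250")
rows = decode_csv(sample, "windows-1250")
```

From here the rows are ordinary Python strings, so they can be written to a UTF-8 table without any further recoding step.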


writing python daemons

Daniel Molina Wegener — Sat, 12 May 2012 19:54:41 +0000

A daemon, in Unix™ and similar operating systems, is a background process that runs without a terminal or the X11 system as its main I/O system. In other operating systems this is called a service. A daemon has a very specific task: for example, the NTP daemon keeps your computer clock up to date by connecting to NTP servers. Many applications that need asynchronous tasks rely on this kind of program. For example, a job queue processor for the Gearman job server can be run as a daemon.

Classically, on most Unix™ systems, a daemon is started as follows: the program calls fork(2), which creates a copy of the current process; the parent process then exits and the child keeps working in the background, detached from the terminal, with its three main I/O streams (stdin, stdout and stderr) closed or redirected. Since Python aims to be a productive language, this not so complex procedure is already implemented in the daemon module (from the python-daemon package), so with this module you can create a daemon program.


import os
import sys
import daemon
import atexit


def main():
    """
    Main Program
    """

    install_signal_handler()
    atexit.register(at_exit_handler)

    opts = parse_opts()
    config = parse_config(opts)
    if not opts.cwd:
        print("No Working Directory")
        sys.exit(1)

    with daemon.DaemonContext():
        os.chdir(opts.cwd)
        install_signal_handler()
        start_schedule(config, opts)


# Executes the main() function
if __name__ == '__main__':
    main()

Note that the code changes the working directory with os.chdir(), because once the daemon is created its working directory is automatically changed to the root directory, /. Some daemons and servers, like the Apache HTTP server, have a compiled-in working directory, but can also be run under a chroot directory to work outside of it. The code also installs signal handlers, so that signals like SIGHUP and SIGCHLD can be processed. The parse_opts() and parse_config() functions parse the daemon arguments and the daemon configuration, using the optparse and ConfigParser modules respectively.

The atexit module is used to make sure that most program resources, like files, connections and similar, are released or closed once the daemon process terminates. The following atexit example shows a routine that releases resources.


from django.db import connections
import atexit

def at_exit_handler():
    """
    At Exit Function (close all connections)
    """
    for con in connections.all():
        try:
            con.close()
        except Exception as exc:
            log().error(exc)
    log().close()

The signal handler works in a very similar way: it registers a callback that is executed each time a signal is received. For example, on the SIGHUP signal I rehash the daemon rather than terminating it.


import signal as sig
import traceback as tb


def rehash_daemon(signum=None, frame=None):
    """
    Rehash the daemon, reading configurations again.

    Python calls signal handlers with the signal number and the
    current stack frame, so the handler must accept both.
    """
    global DAEMON_CONFIG
    DAEMON_CONFIG = parse_config()


def install_signal_handler():
    """
    Signal Handler Installer
    """
    try:
        sig.signal(sig.SIGHUP, rehash_daemon)
        return True
    except Exception as exc:
        log().error(repr(exc))
        log().error(tb.format_exc())
        return False

This lets you create robust daemons, capable of handling signals, releasing resources on exit, and so on. Remember that, as in any language with a garbage collector, a long-running Python process can still leak memory if you do not cut object references properly: reference counting frees an object only when nothing refers to it any more, so anything still reachable from a global list or cache stays alive. Your daemon design should take that into account.
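A common variation on the rehash handler is to do as little as possible inside the handler itself and let the main loop of the daemon do the real work. The following is a sketch of that pattern with the standard signal module; the names on_sighup, reload_if_requested and install_handlers are hypothetical, and load_config stands for whatever function reads your configuration.

```python
import signal

RELOAD_REQUESTED = False


def on_sighup(signum, frame):
    """Only record that a reload was requested; keep the handler tiny."""
    global RELOAD_REQUESTED
    RELOAD_REQUESTED = True


def reload_if_requested(load_config):
    """Call this from the main loop; returns a new config or None."""
    global RELOAD_REQUESTED
    if not RELOAD_REQUESTED:
        return None
    RELOAD_REQUESTED = False
    return load_config()


def install_handlers():
    """Register the handler; SIGHUP is not available on every platform."""
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, on_sighup)
```

Calling install_handlers() once at start-up and reload_if_requested() on each iteration of the main loop keeps file parsing and logging out of the signal handler, where an exception would be much harder to diagnose.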


django and amazon s3

Daniel Molina Wegener — Thu, 19 Jan 2012 13:18:45 +0000

Amazon S3 is a well known web based storage system, offered as a service by Amazon Web Services. In Django you can integrate it through the storage interface called Django Storages, but there are some things to keep in mind when using it. The main one is the Date header sent to the service on each read, write and similar operation: you must send an up-to-date header with the proper time zone and format.

Once the django-storages package is configured and working, you should fill in the Date header or the x-amz-date header each time you make a request to S3. The only way to do that is not to let the module send the date automatically; instead, you write it to the settings variable AWS_HEADERS.


import pytz
from datetime import datetime
from django.conf import settings

def get_aws_date():
    """
    Returns the server date formatted and using the server
    time zone localized to be used as Date header with
    Amazon S3.
    """
    stz = pytz.timezone(settings.TIME_ZONE)
    dtm = stz.localize(datetime.now()).strftime("%a, %d %b %Y %H:%M:%S %z")
    settings.AWS_HEADERS['Date'] = dtm
    settings.AWS_HEADERS['x-amz-date'] = dtm
    return dtm

You must keep this requirement in mind for all your requests to S3. Since django-storages checks whether AWS_HEADERS already has the Date or x-amz-date header set, you must set that header each time you make a request to S3. This means you cannot do a single batch read() from the storage or a batch write() to it, because it will reuse the previously sent Date header and fail: the S3 authentication mechanism considers the stale date inconsistent. If S3 is the default storage, the example below will fail, because it sends the file chunks with the same Date header that was set previously.


from django.core.files.storage import default_storage

if default_storage.exists('test-large-file.mp3'):
    mp3file = open('test-large-file.mp3', 'wb')
    s3file = default_storage.open('test-large-file.mp3')
    mp3file.write(s3file.read())
    mp3file.close()
    s3file.close()

So you need to read from the storage in small chunks, as in the example below.


from django.core.files.storage import default_storage

if default_storage.exists('test-large-file.mp3'):
    get_aws_date()
    # the local copy is opened for writing in binary mode
    mp3file = open('test-large-file.mp3', 'wb')
    s3file = default_storage.open('test-large-file.mp3')
    buff = s3file.read(settings.AWS_CHUNK_SIZE)
    while buff:
        mp3file.write(buff)
        try:
            buff = s3file.read(settings.AWS_CHUNK_SIZE)
        except Exception as exc:
            # stop copying on the first failed read
            buff = None
    mp3file.close()
    s3file.close()

There are also a few issues to consider. Since AWS_HEADERS is a global variable, writing to it slows your code down, because the write has to wait its turn on the GIL, and two threads writing it at the same time will hold each other up, even if you use locks and Django can handle parallel writes to that variable. So be careful when reading and writing large files from S3, and take a look at how the Date header is sent on each request.


def _add_aws_auth_header(self, headers, method,
                         bucket, key, query_args):
    if not headers.has_key('Date'):
        headers['Date'] = time.strftime("%a, %d %b %Y %X GMT",
                                        time.gmtime())

    c_string = canonical_string(method, bucket, key,
                                query_args, headers)
    headers['Authorization'] = \
        "AWS %s:%s" % (self.aws_access_key_id,
                       encode(self.aws_secret_access_key, c_string))

The problem is the %X specifier: on machines where the LC_ALL environment variable is set to a locale other than C it can produce the wrong time format, which stops your application from working with S3. This is a well known bug reported on this link. The right implementation should be as follows.


def _add_aws_auth_header(self, headers, method,
                         bucket, key, query_args):
    if 'Date' not in headers:
        stz = pytz.timezone(settings.TIME_ZONE)
        dtm = stz.localize(datetime.now()).strftime("%a, %d %b %Y %H:%M:%S %z")
        headers['Date'] = dtm

    c_string = canonical_string(method, bucket, key,
                                query_args, headers)
    headers['Authorization'] = \
        "AWS %s:%s" % (self.aws_access_key_id,
                       encode(self.aws_secret_access_key, c_string))
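If the locale is the concern, one option is to build the date with email.utils.formatdate from the standard library, which always emits English day and month names in GMT regardless of LC_ALL. This is only a sketch of that alternative, not what django-storages does; http_date is a hypothetical helper name.

```python
from email.utils import formatdate


def http_date(timestamp=None):
    """Return an RFC 1123 date such as 'Thu, 19 Jan 2012 13:18:45 GMT'."""
    # timeval=None means "now"; usegmt=True ends the string with 'GMT'
    return formatdate(timeval=timestamp, localtime=False, usegmt=True)


epoch = http_date(0)
```

The resulting string could then be stored under the 'Date' key in the same way as dtm in the listing above.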

Good luck using S3.


decorated template tags in django

Daniel Molina Wegener — Fri, 06 Jan 2012 17:26:01 +0000

Django provides an API to create custom template tags for applications built on this nice web application framework. Sometimes we need to decorate our functions, but you cannot decorate a function registered as a tag in Django. The most elegant solution is a closure: the decorated function is defined inside the template tag function, which calls it, and the outer function is the one registered as the tag.


from django import template
from django.conf import settings

from owner.common.decorators import html_escape_output

register = template.Library()

def fmt_owner_name(_owner):
    """
    Formats the Owner Name
    """
    @html_escape_output()
    def _fmt_owner_name(owner):
        nms = u'%s %s' % (first_word_on_string(owner.first_name),
                          first_word_on_string(owner.last_name))
        if len(nms) > settings.MAX_NAME_ON_CARD:
            return u'%s...' % (nms[0:settings.MAX_NAME_ON_CARD])
        else:
            return nms
    return _fmt_owner_name(_owner)
fmt_owner_name = register.simple_tag(fmt_owner_name)

So, in this example, the template tag escapes the owner name. Template tags in Django treat their HTML output as safe, so they do not escape it automatically; even with the is_safe and needs_autoescape properties, they need special treatment when you want the HTML escaped. In the example above the closure _fmt_owner_name is decorated with @html_escape_output, which forces escaped HTML output into the Django template.
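The html_escape_output decorator itself is not shown in this post. A minimal sketch of how such a decorator factory could look, assuming Python 3 and its standard html module, is the following; the body is my assumption, and fmt_name is a simplified stand-in for the tag function above without the Django registration.

```python
import functools
import html


def html_escape_output(quote=True):
    """Decorator factory: escape the string returned by the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return html.escape(func(*args, **kwargs), quote=quote)
        return wrapper
    return decorator


def fmt_name(first, last):
    """Outer function: the one a tag library would register."""
    @html_escape_output()
    def _fmt_name(first, last):
        return u'%s %s' % (first, last)
    return _fmt_name(first, last)


escaped = fmt_name('<b>Ann</b>', "O'Neil")
```

Because the decorator is applied to the inner function, the outer function keeps the plain signature that the template library expects.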

You can also stack as many decorators as you want. I think this avoids calling cgi.escape and similar functions on each return statement. It helps with common operations in Django template tag processing, like the example above, and it reduces the amount of code, making it simpler and more legible, following the fourth aphorism of the Zen of Python: «complex is better than complicated». Then we use the template tag in the Django template as follows.



  {% fmt_owner_name owner_obj %}

I hope this helps you reduce the number of repeated calls.


integrating selenium and django

Daniel Molina Wegener — Mon, 12 Dec 2011 20:53:34 +0000

As you know, unit tests only cover algorithms and low level interfaces. For a higher level approach there is the automated testing suite Selenium, which provides a test case API for functional tests. If you want to automate functional tests under Django, you can use the Selenium IDE to record your actions on the web site and export them as test case instructions for the Selenium API, which will run those tests in the web browser as you would, or even in other tools like HtmlUnit. The API supports several browsers and includes a server that lets you build a test grid across several machines.

Without going into too much detail, I will try to explain how to set up Selenium tests under Django. The very first step is to download and install the Selenium IDE plug-in, which lets you create automated functional tests for the Selenium API.

Selenium IDE

Once you have created the functional test with the record option, you can save it in the native Selenium format, which is a simplified HTML file. After saving the test case you can export it with the export option in the File menu, where you will see two formats for Python integration, called Web Driver and Remote Control. Web Driver is used with a local browser, on the local test server. Remote Control is meant for grid servers, an option that supports multiple browsers by calling remote actions on a grid of parallel test servers. Once you export the file in the Web Driver format, you will see that the Python code is based on the standard unit testing suite, as in the following example.


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import unittest, time, re

class TestWebdriver(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.implicitly_wait(30)
        self.base_url = "https://myserver.com/"
        self.verificationErrors = []

    def test_login_view(self):
        driver = self.driver
        # base_url already ends with a slash
        driver.get(self.base_url + "home/")
        driver.find_element(By.LINK_TEXT, "Login").click()
        driver.find_element(By.ID, "username").clear()
        driver.find_element(By.ID, "username").send_keys("test@myserver.com")
        driver.find_element(By.ID, "password").clear()
        driver.find_element(By.ID, "password").send_keys("testpass123123")
        driver.find_element(By.CSS_SELECTOR, "input.submit.login-but").click()
        driver.find_element(By.LINK_TEXT, "MainPanel").click()

    def is_element_present(self, how, what):
        try:
            self.driver.find_element(by=how, value=what)
        except NoSuchElementException:
            return False
        return True

    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()

To run that test, save the file in the app/tests/ directory; it runs using a standard browser. In my case, I am using django-jenkins for continuous integration with Jenkins or Atlassian Bamboo. You also need to install the Python Selenium API, which contains the webdriver package for the Web Driver API (remember that it is used with a local browser) and the selenium package, which provides the API calls for using grid servers in parallel tests.

So the functional tests run as a standard part of any build. You can separate the test suites by customising the build script for Jenkins or Bamboo, and get separate reports. That is the preferred option if you want to keep detailed information about your tests. Enjoy integrating automated functional tests.


why to go with multi-paradigm?

Daniel Molina Wegener — Fri, 18 Nov 2011 15:07:57 +0000

Hybrid languages are cool. The most powerful programming languages are those that can handle multiple paradigms. You can reduce the amount of code considerably with a multi-paradigm approach. Because object oriented languages have an imperative origin and a procedural approach, an object oriented approach alone is not enough to solve every problem. For example, as we reviewed, the widely used MapReduce distributed computing model has its origin in the functional paradigm, in two higher order functions called map and reduce. Another name for reduce is fold; it applies a kind of Monoid: the data is lifted and transformed by the map function, and then combined into a result by the fold or reduce function.

tl;dr

I prefer multi-paradigm languages because you can do cool stuff with less code.

why do I prefer it?

If you reduce the amount of code you write, you reduce its cyclomatic complexity and with it the number of possible bugs. Using a functional approach in certain cases will reduce the amount of code you produce and increase your productivity considerably. Take a look at the following PHP code, which finds the lowest value in an array of associative arrays.

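<?php

$tst = array(array('a' => 30, 'b' => 20, 'c' => 10),
             array('a' => 29, 'b' => 19, 'c' => 9),
             array('a' => 28, 'b' => 18, 'c' => 8));

// starting value; this line was lost from the listing and is assumed
$min = $tst[0]['a'];

foreach ($tst as $t) {
    if ($t['a'] < $min) {
        $min = $t['a'];
    }
}

echo "{$min}\n";

?>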

Now take a look at the same problem implemented in Python with a multi-paradigm approach, using functional features and higher order functions.


### Python example using higher order functions and
### functional approach

tst = [{'a': 30, 'b': 20, 'c': 10},
       {'a': 29, 'b': 19, 'c': 9},
       {'a': 28, 'b': 18, 'c': 8}]

t = min(tst, key=lambda x: x['a'])
print(t['a'])

You can clearly see that the search was reduced to a single line of code with a functional approach, using min as a higher order function, so the η-reduction is evident. You can handle a wide variety of problems this way, considerably reducing the number of lines of code, increasing your productivity and writing better code. Do you remember how to calculate the factorial? Do you remember the factorial example in Python? Take a look at the PHP implementation against the Python implementation.

Now take a look at the Python implementation. You will notice a multi-paradigm approach that combines the procedural and functional paradigms, again using higher order functions.


### Python factorial example

from functools import reduce  # reduce is no longer a builtin on Python 3


def fact(n):
    if n == 0: return 1
    return reduce(int.__mul__, range(1, n + 1))

print(fact(4))

I really like how functional programming lets you reduce the number of lines of code needed to complete certain tasks. It is as expressive as procedural code, but reads more fluently and has less cyclomatic complexity than the procedural approach. Functional code is very legible. Another advantage is that some compilers supporting the functional paradigm implement tail calls, so you do not need to worry about creating stack frames and overflowing the stack with recursive functions. Languages that currently support a multi-paradigm approach include JavaScript, Python, Ruby and Scala, among others.
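The map-then-fold shape mentioned at the start of this post can be sketched in a few lines of Python. The record format and the names map_phase and reduce_phase are assumptions made for the example; the point is only that the merge step is associative and has the empty dict as its neutral element, which is what makes it behave like a Monoid.

```python
from functools import reduce


def map_phase(line):
    """Turn one 'phone,seconds' record into a one-entry dict."""
    phone, seconds = line.split(",")
    return {phone: int(seconds)}


def reduce_phase(acc, item):
    """Merge two dicts of totals; the empty dict is the neutral element."""
    merged = dict(acc)
    for key, value in item.items():
        merged[key] = merged.get(key, 0) + value
    return merged


records = ["555-0100,30", "555-0101,10", "555-0100,5"]
totals = reduce(reduce_phase, map(map_phase, records), {})
```

Because reduce_phase does not care how its inputs were grouped, the records could be split across several workers and the partial results merged afterwards with the same function.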

So, I recommend that you start learning some functional programming principles and techniques. In the past I got tired of companies with silly restrictions against using reductions, which made the code look like literate programming code instead of well qualified enterprise code. Since I am working as a freelancer, I just use the best practices that each language requires.

Do not get stuck in the object-oriented and procedural paradigms only. Learn other paradigms and approaches, so you can handle a wider range of problems and reach better solutions, more optimal in lines of code, cyclomatic complexity, algorithmic complexity and time complexity. What I can say is that more lines of code do not mean better code: it depends on the language you are using and how you implement your solutions, and a multi-paradigm approach can lead you to deliver better code. Sometimes certain languages allow you to implement the same solution with less code; you just need to know the proper technique.

Many languages are now including functional features and going multi-paradigm. For example, Java 8 has included Lambda Expressions and C# includes Lambda Expressions; they just need to implement Closures to start doing some cool stuff in certain cases. PHP 5.3 now includes Lambda Expressions, or Anonymous Functions. So, you should learn about the functional paradigm, among other things that will be required in the future.
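For readers who have not met closures yet, here is a small Python 3 sketch. The names make_scaler and make_counter are illustrative only. An inner function keeps access to the variables of the enclosing call even after that call has returned.

```python
def make_scaler(factor):
    # 'factor' is captured by the lambda below: that is the closure.
    return lambda x: x * factor


def make_counter():
    count = 0

    def step():
        # nonlocal (Python 3) lets the inner function rebind the captured name.
        nonlocal count
        count += 1
        return count

    return step


if __name__ == "__main__":
    double = make_scaler(2)
    print(list(map(double, [1, 2, 3])))  # → [2, 4, 6]
    counter = make_counter()
    counter()
    print(counter())  # → 2
```

Combined with higher-order functions such as map, closures let you build small specialized functions on the fly instead of writing a class for each case.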

© Daniel Molina Wegener for coder . cl, 2011.

Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)

instant xml api using pyxser

Daniel Molina Wegener — Sat, 05 Nov 2011 13:42:24 +0000

You may not be familiar with pyxser. It is a serializer and deserializer that converts Python objects into XML as plain text. Alongside JSON and other formats, XML can help with tasks such as transmitting objects over the network, for example when building API calls that use remote queries. Here I will guide you through building an XML query API for your Django-driven application in a few minutes. You just need to understand how pyxser works and how to use the pyxser module. Remember that you can read the documentation once it is installed, even without Internet access, by running the pydoc daemon with pydoc -p 8080 and connecting to port 8080 on your machine (you can choose another port if that one does not work).

tl;dr

You can set up a query API that serves XML over HTTP under Django using pyxser.

advice

All examples here work, but you must be really careful with authentication and object permissions before using the routines in this post. So, try to wrap those routines correctly, using the Django authentication components to filter query requests. OAuth-related modules may help. The examples also do not follow the Python and Django best practices, so you need to adjust them to meet those requirements. Finally, do not take the examples too literally: they are just examples, and this is just a proof-of-concept article.

serializing model objects

The pyxser extension, which is written in C and uses libxml2 as its basis for XML processing, has two main arguments for its serialization routines: obj and enc, where obj is the object to be serialized and enc is the XML encoding to be used. So you can serialize a valid object using pyxser.serialize(obj = my_object, enc = 'utf-8'). You can read the full pyxser documentation by using the pydoc command and looking up the pyxser module.

To serialize Django models you need to restrict some fields, so you need to filter them. You do not need to process each model field by hand; you only need to filter the model fields properly using the selector argument and the depth argument. Take a look at the following decorator.


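from functools import wraps

from django.http import HttpResponse
import pyxser as px

def render_to_xml(**pyxser_args):
    """Serialize the view's return value with pyxser and send it as text/xml."""
    def outer(f):
        @wraps(f)
        def inner_xml(request, *args, **kwargs):
            result = f(request, *args, **kwargs)
            # content_type replaces the older mimetype keyword
            r = HttpResponse(content_type='text/xml')
            try:
                render = px.serialize(obj=result,
                                      enc='utf-8',
                                      **pyxser_args)
            except Exception:
                # on any serialization failure the response body stays empty
                render = ""
            if result:
                r.write(render)
            return r
        return inner_xml
    return outer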

If you apply the decorator above to a Django view, it returns the serialized object as text/xml to the HTTP client. So, your view must return a valid object to be serialized by pyxser; the decorator applies the pyxser.serialize function to the output of your view. Now take a look at a view which uses this decorator to serve XML.

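import logging

from django.views.decorators.http import require_http_methods

# module logger used by the views below
log = logging.getLogger(__name__)

def get_class(kls):
    """Resolve a dotted path such as 'app.models.Model'; return False on failure."""
    try:
        parts = kls.split('.')
        module = ".".join(parts[:-1])
        # __import__ returns the top-level package, so walk down the path
        m = __import__(module)
        for comp in parts[1:]:
            m = getattr(m, comp)
        return m
    except Exception:
        return False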

## use an URL as follows:
## (r'x/g/(?P<model>[\w.]+)/(?P<oid>\d+)/',
##  u'views_xml.get_model_object'),
@require_http_methods(["GET", "OPTIONS", "HEAD"])
@render_to_xml(selector=do_value_attrs, depth=2)
def get_model_object(request, model=None, oid=None):
    # fall back to an empty object when the lookup fails
    obj = object()
    try:
        db_model = get_class(model)
        obj = db_model.objects.get(pk=oid)
    except Exception as exc:
        log.error(exc)
    return obj

The view above returns an object of the given model name model with the given primary key oid, passes the do_value_attrs selector function to pyxser as the attribute selector, and restricts the serialization depth to two levels. Remember that pyxser can serialize circular references and cross references between objects, so we need to restrict the serialization depth; for Django models, two levels work for almost all models. The field selector do_value_attrs can be defined as follows.


DENIED_FIELDS = ['user', 'customer', 'users', 'customers']
DENIED_CLASSES = ['Related', 'Foreign', 'Many']

def is_allowed_class(fld):
    for nm in DENIED_CLASSES:
        if nm in fld.__class__.__name__:
            return False
    for nm in DENIED_FIELDS:
        if nm in fld.name:
            return False
    return True

def do_value_attrs(o):
    values = dict()
    if hasattr(o, '_meta') and hasattr(o._meta, 'fields'):
        for fldn in o._meta.fields:
            if is_allowed_class(fldn):
                values[fldn.name] = getattr(o, fldn.name)
    else:
        for kw in o.__dict__:
            values[kw] = getattr(o, kw)
    return values

Here we filter out all fields of model objects that we do not want to serialize, that is, fields whose name matches DENIED_FIELDS, and all field classes that pyxser should not serialize for plain object transmission, that is, those matching DENIED_CLASSES. Objects that are not model objects are serialized as plain Python objects, using their dictionaries to get the object attributes. The resulting XML for URLs like /p/x/g/offer.models.Marca/1/ is as follows.



  Sony
  1
  sony

The pyxser serialization model holds type information, so any serialized object carries the type information to be used in deserialization tasks. You can then send the object to any machine supporting pyxser and get it deserialized to its original class using the unserialize function.
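To make the idea of type-carrying XML concrete without requiring pyxser, here is a rough stand-in written with the standard xml.etree.ElementTree module. It is not the pyxser format or API. The Brand class, the obj and prop element names, and the to_xml and from_xml helpers are all hypothetical, chosen only to show how stored type names let the receiver rebuild the original class.

```python
import xml.etree.ElementTree as ET


class Brand(object):
    """Hypothetical plain class standing in for a model such as Marca."""

    def __init__(self, name="", ident=0):
        self.name = name
        self.ident = ident


# Only these classes and property types may be rebuilt from XML.
KNOWN_CLASSES = {"Brand": Brand}
KNOWN_TYPES = {"int": int, "str": str}


def to_xml(obj):
    # Record the class name on the root and the value type on each property.
    root = ET.Element("obj", {"type": type(obj).__name__})
    for name, value in sorted(vars(obj).items()):
        prop = ET.SubElement(root, "prop",
                             {"name": name, "type": type(value).__name__})
        prop.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    # Use the stored type names to pick the class and convert each value.
    root = ET.fromstring(text)
    obj = KNOWN_CLASSES[root.get("type")]()
    for prop in root.findall("prop"):
        convert = KNOWN_TYPES[prop.get("type")]
        setattr(obj, prop.get("name"), convert(prop.text or ""))
    return obj


if __name__ == "__main__":
    xml_text = to_xml(Brand("Sony", 1))
    print(xml_text)
    restored = from_xml(xml_text)
    print(type(restored).__name__, restored.name, restored.ident)
```

The explicit tables of known classes and types are a deliberate choice in this sketch: the receiver only rebuilds types it already trusts, which matters when the XML arrives over the network.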

defining object containers

The pyxser extension cannot handle Django model containers directly, that is, those returned by the all method of query sets. So, you need to create a plain container to hold the objects retrieved from the database. Take a look at the following view.

class Container(object):
    def __init__(self):
        # per-instance attributes, so containers never share one list
        self.count = 0
        self.items = []

def collect_filters(qd):
    data = qd.copy()
    filters = dict()
    for kw in data:
        if kw.startswith('filter__'):
            name = kw.replace('filter__', '')
            filters[name] = data[kw]
    return filters

### use an URL as follows:
### (r'x/l/(?P<model>[\w.]+)/(?P<limit>\d+)/',
###  u'views_xml.get_model_list'),
@require_http_methods(["GET", "OPTIONS", "HEAD"])
@render_to_xml(selector=do_value_attrs, depth=4)
def get_model_list(request, model=None, limit=1):
    container = Container()
    container.count = 0
    container.items = []
    try:
        db_model = get_class(model)
        filters = collect_filters(request.GET)
        # limit comes from the URL as text, so convert it before slicing
        objs = db_model.objects.filter(**filters)[0:int(limit)]
        # copy the query set into a plain list that pyxser can serialize
        container.items = list(objs)
        container.count = len(container.items)
    except Exception as exc:
        log.error(exc)
    return container

If you look carefully at this example, you will notice that we are using a very simple Container class to hold the objects. The resulting XML for the URL /p/x/l/prod.models.Marca/5/?filter__nombre__contains=son is as follows.



  3
  
    
      Epson
      10
      epson
    
    
      Ericsson
      15
      ericsson

The resulting serialized XML holds three Marca objects, and each of them carries its type so that it can be deserialized once retrieved. If you want your objects deserialized back, you just need to use the pyxser.unserialize function properly, which is documented in the pyxser extension itself. I hope you like how pyxser works.
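The advice at the top of this post applies especially to the filter__ query parameters, because they go straight into the query set filter. As a hedged sketch, the variant of collect_filters below only accepts lookups from a whitelist. ALLOWED_FILTERS and its contents are hypothetical, and a plain dictionary stands in for request.GET so the snippet runs without Django.

```python
# Hypothetical whitelist: only these lookups may come from the query string.
ALLOWED_FILTERS = {"nombre__contains", "nombre__startswith"}

PREFIX = "filter__"


def collect_filters(params, allowed=ALLOWED_FILTERS):
    """Turn 'filter__<lookup>=value' parameters into filter() keyword arguments."""
    filters = {}
    for key, value in params.items():
        if not key.startswith(PREFIX):
            continue
        # Strip only the leading prefix, not later occurrences of it.
        lookup = key[len(PREFIX):]
        if lookup in allowed:
            filters[lookup] = value
    return filters


if __name__ == "__main__":
    query = {"filter__nombre__contains": "son",
             "filter__user__password__startswith": "a",
             "page": "2"}
    print(collect_filters(query))  # → {'nombre__contains': 'son'}
```

The returned dictionary can be passed to the query set as filter(**filters), exactly as in the view above, while lookups that reach into other tables are dropped.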

© Daniel Molina Wegener for coder . cl, 2011.

Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0)