web developer & system programmer

coder . cl

ramblings and thoughts on programming...


pyxser serialization model

published: 06-08-2009 / updated: 06-08-2009
posted in: programming, projects, python, pyxser
by Daniel Molina Wegener

pyxser is a Python-Object to XML serializer and deserializer. The common serialization model that follows most object-to-xml serialization implementations, are doing it in the same way. Most of them — possibly all of them — are doing the same task: they create a custom XML element (a custom tag) for every object. By custom XML element, I understand an element named MyObject for every object (class instance) of the type MyObject, without considering its namespace and other proper attributes that a well structured serialization needs.

For example, if we have the next class in C#:

public class Movie
{
    public string Title
    { get; set; }

    public int Rating
    { get; set; }

    public DateTime ReleaseDate
    { get; set; }
}

An object of that type should be serialized into the next XML document:

<?xml version="1.0" encoding="utf-8"?>
<Movie xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Title>Starship Troopers</Title>
  <Rating>6.9</Rating>
  <ReleaseDate>1997-11-07T00:00:00</ReleaseDate>
</Movie>

I’ve created a simple XML model defined in Document Type Definition specification and the proper XML Schema specification as the pyxser basis for object serialization. Also, it could be an object model — but isn’t wide extended to that purpose — and it’s just enough to allow the proper representation of serialized objects, with all the expressiveness that XML has, with some basis on the paper titled "Open Reusable Object Models" by Ian Piumarta and Alessandro Warth at Viewpoint Research Institute.

The model consist of three primary XML elements: obj, col and prop. Both obj and col elements are recursive, allowing both obj and col sub-elements. The document root is declared as the obj element. The model do not import any kind of entity set, this means that some known entities — such as iso-8859-1 bindings — are not imported and you can not use, for example, an &oacute; entity. Both, the XML Schema and the DTD (Document Type Definition) can be imported/included in your own model — as the standard XML allows. In favor of processing friendly descriptions, it has defined most of their attributes as xs:NMTOKEN. I know that this one is a restriction to use symbols and localized characters in packages, class names and attribute names, but by using those things, the code itself is not portable.

Most constructs are defined as compositions, instead of associations — that in my opinion are not an important part of well designed systems and obviously minded for entity-relationship tasks — or aggregation, that aren’t parts of a decorated class, and, in my opinion are designed to hold collections.

pyxser-model
pyxser uml object model

You may ask yourself "why the multiplicity of parents objects is mapped to 1..n?". The reason is simple: the model allows cross-referencing and circular-referencing objects. And also the algorithm supports it, and in the real world, you can find one object in different locations of the same object tree by having different references on it.

why composition?

Composite aggregation is a strong form of aggregation, which requires that a part instance be included in at most one composite at a time and that the composite object has sole responsibility for the disposition of its parts. The multiplicity of the aggregate end may not exceed one (it is unshared). See Section 5.43, "Association End," on page 231 for further details. The composite in a composition "projects" its identity onto the parts in the relationship. In other words, each part object in an object model can be identified with a unique composite object. It keeps its own identity as its primary identity. The point is that it can also be identified as being part of a unique composite. Composition is transitive. If object A is part of object B that is part of object C, then object A is also part of object C. A part may be identified with several composite objects in this way, each at a different level of detail.

Unified Modeling Language Specification, ISO/IEC 19501.

The reason seems to be obvious. Referenced or not, most real world objects — not those DTO that some people are using as RDMBS interface — are composite objects, and most of them have a certain depth. The representation with more than one object in the multiplicity is just a representation of the recursive structure that the model have, hard to separate or represent as association that is out of context on real UML modeling, and is not a simple aggregation, since most of those aggregations represents concrete attributes of an object. Also, the object roles are not necessarily representing the object role itself, instead they represent the xml node role and the same applies to the composite objects multiplicity.

why a 0..n child’s?

XML since it’s conception — and inherited from SGML — have two interesting kinds of attributes: ID attributes, to identify an element and IDREF attributes, to build element references. I’ve used both types to represent object references. In the case of an already serialized object in the object tree — the serializer finds the same object reference — an empty element is created and it holds the objid and objref attributes, the first pointing to the object identifier itself and the second pointing to the referenced element. Yeah!, it holds duplicate objects, do not re-create all the elements that it holds on the real object tree! I really don’t know if current serialization implementations are supporting cross-references and circular-references — I’m afraid of experimenting with it, and a crash is too much precious time that I don’t like to lose.

conclusion.

pyxser is a Python-Object to XML serializer that is made with an O(n) algorithm — possibly an O(log n) algorithm since it doesn’t repeat the steps of serializing objects many times as they appear, but it has hidden costs that I’ve don’t have measured yet — and it’s supporting cross-references and circular-references.

feedback request

  • should I implement a JSON serializer using this algorithm and model?
  • should I implement a YAML serializer using this algorithm and model?
  • what do you think about this serialization model?
  • does this model impact networking protocols by setting a large size for serialized objects?


one comment to “pyxser serialization model”

  1. [...] presentation was focused on the pyxser serialization model and the proper usage of pyxser to store configurations and WebServices. Among other serialization [...]

post a comment

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>