The Architect

A Technical Architect’s Thoughts

March 1st, 2010

For those of you who don’t know what Nirvana is, it is a low-latency middleware platform that securely and quickly streams real-time data between a server and its clients.

Nirvana supports three messaging paradigms:

Publish/Subscribe

Publish/Subscribe is an asynchronous messaging model in which the sender (publisher) of a message and the consumer (subscriber) of a message are decoupled; the two only share a common topic or channel. The publisher publishes data to the channel, which exists on the Nirvana realm server. As messages arrive on a channel, the server automatically sends them on to all consumers subscribed to that channel. Nirvana supports multiple publishers and subscribers on a single channel.
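As a rough illustration of the fan-out described above, here is a minimal in-memory sketch in Python. It is not the Nirvana API: the Broker class, the channel name and the message are all made up for the example, and delivery is synchronous purely to keep it short.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """Toy in-memory stand-in for a realm server; not the Nirvana API."""

    def __init__(self) -> None:
        # channel name -> list of subscriber callbacks
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[bytes], None]) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: bytes) -> None:
        # Every subscriber on the channel receives the message.
        for callback in self._subscribers[channel]:
            callback(message)


broker = Broker()
inbox_a: List[bytes] = []
inbox_b: List[bytes] = []
broker.subscribe("prices", inbox_a.append)
broker.subscribe("prices", inbox_b.append)
broker.publish("prices", b"EURUSD 1.3601")
```

After the publish call both inboxes hold the same message, and the publisher never needed to know who was listening.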

Message Queue

Like Pub/Sub, Message Queues decouple the publisher or sender of data from the consumer of data, and the Nirvana realm server manages the distribution of messages to consumers. Unlike Pub/Sub, only one consumer can read a given message from a queue. If more than one consumer is subscribed to a queue, the messages are distributed among them in a round-robin fashion.
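The round-robin behaviour can be sketched in a few lines of Python. Again, this is only a toy model under my own assumptions (the RoundRobinQueue class and the worker lists are hypothetical), not how the realm server is implemented.

```python
from itertools import cycle
from typing import Callable, List


class RoundRobinQueue:
    """Toy queue: each message goes to exactly one consumer, in rotation."""

    def __init__(self, consumers: List[Callable[[str], None]]) -> None:
        self._next_consumer = cycle(consumers)

    def push(self, message: str) -> None:
        # Hand the message to the next consumer in the rotation only.
        next(self._next_consumer)(message)


worker_1: List[str] = []
worker_2: List[str] = []
queue = RoundRobinQueue([worker_1.append, worker_2.append])
for message in ["m1", "m2", "m3", "m4"]:
    queue.push(message)
```

Here worker_1 ends up with m1 and m3 while worker_2 gets m2 and m4, so no message is seen twice.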

Peer to Peer

Peer to Peer provides a direct communications path between an instance of a service and the client requiring access to the service. The Nirvana realm server brokers the relationship between the service and the client and in doing so becomes transparent as messages pass through it.

Flex Native Interface

The Nirvana Native Communication Protocol will soon be available for Flex, removing the overhead of composing and decomposing data to and from some kind of String representation. The advantage is that Byte Arrays will now be transported (much leaner and therefore faster), which enables developers to take advantage of frameworks such as Google’s Protocol Buffers. Microsoft Silverlight has always had the native implementation, but soon Adobe Flex will compete on the same level. A cut-down version of the Flex Native interface should be available by the end of Q2 2010.

March 1st, 2010

I have been looking into middleware solutions as a push mechanism between server and client. One of the aspects that I had to consider was latency, so it was important to find a lean, technology-agnostic transport format. Most server-client platforms serialize objects into a leaner data format and then de-serialize them on the receiving end, so my search for a technology also had to take the speed of serialization into account.

Many languages offer native serialization APIs, but when data is serialized using the native API, metadata about the class is written into the output too. I needed to find a technology that would serialize only the data values and not the additional metadata about the serialized object.
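To make the metadata point concrete, here is a small sketch using Python’s pickle module as a stand-in for a native serialization API (my choice for illustration; the same effect shows up in other languages’ built-in serializers). The date value and the hand-picked binary layout are assumptions for the example.

```python
import pickle
import struct
from datetime import date

trade_date = date(2010, 3, 1)

# Native serialization: the module and class name travel with the value.
native = pickle.dumps(trade_date)

# Values only: year as an unsigned 16-bit int, month and day as single bytes.
values_only = struct.pack("<HBB", trade_date.year, trade_date.month, trade_date.day)

print(b"datetime" in native)  # True: type metadata is embedded in the output
print(len(values_only))       # 4
```

The values-only form is just four bytes, but it only works if both ends agree on the layout in advance, which is exactly the job of a schema.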

I also needed to identify the best data format to serialize to. XML (SOAP), strings and data dictionaries are common data formats, but a Byte Array is far more efficient, and is the natural serialization format when the client and server platforms are built on the same technology.
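A quick way to get a feel for the difference is to encode the same record three ways and compare sizes. This is only a sketch: the tick record, its field names and the fixed binary layout are invented for the example.

```python
import json
import struct

# Hypothetical price tick used only for illustration.
tick = {"instrument_id": 42, "bid": 1.3601, "ask": 1.3603}

as_xml = (
    "<tick><instrument_id>{instrument_id}</instrument_id>"
    "<bid>{bid}</bid><ask>{ask}</ask></tick>"
).format(**tick).encode("utf-8")
as_json = json.dumps(tick).encode("utf-8")
# Fixed binary layout: unsigned 32-bit int followed by two 64-bit floats.
as_bytes = struct.pack("<Idd", tick["instrument_id"], tick["bid"], tick["ask"])

print(len(as_xml), len(as_json), len(as_bytes))
```

The binary form is 20 bytes; the text forms are several times larger because every message repeats the field names.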

I came across two candidate technologies, ‘Google Protocol Buffers’ and ‘Apache Avro’:

Google Protocol Buffers

Protocol Buffers is a serialisation format with an interface description language, developed by Google. It is available under a free software, open source license. The design goals of Protocol Buffers emphasize performance and simplicity. It is a language-neutral and platform-neutral, extensible mechanism for serializing structured data.

You define how you want your data to be structured in proto files, which are simply structured text files. Once you have decided on the structure in your proto file, you run the Protocol Buffers compiler (protoc) on it, and a generated class (Adobe ActionScript 3, Java, C, C++, Python) is produced. The class can be generated for several different technologies, which means it can be generated for both the client and the server technology, thus securing a type-safe data contract between the two. Protocol Buffers also provides the ability to update the data structure without breaking deployed programs that are compiled against the old format.
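The last point is worth a sketch. On the wire, every Protocol Buffers field is prefixed with a key that combines its field number and a wire type, so a reader can skip fields it does not know about. The Python below hand-decodes such a message to show the idea; it is a simplified illustration that only handles varint fields, and OLD_SCHEMA with its field name is hypothetical. In practice the generated class does this for you.

```python
from typing import Dict, Tuple

# Schema the deployed (old) reader was built against: field number -> name.
# The name is hypothetical; a real reader would be generated by the compiler.
OLD_SCHEMA: Dict[int, str] = {1: "id"}


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
        shift += 7


def decode(data: bytes, schema: Dict[int, str]) -> Dict[str, int]:
    """Decode varint-typed fields, skipping field numbers not in the schema."""
    result: Dict[str, int] = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(data, pos)
        if field_number in schema:
            result[schema[field_number]] = value
    return result


# Bytes from a newer writer: field 1 = 150, plus a field 2 the old reader never saw.
new_message = bytes([0x08, 0x96, 0x01, 0x10, 0x07])
print(decode(new_message, OLD_SCHEMA))  # {'id': 150}
```

The old reader still recovers the field it understands and ignores the new one, which is why adding fields does not break deployed programs.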

The Protocol Buffers documentation claims that a small example message takes between 100 and 200 nanoseconds to parse. As the overhead of describing the data structure is not needed in Protocol Buffers, only the values of the object’s fields are serialized. Protocol Buffers will use the most compact serialisation technique for a particular data type (always primitives), and will only serialize fields that are not null.
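Two of these ideas are easy to sketch in Python: integers are written as variable-length "varints", so small values take fewer bytes, and fields that are not set are simply left out. The encode_message helper and its dictionary-based input are my own simplification for the example, not the generated API, and only non-negative integer fields are handled.

```python
from typing import Dict, Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_message(fields: Dict[int, Optional[int]]) -> bytes:
    """Encode {field number: value}; fields set to None are left out entirely."""
    out = bytearray()
    for field_number, value in sorted(fields.items()):
        if value is None:
            continue
        out += encode_varint(field_number << 3)  # wire type 0 = varint
        out += encode_varint(value)
    return bytes(out)


small = encode_message({1: 1, 2: None})
large = encode_message({1: 300, 2: 7})
print(len(small), len(large))  # 2 5
```

The message with one small value and one unset field costs two bytes in total: one for the key and one for the value.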

Apache Avro

Avro is another very recent serialisation system.  It provides rich data structures that are compact, and are transported in a binary data format.

Avro relies on a schema-based system that defines the data contract to be exchanged. When Avro data is read, the schema used when writing it is always present. As with Protocol Buffers, only the values in the data structure are serialized and sent. The strategy employed by Avro (and Protocol Buffers) means that a minimal amount of data is generated, enabling fast transport.

The schemas are the equivalent of Protocol Buffers’ proto files, but no code has to be generated from them. The data structures are declared in JSON.
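For a feel of what such a declaration looks like, here is a record schema written in Avro’s JSON syntax and loaded with Python’s standard json module. The Quote record and its fields are hypothetical, and this sketch deliberately does not use the Avro library itself; it only shows that the schema is plain JSON that can be inspected at runtime.

```python
import json

# A hypothetical record schema written in Avro's JSON schema syntax.
QUOTE_SCHEMA_JSON = """
{
  "type": "record",
  "name": "Quote",
  "namespace": "example",
  "fields": [
    {"name": "instrument_id", "type": "int"},
    {"name": "bid", "type": "double"},
    {"name": "ask", "type": "double"}
  ]
}
"""

schema = json.loads(QUOTE_SCHEMA_JSON)
field_types = {field["name"]: field["type"] for field in schema["fields"]}
print(field_types)  # {'instrument_id': 'int', 'bid': 'double', 'ask': 'double'}
```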

Results

I ran a few benchmark tests and concluded the following: The distinction to be made between the two comes down to implementation, extensibility and compatibility.

Implementation: Protocol Buffers was a much cleaner implementation than Avro. Avro was messy, with limited online resources available. Avro uses a JSON object in string form to represent a schema. Defining an Avro schema is cumbersome and difficult to maintain, and it increases the risk of runtime errors when the structure isn’t quite right. The contract is not type safe, and it becomes very easy to set values of the wrong type against object fields. Such errors can only be caught at runtime, rather than at compile time.
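The kind of failure described here can be sketched without the Avro library. In the Python below, a record is just a dictionary checked against a schema when the program runs, so a value of the wrong type is only noticed once the check executes. The validate function, the type mapping and the schema are all assumptions made for the illustration.

```python
from typing import Any, Dict

# Avro primitive type names mapped to the Python types this sketch accepts.
PYTHON_TYPES = {"int": int, "double": float, "string": str, "boolean": bool}

# Hypothetical schema, already parsed from its JSON string form.
SCHEMA = {
    "type": "record",
    "name": "Quote",
    "fields": [
        {"name": "instrument_id", "type": "int"},
        {"name": "bid", "type": "double"},
    ],
}


def validate(record: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Raise TypeError if a field is missing or holds a value of the wrong type."""
    for field in schema["fields"]:
        name, expected = field["name"], PYTHON_TYPES[field["type"]]
        if name not in record:
            raise TypeError(f"missing field {name!r}")
        # bool is a subclass of int in Python, so compare exact types.
        if type(record[name]) is not expected:
            raise TypeError(f"field {name!r} expects {field['type']}")


validate({"instrument_id": 42, "bid": 1.36}, SCHEMA)  # passes silently

try:
    validate({"instrument_id": "42", "bid": 1.36}, SCHEMA)  # string where an int belongs
except TypeError as error:
    print("caught at runtime:", error)
```

With a generated, statically typed class the second call would typically be rejected by the compiler before the program ever ran.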

Google’s Protocol Buffers does not have such complexities. The Protocol Buffers compiler reports errors to the coder as soon as the proto file is compiled. Protocol Buffers allows nullable fields (something that Avro doesn’t), which means that when Protocol Buffers is serializing, it will ignore fields that are null, and thus reduce the overhead of serializing irrelevant data (unlike Avro).

Winner – Google’s Protocol Buffers

Extensibility: Google’s Protocol Buffers provides a much richer API for defining a data contract than Avro. Below is a list of features available in Protocol Buffers but not in Avro:

  1. Declare nested types
  2. Define required, repeated and optional fields
  3. Specify default values on fields
  4. Declare enumerations and set a field’s default value from one
  5. Multiple message types in the same document
  6. Import other proto files
  7. Declare a range of field numbers in a message available for third party extensions (Extensions)
  8. Nested Extensions
  9. Define services

Winner – Google’s Protocol Buffers

Compatibility: Avro is only compatible with C, Java and Python, and hence restricts the choice of client technology, although support for other languages is planned.

Protocol Buffers is compatible with C, C++, Adobe ActionScript 3, Java and Python. As a C++ version is available, Microsoft Silverlight and WPF are therefore compatible with Google’s Protocol Buffers, and there are also projects to port a Protocol Buffers compiler to C# and other technologies.

Winner – Google’s Protocol Buffers