JSON v. XML: Is the new kid on the block really better?

The JSON data format is the (relatively) new kid on the block: It’s now becoming one of the most popular formats for data exchange, especially in the *nix world. Why the popularity? Well, just like XML, it’s human-readable. And just like XML, JSON’s hierarchical structure represents hierarchical data in a nice, easy-to-comprehend format. Just like XML, JSON also supports schemas for data validation.

So why the increase in popularity of JSON? I believe two perceptions lie behind it: simplicity and verbosity.

Simplicity

JSON has only six datatypes: number (typically handled as a double-precision floating-point value), string, boolean, array, object and null. That’s it! (JSON Schema adds little beyond these, notably an integer type.) XML Schema Definitions (XSDs), on the other hand, have a plethora of datatypes that can define the data to be stored much more tightly: there’s a full range of integer and floating-point types of different sizes, plus decimals, dates, times and more. Furthermore, XML lets you specify the character encoding for the entire document in its XML declaration (more on why this is important later).
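To make the precision point concrete, here is a small Python sketch (the 64-bit `id` value is an arbitrary, hypothetical example). JSON itself doesn’t say how large or precise a number may be, so a parser that treats every number as a double-precision float, as JavaScript does, can’t hold a 64-bit integer exactly, whereas an XSD could declare the field as `xs:long`. Passing `parse_int=float` simulates such a double-only parser.

```python
import json

# Hypothetical record whose id needs a full 64-bit integer (2**53 + 1).
text = '{"id": 9007199254740993}'

# Simulate a parser that stores every JSON number as a double, as JavaScript does.
as_double = json.loads(text, parse_int=float)["id"]

# Python's default parser keeps integer literals as exact, arbitrary-size ints.
as_int = json.loads(text)["id"]

print(as_double)  # 9007199254740992.0 (rounded to the nearest double)
print(as_int)     # 9007199254740993
```

Nothing in the JSON document itself tells the receiver which of these behaviours to expect; that contract has to live outside the data.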

So with all these additional features available in XSDs, why choose JSON? I contend that it’s precisely because JSON schemas have less functionality: they’re less intimidating and have a gentler learning curve.

Verbosity

The basic JSON format for representing attributes and values is "AttributeName": "Value".

The basic XML format for representing attributes and values is commonly <AttributeName>Value</AttributeName>, a more verbose representation of the same data.

Note that I said “commonly”: that’s because XML values are commonly stored as child elements rather than as XML attributes. The real equivalent of the JSON attribute/value pair is AttributeName="Value", which is slightly less verbose than the JSON representation: it needs two quote characters and an equals sign, where JSON needs four quote characters and a colon.

http://en.wikipedia.org/wiki/Json#XML shows how the same data is much more verbose in element-delimited XML than in JSON, but also how attribute-delimited XML is slightly more compact than the JSON.

So not only can XML be as compact as JSON, it can actually be more compact with the judicious use of attributes rather than elements for attribute/value storage.
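The comparison is easy to reproduce. The sketch below uses Python’s standard `json` and `xml.etree.ElementTree` modules to serialize one hypothetical record three ways; values are kept as strings so that every format carries identical text, and the JSON is written without optional whitespace.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
person = {"firstName": "John", "lastName": "Smith", "age": "25"}

# JSON: "AttributeName":"Value" pairs, no optional whitespace.
as_json = json.dumps({"person": person}, separators=(",", ":"))

# Element-delimited XML: each value becomes a child element.
root = ET.Element("person")
for name, value in person.items():
    ET.SubElement(root, name).text = value
as_elements = ET.tostring(root, encoding="unicode")

# Attribute-delimited XML: each value becomes an attribute of one element.
as_attributes = ET.tostring(ET.Element("person", person), encoding="unicode")

print(len(as_json), len(as_elements), len(as_attributes))  # 61 83 53
```

On this record the attribute form is the smallest of the three and the element form the largest; exact figures will of course vary with the names and values involved.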

Encoding

Now that we’ve dispelled the myth that JSON is always less verbose than XML, I’m going to put forward one of the major advantages XML has over JSON: XML allows the use of single-byte document encodings.

The JSON definition in http://www.ietf.org/rfc/rfc4627.txt states that JSON text must be encoded in Unicode, using UTF-8, UTF-16 or UTF-32. While UTF-8 represents each ASCII character in a single byte, the other two encodings do not, using two or four bytes per character. Where the data is mostly ASCII, a UTF-8 document will be barely larger than a single-byte encoded one. But for languages not covered by ASCII (in particular those that do not use Latin characters), UTF-8 offers little advantage over UTF-16: letters from scripts such as Cyrillic, Greek, Hebrew and Arabic take two bytes each in UTF-8, so the text will be roughly double the size it would be in the appropriate single-byte character set.
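A quick way to check the arithmetic is to encode the same text several ways. In this Python sketch the phrase is arbitrary, and ISO-8859-5 stands in for “the appropriate single-byte character set” for Russian (Windows-1251 or KOI8-R would give the same count).

```python
# Byte cost of the same Russian phrase under different encodings.
phrase = "Привет мир"  # "Hello world": 9 Cyrillic letters plus one space

sizes = {
    "utf-8": len(phrase.encode("utf-8")),
    "utf-16-le": len(phrase.encode("utf-16-le")),    # -le variant: no byte-order mark
    "iso-8859-5": len(phrase.encode("iso-8859-5")),  # single-byte Cyrillic code page
}
print(sizes)  # {'utf-8': 19, 'utf-16-le': 20, 'iso-8859-5': 10}
```

UTF-8 saves only the single space over UTF-16 here, while the single-byte code page halves the size.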

When data volumes are small or processing time is not critical, choosing the optimal encoding doesn’t matter much. But as data volumes grow, or when processing time is important, it really makes sense to choose the optimal encoding, and this is where XML’s rich choice of encodings really scores. Every XML parser must support UTF-8 and UTF-16, and many also accept a host of other encodings. Unlike JSON, these include a whole range of single-byte character encodings: ISO-8859-1 covers most Western European languages and is sufficient for many uses in Europe as well as in the financial industry.

There are also equivalents for other languages, so if you’re designing a system primarily for use in Arabic, Farsi, Hebrew, Russian, Urdu or any other language with its own single-byte code page, XML’s single-byte encoding support will give you the most compact document. Furthermore, if you’re communicating with a legacy system that uses a single-byte code page (such as one of the EBCDIC code pages), XML is the better choice.
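Here’s a sketch of that trade-off, again in Python with a hypothetical, repetitive Russian-language payload. The JSON is limited to Unicode, so it is written as UTF-8; the XML names ISO-8859-5 in its XML declaration, and the standard parser uses that declaration to decode the document when reading it back.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical payload: each repeat is 9 Cyrillic letters plus 4 ASCII characters.
message = "Привет, мир! " * 50

# JSON (RFC 4627) allows only Unicode encodings; UTF-8 is the smallest of them here.
json_bytes = json.dumps({"message": message}, ensure_ascii=False,
                        separators=(",", ":")).encode("utf-8")

# XML: serialize with a single-byte Cyrillic code page named in the declaration.
root = ET.Element("message")
root.text = message
xml_bytes = ET.tostring(root, encoding="iso-8859-5", xml_declaration=True)

print(len(json_bytes), len(xml_bytes))

# The parser reads the declared encoding and recovers the original text.
assert ET.fromstring(xml_bytes).text == message
```

Even with the declaration overhead, the XML version spends one byte per Cyrillic letter against UTF-8’s two, so it ends up noticeably smaller on this payload. For very short documents the fixed cost of the declaration can outweigh the saving.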

Summary

While JSON may be a simple format, it certainly isn’t more compact than well-designed XML. Nor is it as suitable when a single-byte encoding is required, or when you need schemas with the rich datatypes that XSDs provide. However, it does seem harder to produce a needlessly bloated document in JSON than in XML, where the mistake of storing every value as a child element is common practice.