Tuesday, June 29, 2010

Favorite Project Series - The Data Federation Service

The Data Federation Service (DFS) is a project that we did (in 2006-2007?) for a client as part of a much larger project. The goal of DFS was to provide a framework for distributing documents to users around the globe, many of whom would be on very slow and/or very unreliable networks. A document in this context is any kind of file that contains published information - it might be a text document, a spreadsheet, an image, a diagram, or anything else. To the DFS, they're just files.

The system had to run on hardware that was already largely in place. There was a primary node with failover at the highest layer, an intermediate layer, and a layer of regional nodes distributed around the world. The regional nodes acted as connection points for mobile devices.

The goals of the system were to:

1) provide a web-based façade for document authors to publish documents that might be of global or regional interest,
2) distribute global documents to all the regional servers, and regional documents to a regional server plus an alternate server that would act as that server's backup,
3) provide redundant alternate paths for document delivery in case of server failures in the middle layer,
4) provide a mechanism to deliver annotations to documents back to the author for vetting and addition to the document,
5) provide pluggable reachback to external data sources,
6) allow users of mobile devices to flag documents that they wanted pushed all the way to their mobile devices whenever updates are available,
7) store metadata that describes each document and its source, and can be used for filtering,
8) provide a complete list of available documents to all of the regional servers, so that mobile users could search for and retrieve documents targeted to other regions, and
9) provide command-line and service interfaces for programmatic interaction with the DFS.

The system required 99.9% availability for document publication. The mobile devices existed in an environment where network connections might be sporadically unavailable, or even unavailable for weeks before connectivity was restored. They needed to get whatever updates were available when they reconnected. Furthermore, the NICs exposed to the application by the O/S on the mobile devices could come and go, so the application needed to be able to tell when a NIC became available, detect that the network was usable, and retrieve new documents.
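To give a feel for that, here's a minimal sketch of the interface polling using the standard java.net.NetworkInterface API - the class name, poll interval, and the retrieval hook are illustrative, not the actual DFS client code:

    import java.net.NetworkInterface;
    import java.net.SocketException;
    import java.util.Collections;
    import java.util.Enumeration;

    // Polls the O/S for usable network interfaces so the client can tell
    // when connectivity (re)appears and kick off a document sync.
    public class NicWatcher {

        // Returns true if at least one non-loopback interface is currently up.
        static boolean networkAvailable() {
            try {
                Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
                if (nics == null) {
                    return false;
                }
                for (NetworkInterface nic : Collections.list(nics)) {
                    if (nic.isUp() && !nic.isLoopback()) {
                        return true;
                    }
                }
            } catch (SocketException e) {
                // Treat enumeration failures as "no network right now".
            }
            return false;
        }

        public static void main(String[] args) throws InterruptedException {
            boolean wasAvailable = false;
            while (true) {
                boolean available = networkAvailable();
                if (available && !wasAvailable) {
                    // A NIC just came back - kick off retrieval of any waiting documents.
                    System.out.println("Network available - starting document retrieval");
                }
                wasAvailable = available;
                Thread.sleep(10000);  // poll every 10 seconds
            }
        }
    }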

The team varied over time, but included five (and more, at brief intervals) developers over the course of the project. Working on a distributed system like this creates some complexities that a lot of developers wouldn't necessarily consider. For example, we wanted to be able to delete documents. No problem, right? Well, what about annotations that might come in weeks or months after a document is deleted? The answer was to simply mark documents for deletion, and not physically delete them. That way, an author can be notified of an annotation to a deleted document and decide whether the document needs to be re-activated, or whether the annotation provides information that needs to be attached to another document. Fortunately, the folks on the team were able to spend the time it takes to dig in and understand the distributed ramifications of code changes and design decisions.
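The soft-delete idea boils down to something like this (all of the types here are illustrative stand-ins, not the real DFS classes):

    // Documents are only flagged as deleted, so an annotation that shows up
    // weeks later can still be routed back to the author for a decision.
    public class AnnotationHandler {

        public interface DocumentStore {
            Document findByMd5(String md5);
        }

        public interface AuthorNotifier {
            void notifyAuthor(String author, String annotationText);
        }

        public static class Document {
            String author;
            boolean markedForDeletion;  // set instead of physically deleting the document
        }

        private final DocumentStore store;
        private final AuthorNotifier notifier;

        public AnnotationHandler(DocumentStore store, AuthorNotifier notifier) {
            this.store = store;
            this.notifier = notifier;
        }

        public void handleAnnotation(String documentMd5, String annotationText) {
            Document doc = store.findByMd5(documentMd5);
            if (doc == null) {
                return;  // nothing known about this document at all
            }
            if (doc.markedForDeletion) {
                // The document is only logically deleted, so the author can still
                // be told about the annotation and decide whether to re-activate
                // the document or attach the information somewhere else.
                notifier.notifyAuthor(doc.author, annotationText);
            }
            // otherwise the annotation is attached to the live document as usual
        }
    }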

Some of the key technologies involved:

JmDNS - Used to detect server presence. This was where we learned that Java's ConcurrentHashMap can sometimes throw a ConcurrentModificationException; we had to catch it and recover from it so that server availability was still detected reliably.
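The handling amounted to a defensive retry around the service listing, roughly like this - the service type and retry policy are just illustrative:

    import java.util.ConcurrentModificationException;

    import javax.jmdns.JmDNS;
    import javax.jmdns.ServiceInfo;

    // Sketch of defensive service discovery: if the listing blows up with a
    // ConcurrentModificationException, back off briefly and try again rather
    // than losing track of which servers are up.
    public class ServerDiscovery {

        private static final String SERVICE_TYPE = "_dfs._tcp.local.";  // illustrative type

        public static ServiceInfo[] listServers(JmDNS jmdns) throws InterruptedException {
            for (int attempt = 0; attempt < 5; attempt++) {
                try {
                    return jmdns.list(SERVICE_TYPE);
                } catch (ConcurrentModificationException e) {
                    Thread.sleep(250);  // brief pause, then retry the listing
                }
            }
            return new ServiceInfo[0];  // give up quietly; the next poll will try again
        }
    }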

Servlets - Web services were considered and rejected because we needed to stream data through a series of nodes so that the latency hits would overlap, reducing overall delivery time from the primaries or external sources all the way to the regional servers and mobile devices. The servlets were mostly hosted on Tomcat engines. I originally scoffed at the idea of running them under Jetty - but I had a wrong view of what Jetty could do for us, and we could have replaced all the Tomcats with Jetty.
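The pass-through idea looks roughly like the sketch below: bytes are written to the client as they arrive from the upstream node, so a chain of these servlets overlaps the latency of each hop instead of adding it up. RelayServlet and the "upstream" request parameter are illustrative; the real DFS node knew its parent.

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Copies bytes to the client as they arrive from the upstream node,
    // rather than buffering the whole document before responding.
    public class RelayServlet extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            URL upstream = new URL(req.getParameter("upstream"));  // illustrative parameter
            HttpURLConnection conn = (HttpURLConnection) upstream.openConnection();
            try (InputStream in = conn.getInputStream();
                 OutputStream out = resp.getOutputStream()) {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);  // stream, don't hold the whole document
                }
                out.flush();
            } finally {
                conn.disconnect();
            }
        }
    }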

JMS - Document publication triggers multiple notification messages to the intermediate layer. A regional document for, say, the North America region would trigger two notifications to the North America regional node and two to its backup node, which might be the Europe node. The two (or more) messages for each destination each travel through a different node in the intermediate layer, providing redundancy for message delivery. If a node in the intermediate layer is down, delaying delivery of one of the messages, the alternate message can still get through.
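One way to get the independent delivery paths is simply to publish the same small notification through two different intermediate brokers. In sketch form (the connection-factory wiring, queue name, and message property names are all illustrative):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.DeliveryMode;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    // Publishes the same notification through each intermediate node so the
    // regional server gets at least one copy even if one path is down.
    public class NotificationPublisher {

        private final ConnectionFactory[] intermediateFactories;  // one per intermediate node
        private final String queueName;

        public NotificationPublisher(ConnectionFactory[] factories, String queueName) {
            this.intermediateFactories = factories;
            this.queueName = queueName;
        }

        public void publish(String documentMd5, String sourceUrl) {
            for (ConnectionFactory factory : intermediateFactories) {
                try {
                    Connection connection = factory.createConnection();
                    try {
                        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                        Queue queue = session.createQueue(queueName);
                        MessageProducer producer = session.createProducer(queue);
                        producer.setDeliveryMode(DeliveryMode.PERSISTENT);  // survive broker restarts

                        Message notification = session.createMessage();
                        notification.setStringProperty("documentMd5", documentMd5);
                        notification.setStringProperty("sourceUrl", sourceUrl);
                        producer.send(notification);
                    } finally {
                        connection.close();
                    }
                } catch (JMSException e) {
                    // One path being down is expected; the other copy still goes through.
                }
            }
        }
    }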

One of the key attributes of the DFS was the ability to live in a highly unreliable network environment. Connections could come and go, which means that we needed a JMS provider that would reliably reconnect. We started with ActiveMQ. It's easy to configure via Spring, and has some nice automatic features that reduced the amount of manual configuration needed. During endurance testing, where I ran a series of integration tests over a period of about two days, I found that ActiveMQ would reconnect after a server failure and recovery only 60-70% of the time. We tried a number of different solutions, but nothing worked to raise that percentage. I tried Joram as an alternative and found that it was well suited to behaving reliably on an unreliable network, and would reconnect 100% of the time. We did have problems with Joram's distributed JNDI at the time (I think that's since been resolved), so I used local JNDI and wrote a bridge to move messages between nodes, meaning that the rest of the DFS only had to perform local JNDI queries. Joram stood up well in short tests of a couple of days and in longer endurance tests of up to two or three weeks.
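The reconnect handling itself has a pretty simple general shape - plain JMS, nothing Joram-specific, with the backoff policy and the re-creation of sessions and consumers left out of this sketch:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.ExceptionListener;
    import javax.jms.JMSException;

    // When the connection drops, throw it away and keep retrying until the
    // broker comes back. Session/consumer rebuilding is omitted here.
    public class ReconnectingJmsClient implements ExceptionListener {

        private final ConnectionFactory factory;
        private volatile Connection connection;

        public ReconnectingJmsClient(ConnectionFactory factory) {
            this.factory = factory;
        }

        public void connect() {
            while (true) {
                try {
                    Connection c = factory.createConnection();
                    c.setExceptionListener(this);  // get told when the link dies
                    c.start();
                    connection = c;
                    return;
                } catch (JMSException e) {
                    sleepQuietly(5000);  // broker not reachable yet - try again
                }
            }
        }

        @Override
        public void onException(JMSException e) {
            // Connection lost: close it out and build a new one.
            try {
                connection.close();
            } catch (JMSException ignored) {
            }
            connect();
        }

        private static void sleepQuietly(long millis) {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }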

The architecture relies on the guaranteed delivery that JMS provides to ensure that each node will receive the message if connectivity is available. If a node is down, it will receive the message when it comes up. Each message provides the document's MD5 and server location for the identification and retrieval of the document. When a node receives a message, it turns around and attempts to fetch the document from the originating server. If retrieval fails, the message remains on the JMS queue and an attempt is made later. When the alternate message comes through, the regional node can determine from the MD5 that it's already pulled the document, and can discard the second message. This is the basic reason for using notification messages instead of document delivery messages - to avoid the unnecessary duplication of the document on the wire.
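On the consuming side it boils down to something like this (LocalDocumentStore and the property names are illustrative stand-ins):

    import javax.jms.Message;
    import javax.jms.MessageListener;

    // A notification carries the document's MD5 and where to fetch it from.
    // If we've already pulled that MD5 (the copy that came through the other
    // intermediate node), the duplicate is simply dropped.
    public class DocumentNotificationListener implements MessageListener {

        public interface LocalDocumentStore {
            boolean contains(String md5);
            void fetchAndStore(String md5, String sourceUrl) throws Exception;
        }

        private final LocalDocumentStore store;

        public DocumentNotificationListener(LocalDocumentStore store) {
            this.store = store;
        }

        @Override
        public void onMessage(Message message) {
            try {
                String md5 = message.getStringProperty("documentMd5");
                String sourceUrl = message.getStringProperty("sourceUrl");
                if (store.contains(md5)) {
                    return;  // already retrieved via the alternate path - discard
                }
                store.fetchAndStore(md5, sourceUrl);
            } catch (Exception e) {
                // Rethrow so the provider can redeliver (assuming a transacted
                // session or client acknowledgement) and the fetch is retried later.
                throw new RuntimeException(e);
            }
        }
    }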

Each layer is like the next. That is, there's nothing special about how one layer retrieves documents from the next layer up, so additional layers can be added if desired. Mobile users have a complete directory, which means they can request documents that are not on their regional server and therefore have to be fetched from the primary server or from another regional server. Fetching from other regional servers is tried first to distribute the load.
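The retrieval order is easy to sketch (DocumentFetcher and the server lists here are illustrative):

    import java.util.List;

    // Try sibling regional servers first to spread the load, and only fall
    // back to the primary if none of them can serve the document.
    public class DocumentRetriever {

        public interface DocumentFetcher {
            byte[] fetch(String serverUrl, String documentMd5) throws Exception;
        }

        private final DocumentFetcher fetcher;

        public DocumentRetriever(DocumentFetcher fetcher) {
            this.fetcher = fetcher;
        }

        public byte[] retrieve(String documentMd5, List<String> otherRegionalServers, String primaryServer) {
            for (String server : otherRegionalServers) {
                try {
                    return fetcher.fetch(server, documentMd5);
                } catch (Exception e) {
                    // That regional server doesn't have it or is unreachable - try the next.
                }
            }
            try {
                return fetcher.fetch(primaryServer, documentMd5);
            } catch (Exception e) {
                return null;  // nothing reachable right now; the caller retries later
            }
        }
    }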

A portlet provided the front-end for authors. It was, unfortunately, kind of clunky, but served as a starting point where a better wrapper could be written over the underlying framework. The only really interesting thing here is that we discovered the way we used the file upload utility caused three copies of each document to be held in memory during the upload process. Yuck.
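For what it's worth, the usual cure for that kind of buffering is to stream the upload straight through to its destination. As an illustration only - I'm not saying this is the utility we used - Commons FileUpload's streaming API does it like this:

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    import javax.servlet.http.HttpServletRequest;

    import org.apache.commons.fileupload.FileItemIterator;
    import org.apache.commons.fileupload.FileItemStream;
    import org.apache.commons.fileupload.FileUploadException;
    import org.apache.commons.fileupload.servlet.ServletFileUpload;

    // Streams each uploaded file directly to its destination instead of
    // materializing whole copies of the document in memory.
    public class StreamingUploadHandler {

        public void handleUpload(HttpServletRequest request, OutputStream destination)
                throws IOException, FileUploadException {
            ServletFileUpload upload = new ServletFileUpload();  // no factory: pure streaming mode
            FileItemIterator items = upload.getItemIterator(request);
            while (items.hasNext()) {
                FileItemStream item = items.next();
                if (item.isFormField()) {
                    continue;  // ignore ordinary form fields in this sketch
                }
                try (InputStream in = item.openStream()) {
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        destination.write(buffer, 0, read);  // one pass, no extra in-memory copies
                    }
                }
            }
        }
    }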

As it turned out, this solution ended up sitting on a shelf. Why? Because the three or so applications that were targeted to use this framework for document distribution were delayed and eventually cancelled.

After that, we began to look for opportunities to apply this architecture elsewhere. Given that it's a lot easier to sell something that has a GUI than something that's just a bunch of wiring with a clunky portlet, I took an OpenMap earthquake application that I had written at home and modified it to fetch its earthquake data from the DFS. The application displays a world map, pulls the NEIC data that we published to the DFS, and displays the earthquake events on the map.

Why is this a favorite project, despite the fact that it's now collecting dust? Because of the extra challenges presented when designing distributed applications. It's an extra dimension of complexity above and beyond what a basic web app or GUI app requires, and that makes a project like this quite a bit of fun. Another reason is the people on the team. We had a small, competent group that was dedicated to code quality and, more importantly, to product quality.
