A Coder's Log

Tuesday, June 29, 2010

Favorite Project Series - The Data Federation Service

The Data Federation Service (DFS) is a project that we did (in 2006-2007?) for a client as part of a much larger project. The goal of DFS was to provide a framework for distributing documents to users around the globe, many of whom would be on very slow and/or very unreliable networks. A document in this context is any kind of file that contains published information - it might be a text document, a spreadsheet, an image, a diagram, or anything else. To the DFS, they're just files.

The system had to run on hardware that was already largely in place. There was a primary node with failover at the highest layer, an intermediate layer, and a layer of regional nodes distributed around the world. The regional nodes acted as connection points for mobile devices.

The goals of the system were to:

1) provide a web-based façade for document authors to publish documents that might be of global or regional interest,
2) distribute global documents to all the regional servers, and regional documents to a regional server plus an alternate server that would act as that server's backup,
3) provide redundant alternate paths for document delivery in case of server failures in the middle layer,
4) provide a mechanism to deliver annotations to documents back to the author for vetting and addition to the document,
5) provide pluggable reachback to external data sources,
6) allow users of mobile devices to flag documents that they wanted pushed all the way to their mobile devices whenever updates are available,
7) store metadata that describes each document and its source, and that can be used for filtering,
8) provide a complete list of available documents to all of the regional servers, so that mobile users could search for and retrieve documents that are targeted to other regions, and
9) provide command-line and service interfaces for programmatic interaction with the DFS.

The system required 99.9% availability for document publication. The mobile devices existed in an environment where network connections might be sporadically unavailable, or even unavailable for weeks at a time before connectivity was restored. They needed to get whatever updates were available when they reconnected. Furthermore, the actual NICs as exposed on the mobile devices by the O/S to the application could come and go, so the application had to detect when a NIC became available and then retrieve new documents.

The team varied over time, but included five (and more, at brief intervals) developers over the course of the project. Working on a distributed system like this creates some complexities that a lot of developers wouldn't necessarily consider. For example, we wanted to be able to delete documents. No problem, right? Well, what about annotations that might come in weeks or months after a document is deleted? The answer was to simply mark documents for deletion, and not physically delete them. That way, an author can be notified of an annotation to a deleted document, and decide whether the document needs to be re-activated, or whether the annotation provides information that needs to be attached to another document. Fortunately, the folks on the team were able to spend the time that it takes to dig into it and understand the distributed ramifications of code changes and design decisions.
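The mark-for-deletion idea can be sketched in a few lines of Java. This is an illustration only, not the DFS code: the class name PublishedDocument and all of its methods are hypothetical, and the real system stored this state on the servers rather than in a plain object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a document that is marked deleted, never physically removed. */
final class PublishedDocument {
    private final String title;
    private boolean markedDeleted;
    private final List<String> pendingAnnotations = new ArrayList<>();

    PublishedDocument(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }

    /** Hides the document from readers but keeps it, so late annotations still have a target. */
    void markDeleted() {
        markedDeleted = true;
    }

    /** Lets the author bring the document back if a late annotation warrants it. */
    void reactivate() {
        markedDeleted = false;
    }

    boolean isVisible() {
        return !markedDeleted;
    }

    /**
     * Queues an annotation for the author to vet. Returns true when the target
     * is marked deleted, so the caller knows to tell the author about that.
     */
    boolean addAnnotation(String text) {
        pendingAnnotations.add(text);
        return markedDeleted;
    }

    List<String> pendingAnnotations() {
        return Collections.unmodifiableList(pendingAnnotations);
    }
}
```

The point of the boolean returned by addAnnotation is that an annotation arriving months later is never dropped; it is queued either way, and the author simply gets the extra hint that the target was deleted.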

Some of the key technologies involved:

JmDNS - Used to detect server presence. This was where we learned that Java's ConcurrentHashMap can sometimes throw a ConcurrentModificationException, requiring us to handle it and recover from it, so that server availability was reliably detected.

Servlets - WebServices were considered and rejected because we needed to stream data through a series of nodes so that the latency hits would overlap and reduce overall delivery time from the primaries or external sources all the way to the regional servers and mobile devices. The required servlets were mostly hosted on Tomcat engines. I originally scoffed at the idea of running them under Jetty - but I had a wrong view of what Jetty could do for us, and we could have replaced all the Tomcats with Jetty.

JMS - document publication triggers multiple notification messages to the intermediate layer. A regional document for, say, the North America region, would trigger two notifications to the North America regional node and two to its backup node, which might be the Europe node. Two (or more) messages are triggered, and each goes through a different node in the intermediate layer, providing redundancy for the message delivery. If a node in the intermediate layer is down, delaying delivery of one of the messages, the alternate message can still go through.
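As a rough sketch of that fan-out (not the actual DFS code), here is one way to plan the notifications in Java. The class NotificationPlanner, the node names, and the "intermediate->target" route strings are all made up for illustration; the real system sent JMS messages rather than building strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: plans redundant notifications for one published document. */
final class NotificationPlanner {

    /**
     * Returns one route per (target, copy) pair, written as "intermediate->target".
     * Each copy for a given target uses a different intermediate node, so a single
     * intermediate failure delays at most one copy per target.
     */
    static List<String> plan(List<String> targets, List<String> intermediates, int copies) {
        if (copies > intermediates.size()) {
            throw new IllegalArgumentException("need a distinct intermediate node per copy");
        }
        List<String> routes = new ArrayList<>();
        for (String target : targets) {
            for (int i = 0; i < copies; i++) {
                routes.add(intermediates.get(i) + "->" + target);
            }
        }
        return routes;
    }
}
```

For a North America document backed up in Europe, with two intermediate nodes and two copies, this yields four routes: two per target, each through a different intermediate node.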

One of the key attributes of the DFS was the ability to live in a highly unreliable network environment. Connections could come and go, which means that we needed a JMS provider that would reliably reconnect. We started initially with ActiveMQ. It's easy to configure via Spring, and has some nice automatic features that reduced the amount of manual configuration needed. During endurance testing, where I ran a series of integration tests over a period of about two days, I found ActiveMQ would reconnect after a server failure/recovery only 60-70% of the time. We tried a number of different solutions, but nothing worked to raise that percentage. I tried Joram as an alternative and found that it was well-suited to behave reliably on an unreliable network, and would reconnect 100% of the time. We did have problems with Joram's distributed JNDI at the time (I think that's since been resolved), so I used local JNDI and wrote a bridge to move messages between nodes, meaning that the rest of the DFS only had to perform local JNDI queries. Joram stood up well in short tests of a couple of days and in longer endurance tests of up to two or three weeks.

The architecture relies on the guaranteed delivery that JMS provides to ensure that each node will receive the message if connectivity is available. If a node is down, it will receive the message when it comes up. Each message provides the document's MD5 and server location for the identification and retrieval of the document. When a node receives a message, it turns around and attempts to fetch the document from the originating server. If retrieval fails, the message remains on the JMS queue and an attempt is made later. When the alternate message comes through, the regional node can determine from the MD5 that it's already pulled the document, and can discard the second message. This is the basic reason for using notification messages instead of document delivery messages - to avoid the unnecessary duplication of the document on the wire.
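Here's a minimal Java sketch of how a node might use the MD5 to discard the alternate message. It's not the DFS code: the class NotificationDeduper and its methods are hypothetical, and checking the fetched bytes against the advertised MD5 is an assumption I've added for illustration (the messages carried the MD5 for identification).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: tracks which documents a node has already pulled, keyed by MD5. */
final class NotificationDeduper {
    // Hex MD5 digests of documents this node has already retrieved.
    private final Set<String> fetched = ConcurrentHashMap.newKeySet();

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True if this node has not yet pulled a document with this MD5. */
    boolean needsFetch(String md5) {
        return !fetched.contains(md5);
    }

    /**
     * Records a retrieval. Returns false if the bytes don't match the advertised
     * MD5 (an assumed check); the caller would then leave the message on the
     * queue and try again later.
     */
    boolean recordFetched(String advertisedMd5, byte[] content) {
        if (!md5Hex(content).equals(advertisedMd5)) {
            return false;
        }
        fetched.add(advertisedMd5);
        return true;
    }
}
```

When the first notification arrives, needsFetch returns true and the node pulls the document. When the alternate message shows up with the same MD5, needsFetch returns false and the message can be acknowledged and dropped without touching the wire again.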

Each layer is like the next. That is, there's nothing special about how one layer retrieves documents from the next layer up. So, additional layers can be added if desired. Mobile users have a complete directory, which means they can request documents that are not on their regional server, and therefore have to be fetched from the primary server or from another regional server. Fetching from other regional servers is tried first to distribute the load.
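The "try the other regional servers first, then the primary" order is easy to sketch in Java. Again, this is an illustration under assumptions rather than the DFS implementation: TieredFetcher is a hypothetical name, and each server is modeled as a plain function from a document's MD5 to its bytes instead of an HTTP call to a servlet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: asks sibling regional servers before falling back to the primary. */
final class TieredFetcher {
    // Each source maps a document's MD5 to its bytes, or empty if it can't supply it.
    private final List<Function<String, Optional<byte[]>>> sources = new ArrayList<>();

    TieredFetcher(List<Function<String, Optional<byte[]>>> regionalPeers,
                  Function<String, Optional<byte[]>> primary) {
        sources.addAll(regionalPeers); // peers first, to spread the load
        sources.add(primary);          // primary last, as the fallback
    }

    /** Returns the document from the first source that has it, or empty if none does. */
    Optional<byte[]> fetch(String md5) {
        for (Function<String, Optional<byte[]>> source : sources) {
            Optional<byte[]> doc = source.apply(md5);
            if (doc.isPresent()) {
                return doc;
            }
        }
        return Optional.empty();
    }
}
```

Because every source looks the same to the fetcher, adding another layer is just a matter of adding another entry to the list, which mirrors the "each layer is like the next" property.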

A portlet provided the front-end for authors. It was, unfortunately, kind of clunky, but served as a starting point where a better wrapper could be written over the underlying framework. The only thing really interesting here is that we discovered that the way we used the file upload utility caused three copies of documents to be held in memory during the upload process. Yuck.

As it turned out, this solution ended up sitting on a shelf. Why? Because the three or so applications that were targeted to use this framework for document distribution were delayed and eventually cancelled.

After that, we began to look for opportunities to apply this architecture elsewhere. Given that it's a lot easier to sell something that has a GUI than it is for something that's just a bunch of wiring with a clunky portlet, I took an OpenMap earthquake application that I wrote at home and modified it to fetch its earthquake section data from the DFS. The application displays a world map, pulls NEIC data that we published to the DFS, and displays the earthquake events on the map.

Why is this a favorite project, despite the fact that it's now collecting dust? Because of the extra challenges presented when designing applications that are distributed. It's an extra dimension of complexity above and beyond what a basic web app or GUI app requires, and that makes a project like this quite a bit of fun. Another reason is the people on the team. We had a small, competent group that was dedicated to code quality and, more importantly, to product quality.

Saturday, January 30, 2010

Weblogic JMS, The PointBase 30MB Limit, And Switching To MySQL

Overview

I've been running into this problem lately that occurs when dumping messages into Weblogic JMS queues on a developer's workstation. The underlying problem is that the PointBase database provided with Weblogic has a hard-and-fast size limit of 30MB. The last thing the company I work for wants to spend money on is database licenses on developers' boxes, especially when there are plenty of free options available.

I'm pretty comfortable with MySQL, and always have it installed on any developer box I'm using. But Weblogic supports a long list of alternative DB types, so pick one you like: Adabas, Cloudscape, DB2, Derby, EnterpriseDB, FirstSQL, MS/DB, Informix, Ingres, MS SQL, MaxDB, Oracle, PostgreSQL, Progress, and Sybase.

My original goal was to rip out PointBase entirely, and use MySQL exclusively. There may be a way to do that, but it seems that it's a matter of going through and replacing each configured PointBase datasource with a MySQL datasource, then switching over. In the meantime, I just replaced the one datasource I needed to store JMS messages. Assuming you already have Weblogic and your database of choice installed, it boils down to about four steps: Creating the database, creating the datasource, creating the JDBC store, and creating a JMS JDBC store. Most of the info is from links I hunted down and pulled together to make this list, so links back to the original docs are included.

Creating The Database

Okay, this one's really hard. ;) Here's the MySQL command:

mysql -uroot -e 'create database wls'

I just picked the DB name. We'll use it later. The name doesn't matter, just be consistent.

Creating The JDBC Data Source

Navigate Services->JDBC->DataSources->New. The name and JNDI name don't matter; again, just be consistent.

Name = mysql-wls
JNDI name = jdbc/mysql/wls
Database type = mysql
Driver = com.mysql.jdbc.Driver

Click Next. I took the defaults:

Supports Global checked
One-phase commit

Click Next.

Database name = wls
Host = localhost
User = root
Password = <your root's pswd>

Services->JDBC->DataSources->mysql-wls->Targets

Servers=examplesServer

Creating The JDBC Store

http://download.oracle.com/docs/cd/E12840_01/wls/docs103/ConsoleHelp/taskhelp/stores/CreateJDBCStores.html

Services->Persistent Stores->New->Create JDBC Store

Name=mysql-jms
Target=examplesServer
Datasource=mysql-wls
Prefix=jms_

Creating The JMS JDBC Store

http://download.oracle.com/docs/cd/E12840_01/wls/docs103/config_wls/store.html#wp1142690

Navigate to Services->Messaging->JMS Servers->examplesJMSServer

Persistent Store->mysql-jms

As a final thought, you may want to make your Queues persistent. ;) There's not much point setting this up if you're keeping your messages in memory:

Navigate to Services->Messaging->JMS Modules-><module>-><queue> ->Overrides,Delivery Mode Override=Persistent

Saturday, December 19, 2009

Adsense Ads and GWT - Making it work.

It seems like a lot of people have had this same problem, but I haven't found anywhere on the net where someone has found a solution. Here's how I got it to work. If you find this post helpful, I would ask the favor that you check out http://penwag.com, and ask your friends to do the same.

I struggled for a long time trying to get an Adsense ad to appear in a <div>, but that seems to be the wrong approach. Divs are nice for styling reasons, but it seems that Adsense knows when it's in a div, and won't display.

I avoided IFrames (which is what the GWT Frame object compiles to) because sizing isn't automatic. Eventually, though, it became apparent that IFrames were the way to go, since Adsense ads will load in them. I create an IFrame and point it at a static page that contains the necessary Adsense script. That just works. The content loads correctly, and ads will display.

But here's the rub. IFrames need to be sized with custom JavaScript. I use this script: https://penwag.com/home/iframe.js. This works easily for all browsers except - you guessed it - IE. Below is the GWT code that I use to bring it all together, including a work-around for IE.


    public static native String getUserAgent() /*-{
        return navigator.userAgent.toLowerCase();
    }-*/;

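    private Widget buildMainPanel() {
        Widget mainPanel;
        // The IFrame resize script doesn't work in IE, so IE gets a panel
        // whose content is fetched and inlined instead of an IFrame.
        if(getUserAgent().contains("msie")) {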
            mainPanel = buildIEPanel();
        } else {
            mainPanel = buildNonIEPanel();
        }

        mainPanel.getElement().setId(getPanelId());
        mainPanel.addStyleName(Styles.StaticPanel);

        return mainPanel;
    }

    private Widget buildNonIEPanel() {
        Frame mainPanel = new Frame();
        mainPanel.getElement().setAttribute("onLoad", "resizeCaller();");
        mainPanel.setUrl(getRootPage());

        return mainPanel;
    }

    private Panel buildIEPanel() {
        Panel mainPanel = new VerticalPanel();

        HTML adBar = new HTML("Loading...");
        mainPanel.add(adBar);

        try {
            RequestBuilder builder = new RequestBuilder(RequestBuilder.GET, GWT.getModuleBaseURL() + "ie_ads/index.html");
            builder.sendRequest(null, new RequestHandler(adBar, null));
        } catch (RequestException e) {
            adBar.setHTML("");
        }

        HTML content = new HTML("Loading...");
        mainPanel.add(content);

        String errorMessage = "Failed to load content, please try again later.";
        try {
            RequestBuilder builder = new RequestBuilder(RequestBuilder.GET, GWT.getModuleBaseURL() + getRootPage());
            builder.sendRequest(null, new RequestHandler(content, errorMessage));
        } catch (RequestException e) {
            content.setHTML(errorMessage);
        }

        return mainPanel;
    }

Saturday, October 24, 2009

A New Star

I'd like to introduce you all to an up-and-coming star in the universe of photography. Also, he's my son, Taylor. I may be biased.

Please take a moment to visit his web site PhotoImageOgraphy. I think you'll be glad you did.

Sunday, September 13, 2009

Ozark Trail Volunteer Work

Well, after a couple of years using the Ozark Trail, I finally got around to helping to build the Ozark Trail on the Courtois Section. (It's pronounced Code-Away.) What a tremendous experience. The people were great to work with, and great to hang out with after the work was done for the day. And the food, thanks to Jeff The Chef, really hit the spot. Nothing like a couple burgers and a few brats to make the day complete.

The group included some tremendously hard workers on the rock-wall team, guys I couldn't even begin to keep up with energy-wise, including Scotty from the USFS, Russ, Gabe, Charles and others. Good work, guys.

No injuries, either, which is good. Couple close calls, though - the hill was about a 25% grade, and a few of the boulders got away from us.

What a great bunch of folks to hang out with. I'm looking forward to the next event.

Tuesday, July 14, 2009

The Value Of Pictures In Software Design

There are some very good reasons why software engineers use visual communication to quickly and effectively transfer knowledge from one person to another.

While people have many different learning styles, and while everyone employs all of the styles to a greater or lesser degree, most people, or at least enough people to matter, are predominantly visual learners. Various sources claim that around 60% of us are visual learners. That alone makes visual techniques worthwhile.

Visual communication transfers information at a very high rate compared with aural and textual communication. You can tell with a glance a system's structure, or lack thereof. A verbal description takes longer.

Visual communication helps the sender, too. That is, the person creating the graphical representation has to understand the system well enough to draw it. This applies to verbal and written communication as well, so visual does not necessarily hold an advantage over other forms, but it's certainly a valid approach.

Visual communication helps a newcomer to the team come up to speed and become productive more rapidly.

Furthermore, graphical representations of software systems can reveal flaws, voids, and redundancies that are not immediately obvious in verbal or written communication. How many times have you drawn a system diagram, only to see that there's something that can be cleaned up? If you have not done this, try it - it's a worthwhile exercise.

To illustrate the value of pictures, let me point you at one that someone else drew, one that helped me to quickly understand a software framework's design and intended usage. I'm talking about the design of Mina, Apache's ongoing effort to wrap the basic Java NIO components. I think the graphics they provide are a great example of what we should be doing on our own projects.

Why is it important to understand the need for visual communication in software development? I've noticed something unsettling on the last couple projects I've been on: no graphical representations of the systems. In each case, there was no perceived need or gain to having images. For those of us who have been on non-trivial projects and witnessed the indispensable benefits of this form of communication, this is a red flag.

This red flag is a reliable indicator of a project in trouble. The particular dysfunctions might take one of several forms, but most likely is a combination of them. I'll try to name a few here, without trying to make an exhaustive list.

First, it means that the system probably has no clear direction. The team doesn't know where it's headed, or at least doesn't have a common vision of the goal. A shiny new idea comes along, it's legitimately cool, and we go down that track. And that's great - provided it complements the existing framework. We can't follow all of those cool ideas. Some we'll have to back-burner for another day or another project. A solid understanding of a system as a whole, bolstered by a few good drawings, can help us stay disciplined and on the road toward our goal. The images help to remind us of the goal, and to keep us from working at cross-purposes.

A lack of graphical communication might indicate that the team is unable to create the drawings. The system has grown disorganized and chaotic over time (that is, separation of concerns is largely gone), and the team, however good they are, cannot walk up to a whiteboard or show on paper a cohesive, overall design. The discipline of always having something drawn (simple, not 200 pages) drives us to keep the system cohesive and well-organized.

A lack of drawings might indicate something far worse - that the team refuses to draw them. This might be from fear that the drawings will become stale (a legitimate, but addressable concern), or because it takes time from coding, or from a misapplied development philosophy, from plain and simple laziness, or because of a lack of experience building complex systems. Even with current development paradigms that eschew grotesquely large architectural documents, some documentation is essential.

It's this last statement that seems to be the key element on the last couple of projects I've been on. Agile development philosophies encourage us to limit the amount of useless documentation that gets created. This is a worthy and noble goal. Sadly, some have twisted the intent of these goals, eliminating strong, time-tested tools from their arsenal, to the detriment of the projects and teams they represent.

Thursday, July 9, 2009

John Roth, founder of the Ozark Trail Association, has died

This is sad news, indeed. I only ever "met" John via email and discussion forums, and never had the privilege of meeting him in person. As I read the notes from the forum linked below, it's clear what a void his passing leaves.

Ozark Trail home page
Forum notes