Saturday, June 13, 2009

Seeking Input For My Next Opensource Project

Friends,

I'm seeking your input to help me think about my next opensource project. I have two ideas, either one of which I'd like to do, probably using Java as the primary language, just as a matter of preference. I'd particularly like to know whether either is already being done, so that I don't duplicate work, and whether or not you think it might be something useful in work that you've done.

The first is a SOA Directory Service alternative to UDDI. When I helped implement a middleware/SOA framework in the mid-90s, one of the pieces we built was a directory service. While it didn't offer the metadata storage capability that UDDI offers today, it had some advantages over UDDI. It was simple: it was easy to register services, do lookups, and so on. It was very fast. Services registered using a lease mechanism, so you could get a list of matching service instances, knowing that the instances were probably still up. It was also replicated, although, certainly, many UDDI implementations support replication too. I don't want to create another UDDI implementation, but rather to build an alternative Directory Service that is more like what we did in the 90s: consistent with today's framework needs, but more lightweight. To my mind, UDDI is a far more heavyweight solution than most enterprises need, and a simpler solution might offer some appeal, provided that it integrates well with whatever framework they're already using, so that it is easy to choose as an alternative to UDDI.
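To make the lease idea concrete, here is a minimal sketch of how lease-based registration and lookup might work. It is not the original 90s code; the class name `LeaseDirectory`, its methods, and the addresses are all hypothetical, and the current time is passed in explicitly just to keep the example easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lease-based service directory; every name here is illustrative. */
class LeaseDirectory {
    // service name -> (instance address -> lease expiry time in milliseconds)
    private final Map<String, Map<String, Long>> leases = new ConcurrentHashMap<>();

    /** Registers an instance, or renews its lease if it is already registered. */
    void register(String service, String address, long leaseMillis, long nowMillis) {
        leases.computeIfAbsent(service, k -> new ConcurrentHashMap<>())
              .put(address, nowMillis + leaseMillis);
    }

    /** Returns the addresses whose leases have not expired as of nowMillis. */
    List<String> lookup(String service, long nowMillis) {
        Map<String, Long> instances = leases.get(service);
        if (instances == null) {
            return new ArrayList<>();
        }
        // Drop expired leases, so instances that stop renewing vanish
        // without ever having to deregister explicitly.
        instances.values().removeIf(expiry -> expiry <= nowMillis);
        List<String> live = new ArrayList<>(instances.keySet());
        Collections.sort(live); // stable output order for the example
        return live;
    }

    public static void main(String[] args) {
        LeaseDirectory directory = new LeaseDirectory();
        directory.register("billing", "host-a:8080", 30_000, 0);
        directory.register("billing", "host-b:8080", 30_000, 20_000);
        // At t=25s both leases are still valid.
        System.out.println(directory.lookup("billing", 25_000));
        // At t=40s host-a has not renewed, so only host-b remains.
        System.out.println(directory.lookup("billing", 40_000));
    }
}
```

A service would call `register` again before its lease runs out. That renewal is what lets a client assume that anything returned by `lookup` was alive recently.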

The second possibility is to do an opensource implementation of a data federation system. We built one for a client, and although it was never used, there were some good ideas in there. I'd like to do it again as an opensource project, because it offers some useful capabilities. It essentially allows users to publish documents to a master node, then replicate documents to regional servers, that is, to push the data close to where it would be used within the organization. For example, if a document were flagged as pertinent to an organization's European region, it would be pushed to that region's server, and to its backup server in a neighboring region. Users in the region can then make annotations to the documents as needed, and push those back to the original author for consideration. A federation system such as this offers some availability and performance benefits relative to having a monolithic document server. When a user wants a document that is not stored in his or her region, the system goes back to the master or another regional server to fetch it. As an added capability, the original system supported plugins that could fetch data from external sources, and that might be useful to include.
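As a rough illustration of the publish-and-fetch flow, here is a small in-memory sketch. It is not the original system: the names `FederationSketch`, `Node`, and `publish` are hypothetical, there is no network transport, and keeping a local copy after a remote fetch is my own assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of a master node and regional servers. */
class FederationSketch {

    /** A server holding documents by id; a regional node asks its fallback on a miss. */
    static class Node {
        final String name;
        final Node fallback; // null for the master node
        final Map<String, String> docs = new HashMap<>();
        int remoteFetches = 0;

        Node(String name, Node fallback) {
            this.name = name;
            this.fallback = fallback;
        }

        /** Returns the document, going back to the fallback node if it is not stored locally. */
        String fetch(String id) {
            String doc = docs.get(id);
            if (doc == null && fallback != null) {
                remoteFetches++;
                doc = fallback.fetch(id);
                if (doc != null) {
                    docs.put(id, doc); // assumption: keep a local copy after a remote fetch
                }
            }
            return doc;
        }
    }

    /** Stores the document on the master, then pushes copies to the flagged regions. */
    static void publish(Node master, String id, String body, Node... regions) {
        master.docs.put(id, body);
        for (Node region : regions) {
            region.docs.put(id, body);
        }
    }

    public static void main(String[] args) {
        Node master = new Node("master", null);
        Node europe = new Node("europe", master);
        Node europeBackup = new Node("europe-backup", master);
        Node asia = new Node("asia", master);

        // Flagged as pertinent to Europe: pushed to that region and its backup.
        publish(master, "doc-1", "Q3 sales plan", europe, europeBackup);

        System.out.println(europe.fetch("doc-1") + ", remote fetches: " + europe.remoteFetches);
        System.out.println(asia.fetch("doc-1") + ", remote fetches: " + asia.remoteFetches);
    }
}
```

The European node answers from its local copy, while the Asian node has to go back to the master once. Annotations, notifications, and plugins are left out of this sketch.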

3 comments:

Heath said...

Couldn't the first idea be implemented fairly simply either with just LDAP or using DNS records or something? I don't know very much about UDDI, but I do know that LDAP and DNS are both very good as replicated directories.

Second idea sounds very cool. Are you planning on using Hadoop or something?

Don Branson said...

LDAP and DNS are useful approaches that are in use. There are a couple of additional features that I'd like, including the lease-based dynamic registration, so that clients can get a reasonably current list of active services. Also, the original DS would return the list of services ordered by location, with the geographically nearest service first in the list, and so forth, so that clients would first try the closest service, which helps with distribution of load and with response time.
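A nearest-first ordering could be sketched roughly as follows. This is only an illustration: the `NearestFirst` and `Instance` names are hypothetical, and plain x/y coordinates stand in for whatever notion of location a real directory would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper that orders service instances by distance from the client. */
class NearestFirst {

    /** A service instance with an assumed planar location. */
    static class Instance {
        final String address;
        final double x;
        final double y;

        Instance(String address, double x, double y) {
            this.address = address;
            this.x = x;
            this.y = y;
        }
    }

    /** Squared straight-line distance; enough for ordering, so no square root is needed. */
    static double squaredDistance(Instance instance, double clientX, double clientY) {
        double dx = instance.x - clientX;
        double dy = instance.y - clientY;
        return dx * dx + dy * dy;
    }

    /** Returns a new list with the nearest instance first; the input list is left unchanged. */
    static List<Instance> orderByDistance(List<Instance> instances, double clientX, double clientY) {
        List<Instance> sorted = new ArrayList<>(instances);
        sorted.sort(Comparator.comparingDouble(
                (Instance i) -> squaredDistance(i, clientX, clientY)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Instance> instances = Arrays.asList(
                new Instance("far:8080", 10, 10),
                new Instance("near:8080", 1, 1),
                new Instance("mid:8080", 5, 5));
        // A client at the origin should try near, then mid, then far.
        for (Instance i : orderByDistance(instances, 0, 0)) {
            System.out.println(i.address);
        }
    }
}
```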

Don Branson said...

In response to the second question you asked - the original was based on model-2 servlets running under Tomcat with Joram JMS for notification of updates. It was lean, scalable and reliable. Someone else suggested that we should use web services instead, not understanding that that takes away the ability to tunnel requests through intermediate layers, adds unnecessary overhead to each request, and imposes a memory cap on the scalability of the solution.

Hadoop is good for a different kind of scalability, that is, large data sets, and is geared more towards batch processing than federation. Hadoop's an interesting thought, but I think that it's a bit of a mismatch for the problem in the batch vs. realtime sense, and also in that there's a client-side piece that also needs the notifications and the transport layer, which I forgot to mention.