Wednesday, October 15, 2008

Hadoop

I'm now on my second project where we're using Hadoop/HBase and the Google Web Toolkit, both of which I'm happy to get a chance to use at work. It gives me an excuse to play, er, work with tools I'd use at home anyway.

The first Hadoop project was for a client in the health industry. They needed to provide doctors with easy access to DICOM images using web browsers. Since browsers can't display DICOM files directly, this required converting them into browser-friendly formats such as JPG and AVI. We used Hadoop to manage the conversion of large numbers of images from DICOM to JPG and to create varying sizes of images for thumbnails and such. We used HBase to store the images. Running HBase on top of Hadoop greatly simplified our original approach, where we had stored the images directly on HDFS. Finally, we developed a snazzy GWT front-end for the doctors so that they could upload and manage images on the system.
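The details of that conversion job are long gone, but the fan-out idea behind it is simple enough to sketch. Here's a plain-Java stand-in for the map step: one input image produces one output record per rendition. The rendition widths and the row-key scheme are made up for illustration; the real job used actual imaging libraries and wrote into HBase rather than returning strings.

```java
import java.util.ArrayList;
import java.util.List;

public class RenditionMapper {
    // Hypothetical rendition widths -- the sizes we actually used aren't recorded here.
    static final int[] WIDTHS = {128, 512, 1024};

    // Map step: one DICOM image fans out to one record per browser-friendly rendition.
    // Each returned string stands in for the row key a converted JPG would be stored under.
    static List<String> map(String imageId) {
        List<String> out = new ArrayList<>();
        for (int w : WIDTHS) {
            out.add(imageId + "/jpg/" + w);
        }
        return out;
    }

    public static void main(String[] args) {
        // One study fans out to three conversion tasks.
        System.out.println(map("study42"));
    }
}
```

The point of structuring it this way is that every image converts independently, so Hadoop can hand each one to whatever node is free.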

The exposure to GWT was a great experience, and I was so impressed with it that I decided to use it for a web site I'm developing, PenWag. We use Agile-XP practices at work, and I try to follow those to the extent I can at home. So, yes, I have story cards, and I have a continuously demo-able product. PenWag is still under development, but go have a look at its current state. I'm just getting started, but I thought I'd make it available all along the way.

The current Hadoop project is an R&D effort. Hadoop is still relatively new to most companies. I'm starting to hear about opportunities to do Hadoop development on both coasts of the U.S. There are fewer such opportunities in the Midwest, but it's coming this way, and our company is preparing to be ready when it gets here. We already have some experience, but we're exploring the technology more thoroughly, because there's a lot we can do for our clients with it. Hadoop really, really simplifies the whole question of how to scale an app: any problem I can express in MapReduce terms can be deployed to Hadoop. We can start with four or five commodity boxes, but could conceivably scale to 2,000 or 10,000 boxes if that's what the customer wanted. Assuming, you know, that they had a place for 10,000 Linux machines, and a way to cool them.
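What "express it in MapReduce terms" means can be sketched in plain Java, with no Hadoop dependency at all. The two methods below are just the two phases a real Hadoop job would run in parallel across the cluster: map emits key/value pairs independently per input record, and reduce combines all values emitted under one key. The example problem (word counting) is the standard illustration, not anything from our actual project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordCount {
    // Map phase: each input line independently emits (word, 1) pairs.
    // In a real Hadoop job, this logic would live in a Mapper class and run
    // on whichever node holds that slice of the input.
    static Stream<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                     .map(word -> Map.entry(word, 1));
    }

    // Reduce phase: all values emitted under the same key are combined.
    // The framework handles the grouping (the "shuffle"); the reducer just sums.
    static Map<String, Integer> reduce(Stream<Map.Entry<String, Integer>> pairs) {
        return pairs.collect(Collectors.groupingBy(Map.Entry::getKey,
                             Collectors.summingInt(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog");
        Map<String, Integer> counts = reduce(input.stream().flatMap(WordCount::map));
        System.out.println(counts.get("the")); // 2
    }
}
```

Once a problem fits this shape, scaling from five boxes to thousands is the framework's job, not yours.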

8 comments:

Sérgio Vasquez said...

Hello!

Do you think you could share the architecture of this healthcare project?

Thanks.

Sergio

shalish said...

Great. Out of interest, I am also into a similar R&D effort. Can you please help me out?

Don Branson said...

Well, it's been five years, so the details may be a little fuzzy. Besides, whatever we did is five years stale by now. If you still think it may be of some value, let's hook up via Google+ and I can try to answer your questions.

Shalish VJ said...

Thanks a lot for the reply, Don Branson. Great to hear that whatever I am trying now was tried and successfully executed by you five years back. Still, I would need your guidance since I am a beginner. Will you be able to share your email ID so that I can seek your advice? Thanks.

Don Branson said...

Can you ping me via Google+? I prefer not to post my email address.

Shalish VJ said...

Hi, I have pinged you on Google+.

sundara rami reddy said...

I browsed your website and found it very interesting. Thank you for the good work on Hadoop. Wonderful and informative web site. I used information from that site; it's great. Greetings.

Don Branson said...

Thank you Sundara. Glad to help. :)