hadoop-common-user mailing list archives

From "Adam Retter" <Adam.Ret...@landmark.co.uk>
Subject Appropriate for Hadoop?
Date Tue, 28 Apr 2009 10:05:54 GMT

If I understand correctly, Hadoop forms a general-purpose cluster on
which you can execute jobs?

We have a Java data processing application here that follows the
Producer -> Consumer pattern. It has been written with threading as a
concern from the start using java.util.concurrent.Callable.

At present the producer is a thread that retrieves a list of document
URIs from a SQL query against databaseA and adds them to a shared
(synchronised) queue.

Each consumer is a thread, of which there can be n, but we typically run
with 16 on the current hardware.
The consumer sits in a loop, processing the queue until it is empty. It
removes a document URI from the shared queue, retrieves the document and
performs a pipeline of transformations on the document, resulting in a
series of 600 to 16000 SQL insert statements which are then executed
against databaseB.
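
For illustration, the arrangement described above can be sketched roughly
as follows. This is not our actual code: fetchDocumentUris() and
transform() are hypothetical stand-ins for the query against databaseA and
for the retrieval and transformation pipeline, the insert statements are
only counted rather than executed against databaseB, and the queue is
filled before the consumers start, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ProducerConsumerSketch {

    // Hypothetical stand-in for the SQL query against databaseA.
    static List<String> fetchDocumentUris() {
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            uris.add("doc://example/" + i);
        }
        return uris;
    }

    // Hypothetical stand-in for retrieving the document and running the
    // transformation pipeline; returns the SQL insert statements.
    static List<String> transform(String uri) {
        List<String> inserts = new ArrayList<>();
        inserts.add("INSERT INTO target (uri) VALUES ('" + uri + "')");
        return inserts;
    }

    // Fills the shared queue, then runs consumerCount consumers until the
    // queue is empty. Returns the total number of insert statements produced.
    static int run(int consumerCount) {
        // Producer step (simplified): queue every URI up front.
        Queue<String> queue = new ConcurrentLinkedQueue<>(fetchDocumentUris());
        ExecutorService pool = Executors.newFixedThreadPool(consumerCount);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < consumerCount; i++) {
                Callable<Integer> consumer = () -> {
                    int produced = 0;
                    String uri;
                    // poll() returns null once the queue is empty.
                    while ((uri = queue.poll()) != null) {
                        // A real consumer would execute these against databaseB.
                        produced += transform(uri).size();
                    }
                    return produced;
                };
                results.add(pool.submit(consumer));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling ProducerConsumerSketch.run(16) corresponds to our usual setup of
16 consumer threads sharing one queue.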

I have been reading about both Terracotta and Hadoop. Hadoop appears to
be the more general-purpose solution, one that we could use for many
applications; however, I am not sure how our application would map onto
Hadoop concepts. I have been studying the Map/Reduce Hadoop approach, but
our application does not produce any intermediate files that would be the
input/output to the Map/Reduce processes.
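
To make the mapping question concrete, here is a rough sketch in plain
Java (no Hadoop classes) of one way the per-document work might be viewed
as a map step with nothing left for a reduce step to do. It rests on
assumptions: that the list of document URIs could itself serve as the
input records, and that each document is processed independently of the
others. All names here (sampleUris, split, map, processSplit) are
hypothetical and are not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

class MapOnlyShapeSketch {

    // Hypothetical input: one record per document URI.
    static List<String> sampleUris(int n) {
        List<String> uris = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            uris.add("doc://example/" + i);
        }
        return uris;
    }

    // Divides the input records into at most `parts` contiguous chunks,
    // one chunk per worker. Assumes parts >= 1.
    static List<List<String>> split(List<String> uris, int parts) {
        List<List<String>> splits = new ArrayList<>();
        int size = (uris.size() + parts - 1) / parts; // ceiling division
        for (int start = 0; start < uris.size(); start += size) {
            int end = Math.min(start + size, uris.size());
            splits.add(new ArrayList<>(uris.subList(start, end)));
        }
        return splits;
    }

    // The map-shaped step: depends only on its own input record.
    // Hypothetical stand-in for retrieval plus the transformation pipeline.
    static List<String> map(String documentUri) {
        List<String> inserts = new ArrayList<>();
        inserts.add("INSERT INTO target (uri) VALUES ('" + documentUri + "')");
        return inserts;
    }

    // What one worker would do with its chunk. No results are combined
    // across chunks, so no reduce step appears in this sketch.
    static int processSplit(List<String> split) {
        int produced = 0;
        for (String uri : split) {
            // A real worker would execute these against databaseB.
            produced += map(uri).size();
        }
        return produced;
    }
}
```

Whether this shape is actually a sensible fit for Hadoop, given that the
output goes to a database rather than to files, is exactly what I am
unsure about.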

Any guidance would be appreciated; it may well be that our application
is not an appropriate use of Hadoop.

Thanks, Adam.
Adam Retter
Software Developer
Landmark Information Group

