hadoop-common-user mailing list archives

From "Adam Retter" <Adam.Ret...@landmark.co.uk>
Subject RE: Appropriate for Hadoop?
Date Tue, 28 Apr 2009 13:15:15 GMT

> Each document is processed independently, so the documents can be
> processed in parallel; that part could be done in a MapReduce job.
> Whether Hadoop suits this use case depends on the rate at which new
> URIs are discovered and on the acceptable delay in processing
> a document. The way I see it, you can batch the URIs
> and feed each batch to a MapReduce job. Each mapper can work on a sublist of
> URIs. You can choose to make the DB inserts from the mapper itself; in that
> case you can set the number of reducers to 0. Otherwise, if batching the
> queries is an option, you can consider making batch inserts in the reducer,
> which will help reduce the load on the DB.
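
To make the two options in the quoted advice concrete, here is a minimal plain-Java sketch. It does not use the Hadoop API at all; it only simulates the data flow. Every name in it (UriBatchSketch, partition, FakeDb, BatchWriter, process) is hypothetical and invented for illustration, and FakeDb merely counts round trips instead of talking to a real database.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: simulates the quoted advice without any Hadoop classes.
public class UriBatchSketch {

    /** Splits a batch of URIs into at most `parts` contiguous sublists, one per simulated mapper. */
    static List<List<String>> partition(List<String> uris, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<String>> out = new ArrayList<>();
        int size = (uris.size() + parts - 1) / parts; // ceiling division
        for (int i = 0; i < uris.size(); i += size) {
            out.add(new ArrayList<>(uris.subList(i, Math.min(i + size, uris.size()))));
        }
        return out;
    }

    /** Stand-in for a database client (an assumption); records the row count of each round trip. */
    static class FakeDb {
        final List<Integer> roundTrips = new ArrayList<>();

        void insertAll(List<String> rows) {
            roundTrips.add(rows.size());
        }
    }

    /** Buffers rows and sends them to the database once `batchSize` rows have accumulated. */
    static class BatchWriter {
        private final FakeDb db;
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        BatchWriter(FakeDb db, int batchSize) {
            this.db = db;
            this.batchSize = batchSize;
        }

        void write(String row) {
            buffer.add(row);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Sends any buffered rows; call once more when the input is exhausted. */
        void flush() {
            if (!buffer.isEmpty()) {
                db.insertAll(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }

    /** Placeholder for the per-document work done in a mapper. */
    static String process(String uri) {
        return "processed:" + uri;
    }

    public static void main(String[] args) {
        List<String> uris = List.of("u1", "u2", "u3", "u4", "u5");

        // Option 1: each simulated mapper inserts its own results, one row per round trip.
        FakeDb direct = new FakeDb();
        for (List<String> sublist : partition(uris, 2)) {
            for (String uri : sublist) {
                direct.insertAll(List.of(process(uri)));
            }
        }

        // Option 2: results are collected and inserted in batches (the reducer's role).
        FakeDb batched = new FakeDb();
        BatchWriter writer = new BatchWriter(batched, 2);
        for (List<String> sublist : partition(uris, 2)) {
            for (String uri : sublist) {
                writer.write(process(uri));
            }
        }
        writer.flush();

        System.out.println("direct round trips:  " + direct.roundTrips.size());
        System.out.println("batched round trips: " + batched.roundTrips.size());
    }
}
```

The trade-off this is meant to show: inserting from the mapper costs one database round trip per document, while buffering the rows and flushing them in groups cuts the number of round trips, at the price of an extra collection step.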

So I don't have to use HDFS at all when using Hadoop?
