hadoop-common-user mailing list archives

From "Stu Hood" <stuh...@webmail.us>
Subject RE: Hadoop for incremental index building and search
Date Sat, 27 Oct 2007 18:52:08 GMT
Your understanding is correct.

The MapReduce paradigm is designed for processing large batches of data in parallel. Additionally,
job startup in Hadoop's implementation is rather costly: for jobs that run for a minute or
less, more than 50% of the elapsed time can be consumed by startup overhead alone.

If you have a dataset that you can afford to index in batches, then Hadoop is an excellent
solution (as evidenced by Nutch).


-----Original Message-----
From: "Paul H." <pitt_dynamics@yahoo.com>
Sent: Saturday, October 27, 2007 12:19pm
To: hadoop-user@lucene.apache.org
Subject: Hadoop for incremental index building and search


According to a Hadoop tutorial (http://wiki.apache.org/nutch/NutchHadoopTutorial) on the wiki,
"you don't want to search using DFS, you want to search using local filesystems.
Once the index has been created on the DFS you can use the hadoop
copyToLocal command to move it to the local file system as such".
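That copy step from the tutorial might look like the following. This is only a sketch: the HDFS and local paths are illustrative placeholders, not paths from the tutorial, and on 2007-era Hadoop releases the same operation was commonly invoked as `hadoop dfs -copyToLocal`.

```shell
# Copy the index built in HDFS down to the local filesystem,
# so searches hit local disk rather than going through DFS.
# Both paths below are hypothetical examples.
hadoop fs -copyToLocal /user/nutch/crawl/indexes /local/search/indexes
```

Once copied, the searcher is pointed at the local directory; the DFS copy is only the build artifact.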

My understanding is that Hadoop is only good for batch index building,
and is not suitable for incremental index building and search. Is this
true? By "incremental index building and search", I mean a system that
accepts text on the fly, builds an index, and makes that index available
for search immediately.

