hadoop-common-user mailing list archives

From roger dimitri <rogerdimi...@yahoo.com>
Subject Re: MapReduce usage with Lucene Indexing
Date Fri, 25 Jan 2008 18:10:39 GMT
Thanks for the idea. I will definitely try it out, but the requirement is mainly to minimize
file I/O as much as possible. RAM is not a problem, so a RAM-based index is preferred; an
in-memory index needs to be created as soon as the input is obtained. Alternatively, if you have
any ideas on creating a filesystem-based index and then syncing it with a RAM-based one, please let me know.

Thanks a lot!!
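The trade-off described above (keep the index entirely in RAM, and touch the filesystem only to persist or reload a snapshot) can be illustrated without Lucene. The sketch below is a stdlib-only stand-in under stated assumptions: `InMemoryIndex`, `saveTo`, and `loadFrom` are hypothetical names, not Lucene's API, and Java serialization is an assumed snapshot format. Lucene provides its own RAM-resident `Directory` implementation (`RAMDirectory` in releases of this era), so the Lucene documentation for the version in use is the authority for the real approach.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

class InMemoryIndexDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        InMemoryIndex ram = new InMemoryIndex();
        ram.addDocument(1, "Hadoop MapReduce jobs");
        ram.addDocument(2, "Lucene indexing with MapReduce");

        // Snapshot the RAM index to a file, then load it back into memory.
        Path snapshot = Files.createTempFile("index", ".ser");
        ram.saveTo(snapshot);
        InMemoryIndex reloaded = InMemoryIndex.loadFrom(snapshot);
        Files.delete(snapshot);

        System.out.println(reloaded.search("mapreduce")); // [1, 2]
        System.out.println(reloaded.search("lucene"));    // [2]
    }
}

/** Hypothetical toy inverted index (term -> sorted document ids), held entirely in RAM. */
class InMemoryIndex {
    private final HashMap<String, TreeSet<Integer>> postings;

    InMemoryIndex() { this(new HashMap<>()); }

    private InMemoryIndex(HashMap<String, TreeSet<Integer>> postings) { this.postings = postings; }

    /** Lower-cases the text, splits on non-alphanumeric characters, and records each term. */
    void addDocument(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of documents containing the term (empty if none). */
    Set<Integer> search(String term) {
        TreeSet<Integer> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.emptySet() : Collections.unmodifiableSet(ids);
    }

    /** Writes the whole index to a file using Java serialization (assumed format). */
    void saveTo(Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(postings);
        }
    }

    /** Reads a previously saved snapshot back into a new in-memory index. */
    @SuppressWarnings("unchecked")
    static InMemoryIndex loadFrom(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return new InMemoryIndex((HashMap<String, TreeSet<Integer>>) in.readObject());
        }
    }
}
```

All queries are served from memory; the file is only written or read when a snapshot is explicitly saved or loaded, which keeps file I/O off the indexing and search path.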

----- Original Message ----
From: Rajagopal Natarajan <rajagopal.n@gmail.com>
To: core-user@hadoop.apache.org
Sent: Thursday, January 24, 2008 10:19:51 PM
Subject: Re: MapReduce usage with Lucene Indexing

On Jan 25, 2008 6:30 AM, roger dimitri <rogerdimitri@yahoo.com> wrote:

> Hi,
>   I am very new to Hadoop, and I have a project where I need to index
> some input given either as a huge collection of Java objects or as one
> huge Java object.
>   I read about Hadoop's MapReduce utilities and I want to leverage that
> feature in the case described above.
>   Can someone please tell me how I can approach this problem? All the
> Hadoop MapReduce examples out there show only file-based input and don't
> explicitly deal with data coming in as a huge object, so to speak.

Something just off the top of my head: when your input is a collection of
smaller objects, each independent of the others, you could serialize all the
objects and write them to a file, specify a RecordReader for that file, and have
the reducer deserialize each object and perform the indexing. I'll have to look
into the details of java.io.Serializable and the Lucene API to be able to
comment further.

N. Rajagopal,
Visit me at http://www.raja-gopal.com
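The suggestion above (serialize the independent objects to a file, then deserialize and index them one record at a time on the reading side) can be sketched with plain java.io serialization. `Item`, `RecordFiles`, and the count-prefixed file layout are hypothetical. A real job would move the reading loop into a Hadoop RecordReader; because a single ObjectOutputStream stream cannot be split across input splits, a splittable container such as Hadoop's SequenceFile may be a better fit there. That wiring is not shown.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Consumer;

class SerializedRecordsDemo {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Path file = Files.createTempFile("records", ".ser");
        RecordFiles.writeAll(file, List.of(
                new Item(1, "Hadoop MapReduce"),
                new Item(2, "Lucene index")));

        // Reading side: deserialize one record at a time and index it.
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        RecordFiles.forEach(file, item -> {
            for (String term : item.text.toLowerCase(Locale.ROOT).split("\\s+")) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(item.id);
            }
        });
        Files.delete(file);
        System.out.println(index); // {hadoop=[1], index=[2], lucene=[2], mapreduce=[1]}
    }
}

/** Hypothetical input record; it only needs to implement java.io.Serializable. */
class Item implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final String text;

    Item(int id, String text) { this.id = id; this.text = text; }
}

/** Hypothetical helpers for a file holding a record count followed by serialized objects. */
class RecordFiles {
    static void writeAll(Path file, List<Item> items) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeInt(items.size());
            for (Item item : items) {
                out.writeObject(item);
                // Clear the stream's back-reference table so a huge collection
                // does not keep every written object reachable.
                out.reset();
            }
        }
    }

    static void forEach(Path file, Consumer<Item> handler) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                handler.accept((Item) in.readObject());
            }
        }
    }
}
```

Streaming the records through a callback, rather than deserializing the whole collection at once, mirrors how a RecordReader hands records to a task one at a time.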

