hadoop-common-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Lucene + Hadoop
Date Tue, 10 Nov 2009 23:55:22 GMT
I think that sounds right.
I believe that's what I did when I implemented this type of functionality for http://simpy.com/

I'm not sure why this is a Hadoop thing, though.


----- Original Message ----
> From: Hrishikesh Agashe <hrishikesh_agashe@persistent.co.in>
> To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
> Sent: Tue, November 10, 2009 4:56:33 AM
> Subject: Lucene + Hadoop
> Hi,
> I am trying to use Hadoop for Lucene index creation. I have to create multiple
> indexes based on the contents of the files (i.e. if the author is "hrishikesh",
> the file should be added to an index for "hrishikesh". There has to be a
> separate index for every author). For this, I keep one IndexWriter open per
> author and maintain them in a hashmap in the map() function. I parse each
> incoming file, and if its author is one for which I already have an open
> IndexWriter, I just add the file to that index; otherwise I create a new
> IndexWriter for the new author. As authors might run into the thousands, I
> close all the IndexWriters and clear the hashmap once it reaches a certain
> threshold, then start all over again. There is no reduce function.
> Does this logic sound correct? Is there any other way of implementing this
> requirement?
> --Hrishi
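
The writer-per-author bookkeeping described in the quoted message can be sketched roughly as below. This is only a minimal illustration, not code from either poster: AuthorWriterCache, WriterFactory, writerFor, closeAll and maxOpen are hypothetical names, and the writer type is left generic (anything Closeable) so the sketch runs without Lucene on the classpath. In a real job the factory would presumably open a Lucene IndexWriter on that author's index directory.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: keeps at most maxOpen per-author writers open at once. */
class AuthorWriterCache<W extends Closeable> implements Closeable {

    /** Hypothetical factory; a real one might open an IndexWriter for the author. */
    interface WriterFactory<W> {
        W open(String author) throws IOException;
    }

    private final Map<String, W> open = new HashMap<>();
    private final WriterFactory<W> factory;
    private final int maxOpen;

    AuthorWriterCache(int maxOpen, WriterFactory<W> factory) {
        if (maxOpen < 1) {
            throw new IllegalArgumentException("maxOpen must be >= 1");
        }
        this.maxOpen = maxOpen;
        this.factory = factory;
    }

    /** Returns the already-open writer for this author, or opens a new one. */
    W writerFor(String author) throws IOException {
        W writer = open.get(author);
        if (writer == null) {
            // Threshold reached: close everything and start over, as in the
            // quoted message (no attempt is made to keep the "hot" authors).
            if (open.size() >= maxOpen) {
                closeAll();
            }
            writer = factory.open(author);
            open.put(author, writer);
        }
        return writer;
    }

    /** Closes every open writer and empties the map. */
    void closeAll() throws IOException {
        IOException first = null;
        for (W writer : open.values()) {
            try {
                writer.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e; // remember the first failure, keep closing the rest
                }
            }
        }
        open.clear();
        if (first != null) {
            throw first;
        }
    }

    int openCount() {
        return open.size();
    }

    @Override
    public void close() throws IOException {
        closeAll();
    }
}
```

Two things follow from this design. Because an author's writer can be closed at the threshold and opened again later, the factory has to append to an index that already exists rather than recreate it. And closeAll() also needs to run once when the map task finishes (for example from the mapper's close() or cleanup() hook), otherwise the last batch of writers is never closed.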
