lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rafael Rossini" <rafael.ross...@gmail.com>
Subject Re: Indexing large sets of documents?
Date Thu, 27 Jul 2006 20:23:56 GMT
Oits,

   You mentioned the hadoop project. I check it out not a long time ago and
I read someting about it did not support the lucene index. Is it possible to
index and then search in a HDFS?

[]s
     Rossini


On 7/27/06, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
>
> Michael,
>
> Certainly parallelizing on a set of servers would work (hmm... hadoop?),
> but if you want to do this on a single machine you should tune some of the
> IndexWriter params.  You didn't mention them, so I assume you didn't tune
> anything yet.  If you have Lucene in Action, check out
> 2.7.1  : Tuning indexing performance        starts on page 42  under
> section 2.7 (Controlling the indexing process) in chapter 2 (Indexing)
> (found via: http://lucenebook.com/search?query=index+tuning )
>
> If not, check maxBufferedDocs and mergeFactor in IndexWriter
> javadocs.  This is likely in the FAQ, too, but I didn't check.
>
> Otis
>
> ----- Original Message ----
> From: Michael J. Prichard
> To: java-user@lucene.apache.org
> Sent: Thursday, July 27, 2006 12:29:31 PM
> Subject: Indexing large sets of documents?
>
> I built an indexer that runs through email and its attachments, rips out
> content and what not and then creates a Document and adds it to an
> index.  It works w/ no problem.  The issue is that it takes around 3-5
> seconds per email and I have seen up to 10-15 seconds for email w/
> attachments.  I need to index 750k emails and at those times it will
> take FOREVER!  I am trying to find places to cut a second or two here or
> there but are there any suggestions as to what I can do?  Should I look
> into parallelizing indexing?  Help?!
>
> Thanks,
> Michael
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message