lucene-java-user mailing list archives

From Marcelo Ochoa <marcelo.oc...@gmail.com>
Subject Re: Performance tips when creating a large index from database.
Date Thu, 22 Oct 2009 13:59:38 GMT
Hi Paul:
   Most of the time spent indexing big tables goes to the full table
scan and network data transfer.
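   One way to check whether this holds for your own job is to time the
fetch phase and the indexing phase separately. The following is only a
minimal sketch: the class PhaseTimer and the fetcher/indexer callbacks are
hypothetical stand-ins (for a JDBC range query and for adding a document to
the index) and are not part of Lucene or of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical helper: accumulates time spent fetching rows vs. indexing them.
class PhaseTimer {
    long fetchNanos;   // total time spent inside the fetcher
    long indexNanos;   // total time spent inside the indexer

    // Runs one batch for ids in [fromId, toId) and returns the number of rows seen.
    // fetcher stands in for the database query, indexer for the per-row index call.
    <R> int runBatch(long fromId, long toId,
                     BiFunction<Long, Long, List<R>> fetcher,
                     Consumer<R> indexer) {
        long t0 = System.nanoTime();
        List<R> rows = fetcher.apply(fromId, toId);
        long t1 = System.nanoTime();
        for (R row : rows) {
            indexer.accept(row);
        }
        long t2 = System.nanoTime();
        fetchNanos += t1 - t0;
        indexNanos += t2 - t1;
        return rows.size();
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Fake fetcher: builds one string per id instead of querying a database.
        BiFunction<Long, Long, List<String>> fakeFetch = (from, to) -> {
            List<String> rows = new ArrayList<>();
            for (long id = from; id < to; id++) {
                rows.add("row-" + id);
            }
            return rows;
        };
        List<String> indexed = new ArrayList<>();
        for (long from = 0; from < 30_000; from += 10_000) {
            timer.runBatch(from, from + 10_000, fakeFetch, indexed::add);
        }
        System.out.println("rows indexed: " + indexed.size());
        System.out.println("fetch ms: " + timer.fetchNanos / 1_000_000);
        System.out.println("index ms: " + timer.indexNanos / 1_000_000);
    }
}
```

   If the fetch total dominates, tuning the index writer settings is
unlikely to help much.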
   Please take a quick look at my OOW08 presentation about Oracle
Lucene integration:
          http://docs.google.com/present/view?id=ddgw7sjp_156gf9hczxv
    especially slides 13 and 14, which show the time involved in indexing a
Wikipedia dump inside an Oracle database.
    Best regards, Marcelo.
On Thu, Oct 22, 2009 at 9:45 AM, Paul Taylor <paul_t100@fastmail.fm> wrote:
> I'm building a Lucene index from a database, creating about 1 million
> documents; unsurprisingly, this takes quite a long time.
> I do this by sending a query to the db over a range of ids (10,000
> records),
> adding these results to Lucene,
> then getting the next 10,000, and so on.
> When indexing is complete I call optimize().
> I also set indexWriter.setMaxBufferedDocs(1000) and
> indexWriter.setMergeFactor(3000) but don't fully understand these values.
> Each document contains about 10 small fields.
>
> I'm looking for some ways to improve performance.
>
> The index writing is single threaded; is there a way I can multi-thread
> writing to the index?
> I only call optimize() once at the end; is that the best way to do it?
> I'm going to run a profiler over the code, but are there any rules of thumb
> on the best values to set for MaxBufferedDocs and MergeFactor?
>
> thanks Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
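On the multi-threading question: IndexWriter.addDocument can be called from
several threads sharing one writer, so the id ranges can be handed to a
thread pool. The following is only a sketch of that batching pattern under
stated assumptions: ParallelBatchIndexer and BatchTask are hypothetical
names, and indexRange stands in for "query the db for this id range and add
each row to the shared writer". It is not the original poster's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: runs id-range batches on a fixed-size thread pool.
class ParallelBatchIndexer {

    // Stand-in for fetching ids in [fromId, toId) and indexing each row.
    // Returns the number of documents added for that range.
    interface BatchTask {
        int indexRange(long fromId, long toId) throws Exception;
    }

    // Splits [minId, maxId) into ranges of batchSize, runs them concurrently,
    // and returns the total number of documents reported by the tasks.
    static long indexAll(long minId, long maxId, long batchSize,
                         int threads, BatchTask task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (long from = minId; from < maxId; from += batchSize) {
                final long lo = from;
                final long hi = Math.min(from + batchSize, maxId);
                Callable<Integer> job = () -> task.indexRange(lo, hi);
                results.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // rethrows any failure from a batch
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake task: pretends every id in the range produced one document.
        long total = indexAll(0, 1_000_000, 10_000, 4,
                (lo, hi) -> (int) (hi - lo));
        System.out.println("documents indexed: " + total);
    }
}
```

Each worker would normally use its own database connection, and optimize()
would still be called once, after indexAll returns.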



-- 
Marcelo F. Ochoa
http://marceloochoa.blogspot.com/
http://marcelo.ochoa.googlepages.com/home
______________
Want to integrate Lucene and Oracle?
http://marceloochoa.blogspot.com/2007/09/running-lucene-inside-your-oracle-jvm.html
Is Oracle 11g REST ready?
http://marceloochoa.blogspot.com/2008/02/is-oracle-11g-rest-ready.html
