lucene-java-user mailing list archives

From Ian Lea
Subject Re: any optimizations I can make on this code
Date Wed, 22 Jun 2011 08:58:31 GMT
So, you are reading 100 million records from somewhere and are writing
each record to one of 1 million indexes? Really 1 million, with an
average of 100 docs in each? 17 hours doesn't sound too bad to me.
Before worrying about lucene performance you should double check
everything else - in general lucene is not the bottleneck, but your
case may be different.

Caching the index writers is likely to help, at the cost of complexity
and memory.
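
A rough sketch of what such a cache could look like, using only the JDK.
LruWriterCache, its opener function and the Closeable stand-in for
IndexWriter are all made up for illustration; a real version would open
an IndexWriter in the opener. It keeps at most maxOpen writers and
closes the least recently used one when a new one pushes it over.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache of open writers; W stands in for IndexWriter. */
class LruWriterCache<K, W extends Closeable> {
    private final int maxOpen;
    private final Function<K, W> opener; // assumed factory: opens a writer for a key
    // accessOrder=true keeps entries ordered from least to most recently used
    private final LinkedHashMap<K, W> open = new LinkedHashMap<>(16, 0.75f, true);

    LruWriterCache(int maxOpen, Function<K, W> opener) {
        if (maxOpen < 1) throw new IllegalArgumentException("maxOpen must be >= 1");
        this.maxOpen = maxOpen;
        this.opener = opener;
    }

    /** Returns the open writer for key, opening it (and evicting one) if needed. */
    synchronized W get(K key) {
        W writer = open.get(key); // a hit also marks the entry as recently used
        if (writer == null) {
            writer = opener.apply(key);
            open.put(key, writer);
            if (open.size() > maxOpen) {
                // first entry in iteration order is the least recently used
                Iterator<Map.Entry<K, W>> it = open.entrySet().iterator();
                W eldest = it.next().getValue();
                it.remove();
                close(eldest);
            }
        }
        return writer;
    }

    /** Closes every writer still open, least recently used first. */
    synchronized void closeAll() {
        for (W writer : open.values()) close(writer);
        open.clear();
    }

    private static void close(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        // Fake writers that only record when they get closed.
        LruWriterCache<String, Closeable> cache =
            new LruWriterCache<String, Closeable>(2, key -> () -> closed.add(key));
        cache.get("a");
        cache.get("b");
        cache.get("a"); // "a" becomes most recently used
        cache.get("c"); // over the limit, so "b" is closed
        cache.closeAll();
        System.out.println(closed); // prints [b, a, c]
    }
}
```

Note the sketch does nothing to stop one thread's writer being evicted
and closed while another thread is still adding to it. With 20 threads
you would need something extra, e.g. reference counting or giving each
thread its own set of target indexes.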

How about reading all the records and storing a target index
identifier somewhere (memory? DB?), then either re-reading the 100
million in order of target index, or making 1 million passes
through - no, that doesn't sound too clever.
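
To make the first option concrete, here is a toy sketch of processing
records in order of target index, so that each index is opened exactly
once. Rec, indexInTargetOrder and the map standing in for the indexes
are invented for the example. It sorts in memory, which would not do
for 100 million records, where the ordering would have to come from the
DB or an external sort.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupedIndexing {
    /** Hypothetical record: the index it belongs to plus some payload. */
    static final class Rec {
        final String targetIndex;
        final String payload;

        Rec(String targetIndex, String payload) {
            this.targetIndex = targetIndex;
            this.payload = payload;
        }
    }

    /**
     * Writes records grouped by target index and returns how many times
     * a "writer" had to be opened. The sink map stands in for the indexes.
     */
    static int indexInTargetOrder(List<Rec> records, Map<String, List<String>> sink) {
        List<Rec> sorted = new ArrayList<>(records);
        // List.sort is stable, so records keep their order within one index
        sorted.sort((a, b) -> a.targetIndex.compareTo(b.targetIndex));

        int opens = 0;
        String current = null;
        List<String> writer = null;
        for (Rec r : sorted) {
            if (!r.targetIndex.equals(current)) {
                // real code would close the previous IndexWriter here
                // and open the one for r.targetIndex
                current = r.targetIndex;
                writer = sink.computeIfAbsent(current, k -> new ArrayList<>());
                opens++;
            }
            writer.add(r.payload);
        }
        return opens;
    }

    public static void main(String[] args) {
        List<Rec> records = new ArrayList<>();
        records.add(new Rec("a", "r1"));
        records.add(new Rec("b", "r2"));
        records.add(new Rec("a", "r3"));
        records.add(new Rec("c", "r4"));
        records.add(new Rec("b", "r5"));

        Map<String, List<String>> sink = new HashMap<>();
        int opens = indexInTargetOrder(records, sink);
        System.out.println(opens + " opens for " + records.size() + " records");
    }
}
```

In arrival order the same five records would need five opens with the
open/add/close loop; grouped, it is one open per distinct index.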

Sometimes you just have to accept that doing complex operations on
large datasets can take a long time.


On Wed, Jun 22, 2011 at 2:06 AM, Hiller, Dean x66079 wrote:
> I am running over a 100 million row nosql set and unfortunately building
> 1 million indexes. Each row I get may or may not be for the index I just
> wrote to, so I can't keep an IndexWriter open very long. I am currently
> simulating how long it would take me to build all the indexes, and it
> looks like it is somewhere around 17 hours :(
>
> Any other ways to optimize this code (and then I can maybe apply it to
> our index map/reduce job), thanks, Dean
>
> This is done in 20 different threads, and again, taking IndexWriter out
> of the loop is probably not an option, since as I go over the 100 million
> records each one needs a different IndexWriter and I can't have too many
> IndexWriters open.
>
>            Directory dir = FSDirectory.open(new File(INDEX_DIR_PREFIX
>                  + this.account));
>            for (int i = 0; i < 125; i++) {
>               IndexWriterConfig conf = new IndexWriterConfig(
>                     Version.LUCENE_32, new KeywordAnalyzer());
>               IndexWriter writer = new IndexWriter(dir, conf);
>               LocalDate date = new LocalDate();
>               int random = this.r.nextInt(1000);
>               date = date.plusDays(random);
>               int next = this.r.nextInt(5000);
>               int name = this.r.nextInt(1000);
>               Document document = createDocument(("temp" + next),
>                     ("dean" + name),
>                                "some url", date);
>               writer.addDocument(document);
>               writer.close();
>            }
>
> Hmmmm, I maybe could use an IndexWriter cache of 2000 to leave them open
> until evicted? I can't think of anything else to help though. Ideas?
> Thanks,
> Dean
