lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Indexing slower in trunk
Date Mon, 13 Jun 2011 18:13:06 GMT
I half remember that this has come up before, but I couldn't find the
thread. I was running some tests over the weekend that involved
indexing 1.9M documents from the English Wiki dump.

I'm consistently seeing that trunk takes about twice as long to index
the docs as 1.4, 3.2 and 3x. Optimize is also taking quite a bit
longer (159 seconds versus 21 to 26). I admit that these aren't very
sophisticated tests, and I only ran the trunk process twice (although
both of those runs were consistent).

I'm pretty sure my rambuffersize and autocommit settings are
identical. I remove the data/index directory before each run. These
results come from running the indexing program in IntelliJ on my Mac;
both the server and the indexing program were running locally.

No, trunk isn't compiling before running <G>.

Here's the server definition:
new StreamingUpdateSolrServer(url, 10, 4);

and I'm batching up the documents and sending them to Solr in batches of 1,000.
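
For readers who want to reproduce the setup, here is a minimal sketch of
the batching step only. It is not the actual indexing program: the
BatchSender class and its sink callback are hypothetical names introduced
for illustration, and the sink stands in for whatever hands a batch to
the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not from the original program): buffers items and
 * hands them to a sink in fixed-size batches.
 */
final class BatchSender<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;

    BatchSender(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one item and sends a batch as soon as the buffer is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered; call once at the end for the partial batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink may hold on to the list after we clear ours.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

With a batch size of 1,000, the sink would presumably wrap a call such as
server.add(batch) on the StreamingUpdateSolrServer shown above. That call
throws checked exceptions, so the wrapper would need to handle them.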

So, my question is whether this should be pursued. Note that I'm still
getting around 3K docs/second, which I can't complain about. Not that
that stops me, you understand. And in return for a memory footprint
reduction from 389M to 90M after some off-the-wall sorting and
faceting I'll take it!

Hmmmm, speaking of which, the memory usage changes seem like a good
candidate for a page on the Wiki. Anyone want to suggest a home?


Solr 1.4.1
Total Time Taken-> 257 seconds
Total documents added-> 1917728
Docs/sec-> 7461
starting optimize
optimizing took 26 seconds

Solr 3.2
Total Time Taken-> 243 seconds
Total documents added-> 1917728
Docs/sec-> 7891
starting optimize
optimizing took 21 seconds

Solr 3x
Total Time Taken-> 269 seconds
Total documents added-> 1917728
Docs/sec-> 7129
starting optimize
optimizing took 21 seconds

Solr trunk. 2011-6-11: 17:24 EST
Total Time Taken-> 592 seconds
Total documents added-> 1917728
Docs/sec-> 3239
starting optimize
optimizing took 159 seconds
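
To put the four runs side by side, the arithmetic can be sketched as
below. The timings and document count are copied from the output above.
The class and method names are made up for this example, and using 3.2 as
the baseline is an arbitrary choice (it was the fastest run). The reported
Docs/sec figures appear to be truncated integer division, which is what
docsPerSecond assumes.

```java
/** Illustrative only: compares the timings reported in this message. */
final class IndexingRuns {
    static final long DOCS = 1_917_728L;

    /** Throughput, truncated the same way the reported Docs/sec values appear to be. */
    static long docsPerSecond(long docs, long seconds) {
        return docs / seconds;
    }

    /** How many times longer a run took than the baseline run. */
    static double slowdown(long seconds, long baselineSeconds) {
        return (double) seconds / baselineSeconds;
    }

    public static void main(String[] args) {
        String[] names = {"1.4.1", "3.2", "3x", "trunk"};
        long[] indexSecs = {257, 243, 269, 592};
        long[] optimizeSecs = {26, 21, 21, 159};
        int baseline = 1; // index of the 3.2 run

        for (int i = 0; i < names.length; i++) {
            System.out.printf(
                "%-6s docs/sec %d, index %.2fx, optimize %.2fx of 3.2%n",
                names[i],
                docsPerSecond(DOCS, indexSecs[i]),
                slowdown(indexSecs[i], indexSecs[baseline]),
                slowdown(optimizeSecs[i], optimizeSecs[baseline]));
        }
    }
}
```

On these numbers, the relative slowdown of optimize on trunk is
considerably larger than the slowdown of indexing itself.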

What do folks think? Is there anything I can/should do to narrow this down?

Erick
