lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: Indexing slower in trunk
Date Mon, 13 Jun 2011 20:50:22 GMT
On Mon, Jun 13, 2011 at 8:13 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> I half remember that this has come up before, but I couldn't find the
> thread. I was running some tests over the weekend that involved
> indexing 1.9M documents from the English Wiki dump.
>
> I'm consistently seeing that trunk takes about twice as long to index
> the docs as 1.4, 3.2 and 3x. Optimize is also taking quite a bit
> longer. I admit that these aren't very sophisticated tests, and I only
> ran the trunk process twice (although both those were consistent).
>
> I'm pretty sure my rambuffersize and autocommit settings are
> identical. I remove the data/index directory before each run. These
> results are running the indexing program in IntelliJ, on my Mac, both
> the server and the indexing programs were running locally.
>
> No, trunk isn't compiling before running <G>.
>
> Here's the server definition:
> new StreamingUpdateSolrServer(url, 10, 4);
>
> and I'm batching up the documents and sending them to Solr in batches of 1,000.
>
> So, my question is whether this should be pursued. Note that I'm still
> getting around 3K docs/second, which I can't complain about. Not that
> that stops me, you understand. And in return for a memory footprint
> reduction from 389M to 90M after some off-the-wall sorting and
> faceting I'll take it!
>
> Hmmmm, speaking of which, the memory usage changes seem like a good
> candidate for a page on the Wiki, anyone want to suggest a home?
>
>
> Solr 1.4.1
> Total Time Taken-> 257 seconds
> Total documents added-> 1917728
> Docs/sec-> 7461
> starting optimize
> optimizing took 26 seconds
>
> Solr 3.2
> Total Time Taken-> 243 seconds
> Total documents added-> 1917728
> Docs/sec-> 7891
> starting optimize
> optimizing took 21 seconds
>
> Solr 3x
> Total Time Taken-> 269 seconds
> Total documents added-> 1917728
> Docs/sec-> 7129
> starting optimize
> optimizing took 21 seconds
>
> Solr trunk. 2011-6-11: 17:24 EST
> Total Time Taken-> 592 seconds
> Total documents added-> 1917728
> Docs/sec-> 3239
> starting optimize
> optimizing took 159 seconds
>
> What do folks think? Is there anything I can/should do to narrow this down?

Hi Erick,

This looks weird. I have some questions:

- Are you indexing onto the same disk that you read the data from?
- What are your RAM buffer settings?
- How many threads are you using to send data to Solr?
- What is your autocommit setting?
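
As a side note, the figures quoted above can be cross-checked with a small sketch like the one below. It only redoes the arithmetic on the posted timings; the class and method names (IndexingThroughput, docsPerSecond, slowdown) are made up for illustration, and truncating to whole docs/sec is an assumption that happens to match the posted numbers.

```java
public class IndexingThroughput {

    // Whole documents per second. Integer division truncates, which is an
    // assumption about how the quoted "Docs/sec" lines were computed.
    static long docsPerSecond(long totalDocs, long elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        return totalDocs / elapsedSeconds;
    }

    // How many times longer the candidate run took than the baseline run.
    static double slowdown(long baselineSeconds, long candidateSeconds) {
        if (baselineSeconds <= 0) {
            throw new IllegalArgumentException("baselineSeconds must be positive");
        }
        return (double) candidateSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        long docs = 1_917_728L;                       // documents added in every run
        String[] labels = {"Solr 1.4.1", "Solr 3.2", "Solr 3x", "Solr trunk"};
        long[] indexSeconds = {257, 243, 269, 592};   // "Total Time Taken" per run
        long[] optimizeSeconds = {26, 21, 21, 159};   // "optimizing took" per run

        long trunkIndex = indexSeconds[3];
        long trunkOptimize = optimizeSeconds[3];

        for (int i = 0; i < labels.length; i++) {
            System.out.printf(
                "%-11s %5d docs/sec, trunk indexing x%.2f, trunk optimize x%.2f%n",
                labels[i],
                docsPerSecond(docs, indexSeconds[i]),
                slowdown(indexSeconds[i], trunkIndex),
                slowdown(optimizeSeconds[i], trunkOptimize));
        }
    }
}
```

On these timings, trunk indexing comes out at a bit over twice as long as each of the other three runs, while the optimize step is off by a much larger factor, so the two may be worth looking at separately.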

simon


>
> Erick
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>


