lucene-solr-dev mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: indexing slowdown with latest lucene update
Date Sun, 09 Aug 2009 21:06:46 GMT
Are you sure that the initialization costs of the TokenStream/
AttributeSource cause the slowdown? With the backwards-compatibility
code, every call of a Token method now goes through a delegation layer.
I'm afraid that might be what causes the slowdown.

The code that figures out what Attributes to put into the map uses  
reflection, but only if the impl wasn't seen before; otherwise the  
attributes are looked up in a cache.
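
A minimal sketch of the "reflect once, then hit a cache" pattern described above. This is not the Lucene code: Attr, TermAttr, OffsetAttr, TokenImpl and AttrInterfaceCache are hypothetical names made up for illustration, and it uses modern Java (lambdas, computeIfAbsent) that was not available in 2009.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical marker interface standing in for an attribute interface.
interface Attr {}
interface TermAttr extends Attr {}
interface OffsetAttr extends Attr {}

// Hypothetical implementation class exposing two attribute interfaces.
class TokenImpl implements TermAttr, OffsetAttr {}

final class AttrInterfaceCache {
    // One entry per implementation class that has been seen so far.
    private static final ConcurrentHashMap<Class<?>, List<Class<?>>> CACHE =
            new ConcurrentHashMap<>();

    // Counts how often the reflective scan actually runs.
    static final AtomicInteger reflectiveScans = new AtomicInteger();

    // Returns the attribute interfaces of impl; reflection runs only on a cache miss.
    static List<Class<?>> interfacesFor(Class<?> impl) {
        return CACHE.computeIfAbsent(impl, AttrInterfaceCache::scan);
    }

    // Walks the superclass chain and collects directly declared interfaces
    // that extend Attr. Super-interfaces of interfaces are not walked here.
    private static List<Class<?>> scan(Class<?> impl) {
        reflectiveScans.incrementAndGet();
        Set<Class<?>> found = new LinkedHashSet<>();
        for (Class<?> c = impl; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (iface != Attr.class && Attr.class.isAssignableFrom(iface)) {
                    found.add(iface);
                }
            }
        }
        return Collections.unmodifiableList(new ArrayList<>(found));
    }
}
```

With a cache like this, the reflective scan is paid once per implementation class, so by itself it should not account for a steady per-document cost.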

The culprit could also be the reflection code that checks which  
TokenStream methods are implemented.
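
For illustration, one common way to check by reflection whether a subclass overrides a given method is to compare the method's declaring class with the base class. The sketch below assumes hypothetical classes (BaseStream, OldStyle, NewStyle, OverrideCheck); it is not the actual TokenStream code.

```java
import java.lang.reflect.Method;

// Hypothetical base class offering an old-style and a new-style method.
abstract class BaseStream {
    public boolean incrementToken() { return false; } // new-style API
    public Object next() { return null; }             // old-style API
}

// Overrides only the old-style method.
class OldStyle extends BaseStream {
    @Override public Object next() { return "token"; }
}

// Overrides only the new-style method.
class NewStyle extends BaseStream {
    @Override public boolean incrementToken() { return true; }
}

final class OverrideCheck {
    // True if cls (or a superclass below BaseStream) overrides the named
    // public no-argument method declared in BaseStream.
    static boolean overrides(Class<? extends BaseStream> cls, String name) {
        try {
            Method m = cls.getMethod(name);
            return m.getDeclaringClass() != BaseStream.class;
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }
}
```

A lookup like this is cheap if its result is stored per class, but it becomes noticeable if it is repeated every time a stream is constructed.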

I can't look at the code right now (writing on my cell).
Even if this is "fixable", I don't really like the fact that users who
upgrade to 2.9 will potentially see such a performance hit unless they
implement incrementToken() and reusableTokenStream().

  Michael

On Aug 9, 2009, at 11:13 AM, Yonik Seeley <yonik@lucidimagination.com> wrote:

> FYI
> https://issues.apache.org/jira/browse/SOLR-1353
>
> On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>> It looks like implementing the new attribute stuff will not be enough
>> - the token architecture has changed enough that it looks like we must
>> cache tokenstreams to get back to good performance.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>> OK, I've isolated (magnified) the effect with a test I just checked in.
>>> Indexing documents directly at the UpdateHandler was 85% faster before
>>> the latest lucene update.
>>>
>>> Run the test like this:
>>>
>>> ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"
>>> grep throughput build/test-results/*TestIndexingPerformance.xml
>>>
>>> To run on an older trunk version, just copy over
>>> src/test/org/apache/solr/update/TestIndexingPerformance.java
>>> src/test/test-files/solr/conf/solrconfig_perf.xml
>>>
>>> I had a throughput of 10946 docs/sec before the lucene update, and
>>> 5849 after.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>> On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>>> On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>> Or bite the bullet and upgrade to the incrementToken() method.
>>>>
>>>> Right - I'm not sure if that would fix it or not - I haven't been
>>>> involved in the new Token attribute stuff...
>>>> I'm currently writing a basic indexing unit test that we can use to
>>>> measure this (the standard solrconfig does stuff that slows down
>>>> indexing a lot, but helps in catching bugs on edge cases by creating
>>>> many segments).
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>
>>
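
The idea of caching tokenstreams mentioned in the quoted thread can be sketched as a per-thread instance that is reset for each new input instead of being rebuilt. This is only an illustration under stated assumptions: ReusableStream and StreamCache are hypothetical stand-ins, not Lucene or Solr classes, and the whitespace tokenizer is a placeholder for a real analysis chain.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a token stream that is expensive to construct.
class ReusableStream {
    // Counts constructions so reuse can be observed.
    static final AtomicInteger constructed = new AtomicInteger();

    private String text = "";
    private int pos;

    ReusableStream() { constructed.incrementAndGet(); }

    // Points the existing instance at new input instead of allocating a new one.
    void reset(String newText) {
        text = newText;
        pos = 0;
    }

    // Returns the next whitespace-delimited token, or null when exhausted.
    String next() {
        int n = text.length();
        while (pos < n && Character.isWhitespace(text.charAt(pos))) pos++;
        if (pos >= n) return null;
        int start = pos;
        while (pos < n && !Character.isWhitespace(text.charAt(pos))) pos++;
        return text.substring(start, pos);
    }
}

final class StreamCache {
    // One stream per thread, created lazily on first use.
    private static final ThreadLocal<ReusableStream> PER_THREAD =
            ThreadLocal.withInitial(ReusableStream::new);

    // Returns this thread's stream, reset to read the given text.
    static ReusableStream streamFor(String text) {
        ReusableStream s = PER_THREAD.get();
        s.reset(text);
        return s;
    }
}
```

The trade-off is that a stream handed out by streamFor must be fully consumed before the same thread asks for another one, since both calls return the same object.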
