lucene-solr-dev mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: indexing slowdown with latest lucene update
Date Mon, 10 Aug 2009 02:25:36 GMT
Michael Busch wrote:
> Are you sure that the initialization costs of the 
> TokenStream/AttributeSource cause the slowdown? With the bw-comp. code 
> now every call of a Token method goes through a delegation layer. I'm 
> afraid that might cause a slowdown?
It's isMethodOverriden and TokenStream<init>(AttributeSource).
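To make the cost concrete, here is a minimal sketch of a reflective "does this subclass override the method?" check, with and without a per-class cache. It is not the Lucene code: BaseStream, LegacyStream, PlainStream and both helper methods are hypothetical names made up for illustration. The only assumption is that the check is done with reflection, so doing it on every construction is paid per stream, while a cache pays it once per class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class OverrideCheckSketch {
    // Hypothetical stand-ins for a base stream class and two subclasses.
    public static class BaseStream {
        public Object next() { return null; }
    }
    public static class LegacyStream extends BaseStream {
        @Override public Object next() { return "token"; }
    }
    public static class PlainStream extends BaseStream { }

    // One entry per concrete class, filled in on first use.
    private static final Map<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    // Uncached: a reflective method lookup on every call.
    public static boolean overridesNext(Class<? extends BaseStream> clazz) {
        try {
            // getMethod returns the most specific public next(); if its
            // declaring class is not the base class, a subclass overrode it.
            return clazz.getMethod("next").getDeclaringClass() != BaseStream.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Cached: the reflective lookup runs at most once per class.
    public static boolean overridesNextCached(Class<? extends BaseStream> clazz) {
        return CACHE.computeIfAbsent(clazz, c -> overridesNext(clazz));
    }

    public static void main(String[] args) {
        System.out.println(overridesNextCached(LegacyStream.class));
        System.out.println(overridesNextCached(PlainStream.class));
    }
}
```

With very short documents the per-stream setup is a large share of the total work, which is why an uncached check in the constructor would show up so clearly in a profile.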
>
> The code that figures out what Attributes to put into the map uses 
> reflection, but only if the impl wasn't seen before; otherwise the 
> attributes are looked up in a cache.
>
> The culprit could also be the reflection code that checks which 
> TokenStream methods are implemented.
>
> I can't look at the code right now (writing on my cell).
> Even if this is "fixable", I don't really like the fact that users who 
> upgrade to 2.9 will potentially see such a performance hit unless they 
> implement incrementToken() and reusableTokenStream.
Looks like you take a good hit, but keep in mind that the test is close to 
a worst-case scenario as well: the Document text is extremely short.
>
>  Michael
>
> On Aug 9, 2009, at 11:13 AM, Yonik Seeley <yonik@lucidimagination.com> 
> wrote:
>
>> FYI
>> https://issues.apache.org/jira/browse/SOLR-1353
>>
>> On Sun, Aug 9, 2009 at 2:02 PM, Yonik 
>> Seeley<yonik@lucidimagination.com> wrote:
>>> It looks like implementing the new attribute stuff will not be enough
>>> - the token architecture has changed enough that it looks like we must
>>> cache tokenstreams to get back to good performance.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>> On Sun, Aug 9, 2009 at 12:57 PM, Yonik 
>>> Seeley<yonik@lucidimagination.com> wrote:
>>>> OK, I've isolated (magnified) the effect with a test I just checked 
>>>> in.
>>>> Indexing documents directly at the UpdateHandler was 85% faster before
>>>> the latest lucene update.
>>>>
>>>> Run the test like this:
>>>>
>>>> ant test -Dtestcase=TestIndexingPerformance -Dargs="-server
>>>> -Diter=100000"; grep throughput
>>>> build/test-results/*TestIndexingPerformance.xml
>>>>
>>>> To run on an older trunk version, just copy over
>>>> src/test/org/apache/solr/update/TestIndexingPerformance.java
>>>> src/test/test-files/solr/conf/solrconfig_perf.xml
>>>>
>>>> I had a throughput of 10946 docs/sec before the lucene update, and 
>>>> 5849 after.
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>> On Sun, Aug 9, 2009 at 12:10 PM, Yonik 
>>>> Seeley<yonik@lucidimagination.com> wrote:
>>>>> On Sun, Aug 9, 2009 at 12:01 PM, Grant 
>>>>> Ingersoll<gsingers@apache.org> wrote:
>>>>>> Or bite the bullet and upgrade to the incrementToken() method.
>>>>>
>>>>> Right - I'm not sure if that would fix it or not - I haven't been
>>>>> involved in the new Token attribute stuff...
>>>>> I'm currently writing a basic indexing unit test that we can use to
>>>>> measure this (the standard solrconfig does stuff that slows down
>>>>> indexing a lot, but helps in catching bugs on edge cases by creating
>>>>> many segments).
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>
>>>
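For reference, a rough sketch of the "cache tokenstreams" idea Yonik describes above: keep one stream per thread and reset it for each new piece of text, so the construction cost is paid once per thread and not once per field. This is only an illustration under that assumption. CostlyStream, reusableStream and indexAll are hypothetical names, not Lucene or Solr API, and the counter exists only to show how many streams get built.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StreamReuseSketch {
    // Counts constructor calls so the effect of reuse is visible.
    public static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    // Hypothetical stand-in for a stream whose construction is expensive.
    public static class CostlyStream {
        private String text = "";
        public CostlyStream() { CONSTRUCTED.incrementAndGet(); }
        // Point the existing instance at new input without rebuilding it.
        public void reset(String newText) { this.text = newText; }
        public int length() { return text.length(); }
    }

    // One cached stream per thread, created lazily on first use.
    private static final ThreadLocal<CostlyStream> PER_THREAD =
        ThreadLocal.withInitial(CostlyStream::new);

    // Returns the calling thread's stream, reset to the given text.
    public static CostlyStream reusableStream(String text) {
        CostlyStream stream = PER_THREAD.get();
        stream.reset(text);
        return stream;
    }

    // "Indexes" each document by consuming the reused stream.
    public static int indexAll(String[] docs) {
        int total = 0;
        for (String doc : docs) {
            total += reusableStream(doc).length();
        }
        return total;
    }

    public static void main(String[] args) {
        int total = indexAll(new String[] {"a", "bb", "ccc"});
        System.out.println(total + " chars, streams built: " + CONSTRUCTED.get());
    }
}
```

The trade-off is that a reused stream must be fully reset between documents, and it must not be shared across threads, which is why the cache here is a ThreadLocal.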


-- 
- Mark

http://www.lucidimagination.com



