lucene-solr-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: indexing slowdown with latest lucene update
Date Sun, 09 Aug 2009 21:11:32 GMT
I am concerned about this one as well. Especially since the majority
of the language analyzers in lucene-contrib do not implement
reusableTokenStream.

On Sun, Aug 9, 2009 at 5:06 PM, Michael Busch<buschmic@gmail.com> wrote:
> Are you sure that the initialization costs of the
> TokenStream/AttributeSource cause the slowdown? With the bw-comp. code now
> every call of a Token method goes through a delegation layer. I'm afraid
> that might cause a slowdown?
>
> The code that figures out what Attributes to put into the map uses
> reflection, but only if the impl wasn't seen before; otherwise the
> attributes are looked up in a cache.
>
> The culprit could also be the reflection code that checks which TokenStream
> methods are implemented.
>
> I can't look at the code right now (writing on my cell).
> Even if this is "fixable", I don't really like the fact that users who
> upgrade to 2.9 will potentially see such a performance hit unless they
> implement incrementToken() and reusableTokenStream.
>
>  Michael
>
> On Aug 9, 2009, at 11:13 AM, Yonik Seeley <yonik@lucidimagination.com>
> wrote:
>
>> FYI
>> https://issues.apache.org/jira/browse/SOLR-1353
>>
>> On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley<yonik@lucidimagination.com>
>> wrote:
>>>
>>> It looks like implementing the new attribute stuff will not be enough
>>> - the token architecture has changed enough that it looks like we must
>>> cache tokenstreams to get back to good performance.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>>
>>> On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley<yonik@lucidimagination.com>
>>> wrote:
>>>>
>>>> OK, I've isolated (magnified) the effect with a test I just checked in.
>>>> Indexing documents directly at the UpdateHandler was 85% faster before
>>>> the latest lucene update.
>>>>
>>>> Run the test like this:
>>>>
>>>> ant test -Dtestcase=TestIndexingPerformance -Dargs="-server
>>>> -Diter=100000"; grep throughput
>>>> build/test-results/*TestIndexingPerformance.xml
>>>>
>>>> To run on an older trunk version, just copy over
>>>> src/test/org/apache/solr/update/TestIndexingPerformance.java
>>>> src/test/test-files/solr/conf/solrconfig_perf.xml
>>>>
>>>> I had a throughput of 10946 docs/sec before the lucene update, and 5849
>>>> after.
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>>
>>>> On Sun, Aug 9, 2009 at 12:10 PM, Yonik
>>>> Seeley<yonik@lucidimagination.com> wrote:
>>>>>
>>>>> On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll<gsingers@apache.org>
>>>>> wrote:
>>>>>>
>>>>>> Or bite the bullet and upgrade to the incrementToken() method.
>>>>>
>>>>> Right - I'm not sure if that would fix it or not - I haven't been
>>>>> involved in the new Token attribute stuff...
>>>>> I'm currently writing a basic indexing unit test that we can use to
>>>>> measure this (the standard solrconfig does stuff that slows down
>>>>> indexing a lot, but helps in catching bugs on edge cases by creating
>>>>> many segments).
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>
>>>
>
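
For reference, the "reflection only if the impl wasn't seen before, otherwise a cache lookup" behaviour Michael describes above can be sketched roughly as follows. This is a minimal illustration and not the actual Lucene code: the class InterfaceCache, the stand-in types Attribute, TermAttribute, OffsetAttribute and TokenImpl, and the reflectiveScans counter are all hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class InterfaceCache {
    // Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
    interface Attribute {}
    interface TermAttribute extends Attribute {}
    interface OffsetAttribute extends Attribute {}
    static class TokenImpl implements TermAttribute, OffsetAttribute {}

    // Maps an implementation class to the attribute interfaces found on it.
    private static final Map<Class<?>, List<Class<?>>> CACHE = new ConcurrentHashMap<>();

    // Counts how often the reflective scan actually runs (for demonstration).
    static final AtomicInteger reflectiveScans = new AtomicInteger();

    // Returns the cached result; runs the reflective scan only for unseen classes.
    static List<Class<?>> attributeInterfaces(Class<?> implClass) {
        return CACHE.computeIfAbsent(implClass, InterfaceCache::scan);
    }

    // Walks the class hierarchy and collects every interface that extends Attribute.
    private static List<Class<?>> scan(Class<?> implClass) {
        reflectiveScans.incrementAndGet();
        List<Class<?>> found = new ArrayList<>();
        for (Class<?> c = implClass; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (iface != Attribute.class
                        && Attribute.class.isAssignableFrom(iface)
                        && !found.contains(iface)) {
                    found.add(iface);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Simulate many token streams being created for the same implementation.
        for (int i = 0; i < 1000; i++) {
            attributeInterfaces(TokenImpl.class);
        }
        System.out.println(attributeInterfaces(TokenImpl.class).size()
                + " interfaces, " + reflectiveScans.get() + " scan(s)");
    }
}
```

In this sketch the scan counter stays at 1 however many lookups are made for the same class, so a cache of this shape would make the attribute lookup an unlikely source of per-document cost. That would leave the other candidates raised in the thread (the delegation layer, and the per-instance check of which TokenStream methods are implemented) still to be measured.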



-- 
Robert Muir
rcmuir@gmail.com
