lucene-solr-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: indexing slowdown with latest lucene update
Date Mon, 10 Aug 2009 14:07:34 GMT
FWIW, seems like these issues should be brought up on java-dev.  Even  
if the changes in Lucene are back compatible, that's not much help if  
the large majority of users are going to take a similar hit to what  
Solr is taking.


On Aug 9, 2009, at 11:47 PM, Mark Miller wrote:

> isMethodOverriden is just nasty - copying Methods, security checks,  
> walking the type hierarchy, this, that, some more. I bet cglib has a  
> really fast version - too bad there is no built in equivalent.
>
> It's not nearly as clean, but what if a new TokenStream simply
> identified itself as supporting increment, and the default impl
> returns false? The developer knows at compile time, right? Almost no
> reason to keep asking the code over and over again, especially since
> it's so expensive. Then reusable doubles the cost.
>
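The self-identification idea above could look roughly like the sketch below. It is only an illustration: SketchStream, OldStyleStream, NewStyleStream and supportsIncrementToken() are hypothetical names, not Lucene's API. The reflective overrides() helper is a simplified stand-in for the kind of check being complained about, shown next to the plain virtual call that would replace it.

```java
// Hypothetical stand-in for a token stream base class; NOT Lucene's real API.
abstract class SketchStream {
    // Old-style API: streams written before the change override this.
    public String next() { return null; }

    // New-style API: streams written after the change override this.
    public boolean incrementToken() { return false; }

    // The suggestion: the default answers "no", new-style streams override it.
    // This is an ordinary virtual call, so no reflection is needed per instance.
    public boolean supportsIncrementToken() { return false; }

    // Simplified reflective alternative: walk up the hierarchy and look for a
    // no-argument method with the given name declared below the base class.
    static boolean overrides(Class<?> cls, String name) {
        for (Class<?> c = cls; c != null && c != SketchStream.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(name);
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }
}

// A stream that only implements the old API and inherits the "false" default.
final class OldStyleStream extends SketchStream {
    @Override public String next() { return "token"; }
}

// A stream that implements the new API and says so explicitly.
final class NewStyleStream extends SketchStream {
    @Override public boolean incrementToken() { return true; }
    @Override public boolean supportsIncrementToken() { return true; }
}

class SupportFlagDemo {
    public static void main(String[] args) {
        SketchStream oldStream = new OldStyleStream();
        SketchStream newStream = new NewStyleStream();
        // Both approaches should agree; only the cost of asking differs.
        System.out.println(oldStream.supportsIncrementToken()
                + " / " + SketchStream.overrides(OldStyleStream.class, "incrementToken"));
        System.out.println(newStream.supportsIncrementToken()
                + " / " + SketchStream.overrides(NewStyleStream.class, "incrementToken"));
    }
}
```

The trade-off is the one already noted: the flag is less clean because the developer has to remember to override it, but the answer is known at compile time and costs one method call.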
> Mark Miller wrote:
>> Michael Busch wrote:
>>> Are you sure that the initialization costs of the TokenStream/ 
>>> AttributeSource cause the slowdown? With the bw-comp. code now  
>>> every call of a Token method goes through a delegation layer. I'm  
>>> afraid that might cause a slowdown?
>> It's isMethodOverriden and TokenStream<init>(AttributeSource).
>>>
>>> The code that figures out what Attributes to put into the map uses  
>>> reflection, but only if the impl wasn't seen before; otherwise the  
>>> attributes are looked up in a cache.
>>>
>>> The culprit could also be the reflection code that checks which  
>>> TokenStream methods are implemented.
>>>
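If the reflective method check is the hot spot, one option is to treat it like the attribute lookup described above and do the reflection once per implementation class. A minimal sketch follows, assuming hypothetical names (CachedCheckStream, OverridingStream, OverrideCache); it does not reproduce Lucene's actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical base class standing in for a token stream; NOT Lucene's real API.
abstract class CachedCheckStream {
    public boolean incrementToken() { return false; }
}

// Example implementation that overrides the new-style method.
final class OverridingStream extends CachedCheckStream {
    @Override public boolean incrementToken() { return true; }
}

final class OverrideCache {
    // One cached answer per implementation class.
    private static final ConcurrentMap<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    // Counts how often the expensive reflective path actually runs.
    static final AtomicInteger reflectiveLookups = new AtomicInteger();

    // Cheap after the first call for a given class: a single map lookup.
    static boolean overridesIncrementToken(Class<? extends CachedCheckStream> cls) {
        return CACHE.computeIfAbsent(cls, OverrideCache::reflect);
    }

    // Expensive path: walk the hierarchy looking for a declaration of the method.
    private static boolean reflect(Class<?> cls) {
        reflectiveLookups.incrementAndGet();
        for (Class<?> c = cls; c != null && c != CachedCheckStream.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("incrementToken");
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }
}

class OverrideCacheDemo {
    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(OverrideCache.overridesIncrementToken(OverridingStream.class));
        }
        System.out.println("reflective lookups: " + OverrideCache.reflectiveLookups.get());
    }
}
```

A map keyed by Class keeps those classes reachable for as long as the cache lives; java.lang.ClassValue is an alternative where class unloading matters.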
>>> I can't look at the code right now (writing on my cell).
>>> Even if this is "fixable", I don't really like the fact that users  
>>> who upgrade to 2.9 will potentially see such a performance hit  
>>> unless they implement incrementToken() and reusableTokenStream.
>> Looks like you take a good hit, but keep in mind that test is  
>> almost worst case scenario as well - the Document text is extremely  
>> short.
>>>
>>> Michael
>>>
>>> On Aug 9, 2009, at 11:13 AM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>>
>>>> FYI
>>>> https://issues.apache.org/jira/browse/SOLR-1353
>>>>
>>>> On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>>>> It looks like implementing the new attribute stuff will not be enough
>>>>> - the token architecture has changed enough that it looks like we must
>>>>> cache tokenstreams to get back to good performance.
>>>>>
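Caching token streams here means building a stream once per thread and resetting it for each new field value instead of constructing a fresh one per document. A rough sketch under that assumption follows; ReusableSplitter and PerThreadAnalyzer are hypothetical classes for illustration and not Solr or Lucene code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical whitespace tokenizer that can be re-pointed at new text.
final class ReusableSplitter {
    // Counts constructions so the saving from reuse is visible.
    static final AtomicInteger constructed = new AtomicInteger();

    private String[] parts = new String[0];
    private int pos;

    ReusableSplitter() {
        // In the real case this is where the expensive setup would happen.
        constructed.incrementAndGet();
    }

    // Re-initialise the existing instance for a new piece of text.
    void reset(String text) {
        String trimmed = text.trim();
        parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        pos = 0;
    }

    // Returns the next token, or null when the text is exhausted.
    String next() {
        return pos < parts.length ? parts[pos++] : null;
    }
}

// Hypothetical analyzer that keeps one splitter per thread.
final class PerThreadAnalyzer {
    private final ThreadLocal<ReusableSplitter> cached =
            ThreadLocal.withInitial(ReusableSplitter::new);

    // Hands back the thread's cached instance, reset for the new text.
    ReusableSplitter reusableStream(String text) {
        ReusableSplitter stream = cached.get();
        stream.reset(text);
        return stream;
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        PerThreadAnalyzer analyzer = new PerThreadAnalyzer();
        String[] docs = {"short doc one", "short doc two", "short doc three"};
        for (String doc : docs) {
            ReusableSplitter stream = analyzer.reusableStream(doc);
            for (String tok = stream.next(); tok != null; tok = stream.next()) {
                System.out.println(tok);
            }
        }
        System.out.println("instances constructed: " + ReusableSplitter.constructed.get());
    }
}
```

The caller must finish with one stream before asking for the next on the same thread, since both calls return the same object.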
>>>>> -Yonik
>>>>> http://www.lucidimagination.com
>>>>>
>>>>>
>>>>> On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>>>>> OK, I've isolated (magnified) the effect with a test I just checked in.
>>>>>> Indexing documents directly at the UpdateHandler was 85% faster before
>>>>>> the latest lucene update.
>>>>>>
>>>>>> Run the test like this:
>>>>>>
>>>>>> ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000";
>>>>>> grep throughput build/test-results/*TestIndexingPerformance.xml
>>>>>>
>>>>>> To run on an older trunk version, just copy over
>>>>>> src/test/org/apache/solr/update/TestIndexingPerformance.java
>>>>>> src/test/test-files/solr/conf/solrconfig_perf.xml
>>>>>>
>>>>>> I had a throughput of 10946 docs/sec before the lucene update,  
>>>>>> and 5849 after.
>>>>>>
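For reference, the two throughput figures just quoted work out to the old build being roughly 87% faster, or equivalently the new build losing about 47% of its throughput. The small sketch below does that arithmetic; ThroughputDelta is a made-up helper name, and the inputs are the single-run numbers from the message above.

```java
// Hypothetical helper for comparing two throughput measurements.
final class ThroughputDelta {
    // How much faster "before" was than "after", as a percentage of "after".
    static double speedupPercent(double before, double after) {
        return (before / after - 1.0) * 100.0;
    }

    // How much throughput was lost going from "before" to "after",
    // as a percentage of "before".
    static double dropPercent(double before, double after) {
        return (1.0 - after / before) * 100.0;
    }

    public static void main(String[] args) {
        double before = 10946; // docs/sec before the lucene update
        double after = 5849;   // docs/sec after the lucene update
        System.out.printf("before was %.1f%% faster%n", speedupPercent(before, after));
        System.out.printf("after is %.1f%% slower%n", dropPercent(before, after));
    }
}
```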
>>>>>> -Yonik
>>>>>> http://www.lucidimagination.com
>>>>>>
>>>>>>
>>>>>> On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <yonik@lucidimagination.com> wrote:
>>>>>>> On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <gsingers@apache.org> wrote:
>>>>>>>> Or bite the bullet and upgrade to the incrementToken() method.
>>>>>>>
>>>>>>> Right - I'm not sure if that would fix it or not - I haven't been
>>>>>>> involved in the new Token attribute stuff...
>>>>>>> I'm currently writing a basic indexing unit test that we can use to
>>>>>>> measure this (the standard solrconfig does stuff that slows down
>>>>>>> indexing a lot, but helps in catching bugs on edge cases by creating
>>>>>>> many segments).
>>>>>>>
>>>>>>> -Yonik
>>>>>>> http://www.lucidimagination.com
>>>>>>>
>>>>>>
>>>>>
>>
>>
>
>
> -- 
> - Mark
>
> http://www.lucidimagination.com
>
>
>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

