lucene-solr-user mailing list archives

From "Li, Ryan" <Ryan...@sensis.com.au>
Subject Solr add document over 20 times slower after upgrade from 4.0 to 4.9
Date Thu, 04 Sep 2014 02:14:25 GMT
I have a process that indexes 2,500 documents (up to 50 MB each, 3 MB on average) into a Solr
server. When running on Solr 4.0, the full index run finished in 3 hours.

However, after we upgraded to Solr 4.9, the same index run needs 3 days to finish.

I've done some profiling. The numbers I get are:
Size figure of document    Time to add (Solr 4.0)    Time to add (Solr 4.9)
1.18                       6 sec                     123 sec
2.26                       12 sec                    444 sec
3.35                       18 sec                    over 600 sec
9.65                       46 sec                    timeout

From what I can see, indexing time is roughly linear, O(n), in document size on Solr 4.0, but
clearly superlinear on Solr 4.9: the time nearly quadruples (123 sec to 444 sec) when the size
figure roughly doubles (1.18 to 2.26), which looks closer to O(n^2). I also tried commenting
out some copied fields to narrow down the problem. The size of the document after indexing
seems to be the dominating factor for index time (we copy fields, and the more fields we copy,
the bigger the index is).
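
As a rough check on the scaling, one can fit the slope of log(time) against log(size) for the timings in the table above. This is only a sketch: the function name scaling_exponent is made up for illustration, and the 4.9 fit uses just the two exact timings, because "over 600 sec" and "timeout" are lower bounds rather than measurements.

```python
import math

def scaling_exponent(sizes, times):
    """Least-squares slope of log(time) vs log(size).

    A slope near 1 suggests linear growth, near 2 quadratic growth.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Timings from the table above: (size figure, seconds).
solr40 = scaling_exponent([1.18, 2.26, 3.35, 9.65], [6, 12, 18, 46])
# Only the two exact 4.9 timings; the other two rows are lower bounds.
solr49 = scaling_exponent([1.18, 2.26], [123, 444])

print(f"Solr 4.0 exponent: {solr40:.2f}")
print(f"Solr 4.9 exponent: {solr49:.2f}")
```

The exponent comes out close to 1 for Solr 4.0 and close to 2 for Solr 4.9. With only two exact points the 4.9 estimate is rough, so it should be read as an indication rather than a measurement.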

Has anyone experienced a similar problem? Does this sound like a bug in Solr, or are we
just using Solr 4.9 wrong?

Here is an example of a field type definition from my schema file.
        <fieldType name="text_stem" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/> <!-- strip off all apostrophe (') characters -->
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-index-synonyms.txt"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" />
                <!-- Used to have language="English" - seems this param is gone in 4.9 -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="'+" replacement=""/> <!-- strip off all apostrophe (') characters -->
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.ASCIIFoldingFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.SynonymFilterFactory" expand="true" ignoreCase="true" synonyms="../../resources/type-query-colloq-synonyms.txt"/>
                <filter class="solr.SnowballPorterFilterFactory" language="English" />
                <!-- Used to have language="English" - seems this param is gone in 4.9 -->
                <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
            </analyzer>
        </fieldType>
Field:
<field name="majorTextSignalStem" type="text_stem" indexed="true" stored="false" multiValued="true" omitNorms="false"/>
Copy:
 <copyField dest="majorTextSignalStem" source="majorTextSignalRaw" />

Thanks,
Ryan

