lucene-solr-user mailing list archives

From Kolodziej Christian <Kolodz...@huberverlag.de>
Subject Bad and continuously degrading update performance
Date Tue, 09 Sep 2008 10:04:12 GMT
Hello everybody,

I have a question about the performance and the internals of the update process. We have
an index containing nearly 200,000 entries (one field holds a lot of content). The relevant
parts of our schema.xml are:

// ...
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
</fieldType>
// ...
<fields>
   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="date" type="date" indexed="true" stored="false" required="true" />
   <field name="headline" type="text" indexed="true" stored="true" required="true" />
   <field name="companyid" type="integer" indexed="true" stored="false" required="true" />
   <field name="companyname" type="text" indexed="true" stored="true" required="true" />
   <field name="text" type="text" indexed="true" stored="true" required="true" />
   <field name="language" type="string" indexed="true" stored="false" required="true" />
</fields>
// ....

Every five minutes a cron job updates a small number of edited records (between 1 and maybe
20). Its speed is unsatisfactory: the time needed grows continuously and was over 4 minutes
before we restarted Tomcat. The restart helped for the first updates (17 seconds), but the
time soon rose again to 170 seconds and more.
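
As an illustration only, here is a minimal sketch of the kind of update message such a job
might post, assuming it uses Solr's standard XML update format (<add>/<doc>/<field>). The
field names come from the schema above; the helper name build_add_message, the sample
values, and the choice of Python are assumptions, not the actual cron job.

```python
import xml.etree.ElementTree as ET


def build_add_message(records):
    """Build a Solr XML <add> message from a list of field dicts.

    Hypothetical helper for illustration; each dict maps a schema
    field name to its value for one document.
    """
    add = ET.Element("add")
    for record in records:
        doc = ET.SubElement(add, "doc")
        for name, value in record.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Sample values are made up; the field names follow the schema above.
sample = [{
    "id": "42",
    "date": "2008-09-09T10:04:12Z",
    "headline": "Example headline",
    "companyid": 7,
    "companyname": "Example GmbH",
    "text": "Long article body ...",
    "language": "de",
}]

message = build_add_message(sample)
print(message)
```

A job of this shape would typically send the message to the update handler and follow it
with a <commit/> so that the changes become visible to searches.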

Does anyone have an idea where the problem is? Or is there no problem, and this performance
is "normal" for our configuration? I hope there are some tricks out there to improve it.

Best regards,
Christian
