lucene-java-user mailing list archives

From Scott Preddy <scott.m.pre...@gmail.com>
Subject configuring solr3.6 for a large intensive index only run
Date Wed, 23 May 2012 18:00:33 GMT
I am trying to do a very large insertion (about 68 million documents) into a
Solr instance.

Our schema is pretty simple: about 40 fields using these types:

   <types>
      <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
      <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
         <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
         <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
         </analyzer>
      </fieldType>
      <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
   </types>

We are running SolrJ clients from a Hadoop cluster and are struggling with
the merge process as time progresses. As the number of documents grows,
segment merging eventually consumes nearly all of the server's resources.

What we would really like to do is turn merging off, do the indexing run
with a sparse solrconfig, and then restart with our runtime config, which
would kick off merging when it starts.

Is there a way to do this?
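One way to approximate the "sparse", indexing-only solrconfig described above is sketched below. It assumes the `<indexConfig>` section introduced in Solr 3.6 (older configs use `<indexDefaults>`/`<mainIndex>`) and Lucene's `TieredMergePolicy`; all numeric values are illustrative assumptions, not tuned recommendations. Note that this only defers merging by tolerating many more segments per tier; it does not strictly turn merging off.

```xml
<!-- Hypothetical bulk-load-only index settings for solrconfig.xml (Solr 3.6).
     Values are illustrative assumptions, not tuned recommendations. -->
<indexConfig>
  <!-- Buffer more documents in RAM before flushing, so fewer,
       larger segments are written in the first place. -->
  <ramBufferSizeMB>512</ramBufferSizeMB>

  <!-- Very high segmentsPerTier/maxMergeAtOnce values defer merging
       (they do not disable it); expect many segments on disk. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">100</int>
    <int name="segmentsPerTier">100</int>
  </mergePolicy>

  <!-- Compound files reduce the number of open file handles
       when many segments accumulate. -->
  <useCompoundFile>true</useCompoundFile>
</indexConfig>
```

The trade-off is that merge I/O during the load is exchanged for a large segment count (and slower searches) until the index is consolidated afterwards, for example by switching back to the runtime config and issuing an optimize.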

I came close to finding an answer in this post, but could not find out how
to actually turn off merging.

Post by Mike McCandless:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
