lucene-solr-user mailing list archives

From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: index size tripled during optimization
Date Wed, 28 Jan 2009 17:50:13 GMT
Does your index stay at triple size after optimization? It is normal for
Lucene to use 2x or up to 3x disk space during optimization, but it should
fall back to the normal size once optimization completes and unused
segments are cleaned up by the index deletion policy.

If you search the java-user (Lucene) mailing list for threads on disk space,
you'll find more information.

By the way, a merge factor of 1000 seems extremely large. Is there any
special reason it was increased to this number?
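
For reference, here is a rough sketch of the defaults with the values from
your own "was" comments restored. It uses only elements already present in
your config; whether these numbers suit your indexing load is an assumption
you would need to test:

```xml
<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <!-- restored to the earlier value noted in your comment ("was 10") -->
  <mergeFactor>10</mergeFactor>
  <!-- restored to the earlier value noted in your comment ("was 1000") -->
  <maxBufferedDocs>1000</maxBufferedDocs>
</indexDefaults>
```

Note that your <mainIndex> section already sets mergeFactor to 10, and the
comment in your config says the defaults apply unless overridden, so it is
worth checking which value is actually in effect for the main index.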

On Wed, Jan 28, 2009 at 11:12 PM, Qingdi <qingdi@nextbio.com> wrote:

>
> Hi Ryuuichi,
>
> Thanks for your quick reply.
> I checked the setting of <useCompoundFile> in solrconfig.xml, and the value
> is 'false'. Here is what is in our solrconfig.xml.
> =======================================================================
>  <indexDefaults>
>   <!-- Values here affect all index writers and act as a default unless
> overridden. -->
>    <useCompoundFile>false</useCompoundFile>
>    <mergeFactor>1000</mergeFactor> <!-- was 10 -->
>    <maxBufferedDocs>10000</maxBufferedDocs> <!-- was 1000 -->
>    <maxMergeDocs>2147483647</maxMergeDocs>
>    <maxFieldLength>100000</maxFieldLength>
>    <writeLockTimeout>1000</writeLockTimeout>
>    <commitLockTimeout>10000</commitLockTimeout>
>    <!--
>      As long as Solr is the only process modifying your index, it is
>      safe to use Lucene's in process locking mechanism.  But you may
>      specify one of the other Lucene LockFactory implementations in
>      the event that you have a custom situation.
>
>      none = NoLockFactory (typically only used with read only indexes)
>      single = SingleInstanceLockFactory (suggested)
>      native = NativeFSLockFactory
>      simple = SimpleFSLockFactory
>
>      ('simple' is the default for backwards compatibility with Solr 1.2)
>    -->
>    <lockType>single</lockType>
>  </indexDefaults>
>
>  <mainIndex>
>    <!-- options specific to the main on-disk lucene index -->
>    <useCompoundFile>false</useCompoundFile>
>    <mergeFactor>10</mergeFactor>
>    <maxBufferedDocs>1000</maxBufferedDocs>
>    <maxMergeDocs>2147483647</maxMergeDocs>
>    <maxFieldLength>100000</maxFieldLength>
>
>    <!-- If true, unlock any held write or commit locks on startup.
>         This defeats the locking mechanism that allows multiple
>         processes to safely access a lucene index, and should be
>         used with care.
>         This is not needed if lock type is 'none' or 'single'
>     -->
>    <unlockOnStartup>false</unlockOnStartup>
>
>    <useRAMDirectory>false</useRAMDirectory>
>  </mainIndex>
> =======================================================================
>
> Could there be any other reason for the size tripling?
>
> Thanks.
>
> Qingdi
>
>
> Ryuuichi KUMAI wrote:
> >
> > Hello Qingdi,
> >
> > Have you changed the "<useCompoundFile>" setting in solrconfig.xml?
> > In my experience, when using compound-file index
> > ("<useCompoundFile>true</useCompoundFile>"),
> > the size of the index grows up to triple during optimization.
> > My understanding is that when writing a new segment in compound format,
> > Lucene writes the multifile format first and then creates the compound
> > index.
> > So in the state immediately before optimization ends, the size almost
> > triples.
> >
> > Regards,
> > Ryuuichi Kumai.
> >
> > 2009/1/28 Qingdi <qingdi@nextbio.com>:
> >>
> >>
> >> Hi,
> >>
> >> Starting about one week ago, our index size has been tripling during
> >> optimization.
> >>
> >> The current index statistics are:
> >> numDocs : 192702132
> >> size: 76G
> >> And we optimize after every 6M document updates.
> >>
> >> Since we keep getting new data, the index size increases every day.
> >> Before, the index size only doubled during optimization.
> >>
> >> Why does the index size triple instead of double during optimization?
> >> Is there anything we can do to keep the index only doubled during
> >> optimization?
> >>
> >> Thanks.
> >>
> >> Qingdi
> >> --
> >> View this message in context:
> >> http://www.nabble.com/index-size-tripled-during-optimization-tp21691596p21691596.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/index-size-tripled-during-optimization-tp21691596p21710810.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.
