lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Varun Rajput <var...@gmail.com>
Subject Re: Solr Cloud Segments and Merging Issues
Date Thu, 13 Mar 2014 19:08:41 GMT
Hey Shawn,

> The config with the old policy used to be the literal name
> "mergeFactor".  With TieredMergePolicy, there are now three settings
> that must be changed in order to actually be the same as what
> mergeFactor used to do.The followingconfig snippet is the equivalent
> config to a mergeFactor of 10, so these are the default settings.  If
> you don't change all three (especially segmentsPerTier), then you are
> not actually changing the "mergeFactor".
> 
>    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>      <int name="maxMergeAtOnce">10</int>
>      <int name="segmentsPerTier">10</int>
>      <int name="maxMergeAtOnceExplicit">30</int>
>    </mergePolicy>

I tried specifying all these configurations, but it still doesn't work as
expected. I even tried specifying a maxMergeSegmentMB to 20GB instead of the
default 5GB. This is the config I tried:

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
          <int name="maxMergeAtOnce">2</int>
          <int name="segmentsPerTier">2</int>
          <int name="maxMergeAtOnceExplicit">100</int>
          <long name="maxMergedSegmentMB">21990232555520</long>
        </mergePolicy>


> With newer Solr versions, there is not as much speedup to be gained from
> fewer segments as before.  There *is* a noticeable change, but it is no
> longer the night/day difference it used to be.

We did a performance test on a normal and optimized index and saw a
considerable improvement (almost double) in response time. That's the reason
why we want to reduce our number of segments as we have a large index with
very small amount of updates.

> Assuming that there are no system resource limitations(especially RAM),
> a distributed index is slower than a single index of the same total
> size.  Where distributed indexes have an edge is in very large indexes
> or indexes with a moderately high query rate -- by applying more total
> RAM and/or CPU resources to the problem.  If your index already fits
> entirely into the OS disk cache, or you are sending a a handful of test
> queries, you won't notice any performance benefit from going distributed.

We have a large index which won't fit in memory and need high query rates.

> For SUPER high query rates, you need more replicas.  More shards might
> actually make performance go down in this situation.

This is something we identified while testing. We had to optimize the number
of shards to be lesser but a reasonable number that will allow us grow the
size of data in future.

-Varun


> I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
> machine with a zookeeper quorum running on 3 other machines. The index
> size
> on each shard is about 15GB. I noticed that the number of segments in
> second shard was 42 and in the remaining shards was between 25-30.
>
> I am basically trying to get the number of segments down to a reasonable
> size like 4 or 5 in order to improve the search time. We do have some
> documents indexed everyday, so we don't want to do an optimize every day.
>
> The merge factor with the TierMergePolicy is only the number of segments
> per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
> shard, I tried clearing the index, reducing the mergeFactor and
> re-indexing
> the same data in the same manner, multiple times, but I don't see a
> pattern
> of reduction in number of segments.
>
> No mergeFactor set      =>     42 segments
> mergeFactor=5      =>       22 segments
> mergeFactor=2      =>       22 segments
>
> Below is the simple configuration, as specified in the documentation, I am
> using for merging:
>
> <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
>
>            <int name="maxMergeAtOnce">2</int>
>
>            <int name="segmentsPerTier">2</int>
>
> </mergePolicy>
>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
>
> What is the best way in which I can use merging to restrict the number of
> segments being formed? 



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-Segments-and-Merging-Issues-tp4123316p4123489.html
Sent from the Solr - User mailing list archive at Nabble.com.

Mime
View raw message