lucene-solr-user mailing list archives

From Zheng Lin Edwin Yeo <edwinye...@gmail.com>
Subject Re: Merging of index in Solr
Date Thu, 23 Nov 2017 01:19:07 GMT
I'm doing the merging on an SSD, so shouldn't the speed be OK?

We need to merge because the data is indexed in two different collections,
and we need it all in the same collection so that things like faceting
give accurate results.
Will sharding alone achieve this, or do we have to merge before we do the
sharding?

Regards,
Edwin

On 23 November 2017 at 01:32, Erick Erickson <erickerickson@gmail.com>
wrote:

> Really, let's back up here though. This sure seems like an XY problem.
> You're merging indexes that will eventually be something on the order
> of 3.5TB. I claim that an index of that size is very difficult to work
> with effectively. _Why_ do you want to do this? Do you have any
> evidence that you'll be able to effectively use it?
>
> And Shawn tells you that the result will be one large segment. If you
> replace documents in that index, it could end up with around 3.4975TB
> of wasted space before the segment is merged, see:
> https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>
> You already know that merging is extremely painful. This sure seems
> like a case where the evidence is mounting that you would be far
> better off sharding and _not_ merging.
>
> FWIW,
> Erick
>
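
The 3.4975T figure above appears to come from a simple subtraction. A minimal sketch of that arithmetic in Python follows. It assumes two things that are not stated in this thread: that the merge policy is TieredMergePolicy with its default 5GB maximum merged segment size, and that an oversized segment only becomes a merge candidate once its live documents shrink below half of that (2.5GB). Units are decimal, and the names are made up for illustration.

```python
# Assumption: TieredMergePolicy default maximum merged segment size.
MAX_MERGED_SEGMENT_GB = 5.0
# Assumption: an oversized segment is only reconsidered for merging once
# its live (non-deleted) data drops below half of the maximum.
ELIGIBLE_LIVE_GB = MAX_MERGED_SEGMENT_GB / 2


def max_wasted_tb(segment_tb: float) -> float:
    """Upper bound on deleted-document space held by one huge segment."""
    segment_gb = segment_tb * 1000  # decimal units: 1 TB = 1000 GB
    return (segment_gb - ELIGIBLE_LIVE_GB) / 1000


print(f"{max_wasted_tb(3.5):.4f} TB")
```

Under those assumptions, a single 3.5TB segment can carry almost its entire size in deleted documents before it is touched again.
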
> On Wed, Nov 22, 2017 at 8:45 AM, Shawn Heisey <apache@elyograg.org> wrote:
> > On 11/21/2017 9:10 AM, Zheng Lin Edwin Yeo wrote:
> >> I am using the IndexMergeTool from Solr, with the command below:
> >>
> >> java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
> >> org.apache.lucene.misc.IndexMergeTool
> >>
> >> The heap size is 32GB. There are more than 20 million documents in the
> >> two cores.
> >
> > I have looked at IndexMergeTool, and confirmed that it does its job in
> > exactly the same way that Solr does an optimize, so I would still expect
> > a rate of 20 to 30 MB per second, unless it's running on REALLY old
> > hardware that can't transfer data that quickly.
> >
> > Thanks,
> > Shawn
> >
>
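
For a sense of scale, the 20 to 30 MB per second rate quoted above can be turned into a rough duration for an index of about 3.5TB. This is a minimal sketch in Python, assuming decimal units (1TB = 1,000,000MB) and a constant rate for the whole merge; the function name `merge_hours` is made up for this illustration.

```python
def merge_hours(index_tb: float, rate_mb_per_s: float) -> float:
    """Rough hours needed to rewrite index_tb terabytes at a constant rate."""
    index_mb = index_tb * 1_000_000  # assumption: 1 TB = 1,000,000 MB
    seconds = index_mb / rate_mb_per_s
    return seconds / 3600


# Both ends of the 20-30 MB/s range mentioned in the thread.
for rate in (20, 30):
    print(f"{rate} MB/s: about {merge_hours(3.5, rate):.0f} hours")
```

Under those assumptions the merge takes somewhere between one and two days, which suggests the bottleneck is the merge process itself rather than the drive.
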
