lucene-solr-user mailing list archives

From RAUNAK AGRAWAL <agrawal.rau...@gmail.com>
Subject Re: Solr Streaming Queries Performance Issues [v7.2.1]
Date Fri, 28 Sep 2018 18:52:26 GMT
Thanks a lot, Toke. I will get back to you soon about the patch after
discussing it with the team.

Thanks & Regards


On Fri, Sep 28, 2018 at 11:30 AM Toke Eskildsen <toes@kb.dk> wrote:

> RAUNAK AGRAWAL <agrawal.raunak@gmail.com> wrote:
>
> > curl http://localhost:8983/solr/collection_name/stream -d
> > 'expr=facet(collection_name,q="id:953",bucketSorts="week
> > desc",buckets="week",bucketSizeLimit=200,sum(sales),
> > sum(amount),sum(days))'
>
> Stats on numeric fields then.
>
> > Also in my collection, I have almost 10 Billion documents
> > with many deletions (close to 40%).
>
> Quite a lot of documents, and in this case deletions count, as the
> internal structures for the deleted documents still need to be iterated.
> In scale this looks somewhat like our 18 billion document setup, with the
> addendum that we use quite large segments (900GB).
>
> The performance regressions we encountered with Solr 7 led to
> https://issues.apache.org/jira/browse/LUCENE-8374 which helped a lot
> (performance testing has not finished). If you have or can easily create a
> test server where your shard(s) is the same size as your production shards,
> I'd be happy to port the patch to Solr 7.2.1 to see if it helps. I am
> looking for independent verification, so it is no bother.
>
> > I was planning to run optimise to merge the segments but
> > spoke to admin team and lucidworks guys and they were
> > against it saying that it will make very large segment file.
>
> If your bottleneck is the same as ours, the large segment would mean worse
> performance (with Solr 7).
>
> > Is it true that optimise in solr should not be used, as it comes with
> > other issues?
>
> No simple answer there. If you have an index that you update very rarely,
> it can save memory and processing power. If you have a live index where you
> add and delete documents, it will probably be a bad idea. One strategy used
> with time series data is to have old and immutable data in dedicated
> collections, which can then be optimized.
>
> - Toke Eskildsen
>
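
The time-series strategy mentioned above (old, immutable data in dedicated
collections that can then be optimized) could be sketched roughly as follows.
This is only an illustration: the collection names and the one-collection-per-year
layout are hypothetical, and the script only builds the request URLs for Solr's
update handler (`optimize=true`, `maxSegments`) without sending them. Given the
note above that very large segments may perform worse on Solr 7, `max_segments`
is left as a parameter instead of being fixed at 1.

```python
from urllib.parse import urlencode


def optimize_url(base_url, collection, max_segments=1):
    """Build the update-handler URL that asks Solr to optimize one collection."""
    params = urlencode({"optimize": "true", "maxSegments": max_segments})
    return f"{base_url}/{collection}/update?{params}"


def immutable_collections(collections, live_collection):
    """Return every collection except the one still receiving adds and deletes."""
    return [c for c in collections if c != live_collection]


# Hypothetical layout: one collection per year, only the newest is written to.
collections = ["sales_2016", "sales_2017", "sales_2018"]
urls = [
    optimize_url("http://localhost:8983/solr", name)
    for name in immutable_collections(collections, "sales_2018")
]

for url in urls:
    # Each URL could be requested with curl or any HTTP client.
    print(url)
```

The live collection is deliberately skipped, in line with the advice that
optimizing an index with ongoing adds and deletes is probably a bad idea.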
