lucene-solr-user mailing list archives

From Shivam Omar <shivam.o...@jeevansathi.com>
Subject Re: Solr soft commits
Date Fri, 11 May 2018 02:28:52 GMT


From: Shawn Heisey
Sent: Thursday, May 10, 9:43 PM
Subject: Re: Solr soft commits
To: solr-user@lucene.apache.org


On 5/10/2018 9:48 AM, Shivam Omar wrote:
> I need some help in understanding Solr soft commits. Since soft commits are about visibility
> and are fast in nature, they are advised for NRT use cases.

Soft commits *MIGHT* be faster than hard commits. There are situations where the performance
of a soft commit and a hard commit with openSearcher=true will be about the same, particularly
if indexing is very heavy.

Thanks, Shawn. So there are cases where a soft commit will not be faster than a hard commit
with openSearcher=true. We have a case where we have to do bulk deletions; in that case, will
a soft commit be faster than a hard commit?

> I want to understand whether soft commits also honor merge policies and do segment merging
> for docs in memory. For example, if I keep the hard commit interval very high and allow a
> few million documents to be in memory by using soft commits with no hard commit, can it
> affect Solr query-time performance?

Segments in memory are very likely not eligible for merging, but I do not actually know
whether that is the case.

Using soft commits will NOT keep millions of documents in memory. By default, Solr uses
NRTCachingDirectoryFactory, which wraps Lucene's NRTCachingDirectory, and it uses that
directory with default values that are far too low to accommodate millions of documents.
See the Javadoc for the directory: https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html
That page shows a directory created with maxMergeSizeMB and maxCachedMB values of 5 and 60 MB,
but the defaults in the factory code (which is what Solr normally uses) are 4 and 48 MB. I'm
pretty sure that you can increase these values in solrconfig.xml, but really large values are
not recommended. Values large enough to accommodate millions of documents would also require
a large Java heap, likely with no real performance advantage. If a new segment is larger than
the per-segment limit, or caching it would push the total above the overall limit, it will not
be cached in memory. Older segments and segments that do not meet the size requirements are
flushed to disk.
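To make the size rule concrete, here is a minimal Java sketch of the caching decision, written under stated assumptions. The class NrtCacheModel and its methods writeSegment, cachedMB and hardCommit are hypothetical and are not Lucene's API. The model assumes a new segment stays in RAM only when its estimated size is at most the per-segment limit (maxMergeSizeMB) and the cached total would stay within the overall limit (maxCachedMB), and it treats a hard commit as persisting everything cached to disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the caching rule described above.
// This is NOT Lucene's NRTCachingDirectory; it only mirrors the idea that a
// new segment stays in RAM when its (estimated) size is at most maxMergeSizeMB
// and the total cached size would stay at or below maxCachedMB.
public class NrtCacheModel {
    private final double maxMergeSizeMB;
    private final double maxCachedMB;
    private final List<Double> cachedSegmentsMB = new ArrayList<>();

    public NrtCacheModel(double maxMergeSizeMB, double maxCachedMB) {
        this.maxMergeSizeMB = maxMergeSizeMB;
        this.maxCachedMB = maxCachedMB;
    }

    // Total size of the segments currently held in the RAM cache.
    public double cachedMB() {
        double total = 0.0;
        for (double s : cachedSegmentsMB) {
            total += s;
        }
        return total;
    }

    // Returns true if the segment would be cached in RAM, false if it goes to disk.
    public boolean writeSegment(double sizeMB) {
        boolean fits = sizeMB <= maxMergeSizeMB && cachedMB() + sizeMB <= maxCachedMB;
        if (fits) {
            cachedSegmentsMB.add(sizeMB);
        }
        return fits;
    }

    // Hypothetical stand-in for a hard commit: cached segments are persisted
    // to disk, so the RAM cache is emptied.
    public void hardCommit() {
        cachedSegmentsMB.clear();
    }

    public static void main(String[] args) {
        // Factory defaults quoted in the thread: 4 MB per segment, 48 MB total.
        NrtCacheModel dir = new NrtCacheModel(4.0, 48.0);
        System.out.println(dir.writeSegment(3.0));   // small segment: kept in RAM
        System.out.println(dir.writeSegment(10.0));  // larger than 4 MB: goes to disk
        for (int i = 0; i < 20; i++) {
            dir.writeSegment(4.0);                   // stops fitting once the 48 MB budget is reached
        }
        System.out.println(dir.cachedMB());          // never exceeds 48.0
        dir.hardCommit();
        System.out.println(dir.cachedMB());          // 0.0 after the hard commit
    }
}
```

For rough scale, assuming a hypothetical 1 KB per document, one million documents is about 1,000 MB, roughly twenty times the 48 MB default cap, so the cache cannot hold anything close to millions of documents.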

Does this mean that once the memory threshold is crossed, soft commits will cause Lucene to
flush data to disk, as a hard commit does? Also, does a soft commit have a higher query-time
performance cost than a hard commit?

Thanks, Shawn

