lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4909) Solr and IndexReader Re-opening on Replication Slave
Date Tue, 10 Sep 2013 17:19:52 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13763257#comment-13763257 ]

ASF subversion and git services commented on SOLR-4909:
-------------------------------------------------------

Commit 1521556 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1521556 ]

SOLR-4909: Use DirectoryReader.openIfChanged in non-NRT mode
                
> Solr and IndexReader Re-opening on Replication Slave
> ----------------------------------------------------
>
>                 Key: SOLR-4909
>                 URL: https://issues.apache.org/jira/browse/SOLR-4909
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication (java), search
>    Affects Versions: 4.3
>            Reporter: Michael Garski
>             Fix For: 4.5, 5.0
>
>         Attachments: SOLR-4909_confirm_keys.patch, SOLR-4909-demo.patch, SOLR-4909_fix.patch,
SOLR-4909.patch, SOLR-4909.patch, SOLR-4909_v2.patch, SOLR-4909_v3.patch
>
>
> I've been experimenting with caching filter data per segment in Solr using a CachingWrapperFilter
& FilteredQuery within a custom query parser (as suggested by [~yonik@apache.org] in SOLR-3763)
and encountered situations where the value of getCoreCacheKey() on the AtomicReader for each
segment can change for a given segment on disk when the searcher is reopened. As CachingWrapperFilter
uses the value of the segment's getCoreCacheKey() as the key in the cache, there are situations
where the data cached on that segment is not reused when the segment on disk is still part
of the index. This also affects the Lucene field cache and field value caches, because they too are
cached per segment.
> When Solr first starts, it opens the searcher's underlying DirectoryReader in StandardIndexReaderFactory.newReader
by calling DirectoryReader.open(indexDir, termInfosIndexDivisor), and the reader is subsequently
reopened in SolrCore.openNewSearcher by calling DirectoryReader.openIfChanged(currentReader,
writer.get(), true). Because the reader was first opened without a writer, reopening it with
the writer changes the value of getCoreCacheKey() on every segment, even though some of the
segments have not changed. Depending on the role of the Solr server,
this has different effects:
> * On a SolrCloud node or a free-standing index and search server, the segment cache is invalidated
during the first DirectoryReader reopen only. Subsequent reopens use the same IndexWriter instance,
so the value of getCoreCacheKey() on each segment does not change and the cache is
retained.
> * In a master-slave replication setup, the segment cache is invalidated on the
slave during every replication: the index is reopened using a new IndexWriter instance each time,
which changes the value of getCoreCacheKey() on each segment.
> I can think of a few approaches to alter the re-opening behavior to allow reuse of segment
level caches in both cases, and I'd like to get some input on other ideas before digging in:
> * To address the first-reopen issue on a cloud node or standalone server, it might be possible to create
the UpdateHandler and IndexWriter before the DirectoryReader, and use the writer to open the
reader. However, a comment in the SolrCore constructor by [~yonik@apache.org] says that the searcher
should be opened before the update handler, so that may not be an acceptable approach.
> * To change the behavior of a slave in a replication setup, one solution would be to
not open a writer from the SnapPuller when the new index is retrieved if the core is enabled
as a slave only. The writer is still needed on a server configured as both master and slave that
is functioning as a replication repeater, so that downstream slaves can see the changes in the index
and retrieve them.
> I'll attach a unit test that demonstrates the behavior of reopening the DirectoryReader
and its effects on the value of getCoreCacheKey(). My assumption is that the behavior of Lucene
during the various reader reopen operations is correct and that the changes are necessary
on the Solr side of things.
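
The cache-key effect described in the issue can be illustrated with a small toy model. This is not the unit test attached to the issue, and it does not use Lucene or Solr: SegmentView, PerSegmentCache, and the two reopen helpers are hypothetical names invented for this sketch. The only assumption carried over from the description is that a per-segment cache looks entries up by the identity of the segment's core cache key, so a reopen that hands out new key objects for unchanged segments causes misses even though the segments themselves are the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Lucene or Solr classes. */
public class SegmentCacheModel {

    /** Stand-in for a per-segment reader; the key is compared by identity. */
    static final class SegmentView {
        final String segmentName;
        final Object coreCacheKey;

        SegmentView(String segmentName, Object coreCacheKey) {
            this.segmentName = segmentName;
            this.coreCacheKey = coreCacheKey;
        }
    }

    /** Stand-in for a per-segment cache keyed on the core cache key. */
    static final class PerSegmentCache {
        private final Map<Object, String> entries = new IdentityHashMap<>();
        int misses;

        String get(SegmentView segment) {
            String cached = entries.get(segment.coreCacheKey);
            if (cached == null) {
                // Miss: recompute the (pretend) per-segment data and store it.
                misses++;
                cached = "cached data for " + segment.segmentName;
                entries.put(segment.coreCacheKey, cached);
            }
            return cached;
        }
    }

    /** Models a reopen in which unchanged segments keep their key objects. */
    static List<SegmentView> reopenSharingKeys(List<SegmentView> old) {
        List<SegmentView> reopened = new ArrayList<>();
        for (SegmentView s : old) {
            reopened.add(new SegmentView(s.segmentName, s.coreCacheKey));
        }
        return reopened;
    }

    /** Models a reopen in which every segment gets a new key object. */
    static List<SegmentView> reopenWithFreshKeys(List<SegmentView> old) {
        List<SegmentView> reopened = new ArrayList<>();
        for (SegmentView s : old) {
            reopened.add(new SegmentView(s.segmentName, new Object()));
        }
        return reopened;
    }

    /** Warms the cache, reopens, and counts the misses caused by the reopen. */
    public static int missesAfterReopen(boolean freshKeys) {
        List<SegmentView> first = Arrays.asList(
                new SegmentView("_0", new Object()),
                new SegmentView("_1", new Object()));
        PerSegmentCache cache = new PerSegmentCache();
        for (SegmentView s : first) {
            cache.get(s); // initial warm-up, misses expected here
        }
        int before = cache.misses;
        List<SegmentView> reopened =
                freshKeys ? reopenWithFreshKeys(first) : reopenSharingKeys(first);
        for (SegmentView s : reopened) {
            cache.get(s);
        }
        return cache.misses - before;
    }

    public static void main(String[] args) {
        System.out.println("misses with shared keys: " + missesAfterReopen(false));
        System.out.println("misses with fresh keys:  " + missesAfterReopen(true));
    }
}
```

In this model the shared-key reopen corresponds to the case where the cache is retained, and the fresh-key reopen corresponds to the slave case in which every replication discards the per-segment cache for segments that have not changed on disk.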

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

