lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8557) LeafReader.getFieldInfos should always return the same instance
Date Tue, 06 Nov 2018 19:57:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677244#comment-16677244 ]

ASF subversion and git services commented on LUCENE-8557:
---------------------------------------------------------

Commit 12719d19609d87ab0e2a4132d4988dd4362b6575 in lucene-solr's branch refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=12719d1 ]

LUCENE-8557: LeafReader.getFieldInfos should always return the same instance
MemoryIndex: compute/cache up-front
Solr Collapse/Expand with top_fc: compute/cache up-front
Json Facets numerics / hash DV: use the cached fieldInfos on SolrIndexSearcher
SolrIndexSearcher: move the cached FieldInfos to SlowCompositeReaderWrapper

Closes #487
(cherry picked from commit d0cd4245bdb8363e9adf3812817b9989ce4f506c)


> LeafReader.getFieldInfos should always return the same instance
> ---------------------------------------------------------------
>
>                 Key: LUCENE-8557
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8557
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 7.5
>            Reporter: Tim Underwood
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: LUCENE-8557.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Most LeafReader implementations cache an instance of FieldInfos and return it from LeafReader.getFieldInfos().  A few currently do not, and this can cause performance problems.
> The most notable example is the lack of caching in Solr's SlowCompositeReaderWrapper, which caused unexpected slowdowns when using Solr's JSON Facets compared to the legacy facets.
> This proposed change is mostly relevant to Solr but touches a few Lucene classes.  Specifically:
> *1.* Adds a check to TestUtil.checkReader to verify that LeafReader.getFieldInfos() returns the same instance:
>  
> {code:java}
> // FieldInfos should be cached at the reader and always return the same instance
> if (reader.getFieldInfos() != reader.getFieldInfos()) {
>   throw new RuntimeException("getFieldInfos() returned different instances for class: " + reader.getClass());
> }
> {code}
> I'm not entirely sure this is wanted or needed, but adding it uncovered most of the other LeafReader implementations that were not caching FieldInfos.  I'm happy to remove this part of the patch though.
>  
> *2.* Adds a FieldInfos.EMPTY that can be used in a handful of places
>  
> {code:java}
> public final static FieldInfos EMPTY = new FieldInfos(new FieldInfo[0]);
> {code}
> Several places in the Lucene/Solr tests were creating empty instances of FieldInfos, which caused the check in #1 to fail.  This fixes those failures and cleans up the code a bit.
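
To illustrate why freshly created empty instances trip the identity check in #1 while a shared constant does not, here is a minimal, self-contained sketch. Every type in it (`EmptyInstanceSketch`, `InfosSketch`, `InfoSource`, `FreshEmptySource`, `SharedEmptySource`) is a hypothetical stand-in and not a Lucene class; only the idea of a shared `EMPTY` constant is taken from the description above.

```java
// Illustration only: all names below are hypothetical stand-ins, NOT Lucene's API.
final class EmptyInstanceSketch {

    // Stand-in for a field metadata holder.
    static final class InfosSketch {
        // One shared empty instance, mirroring the idea of FieldInfos.EMPTY.
        static final InfosSketch EMPTY = new InfosSketch(new String[0]);

        private final String[] names;

        InfosSketch(String[] names) {
            this.names = names.clone();
        }

        int size() {
            return names.length;
        }
    }

    // Stand-in for anything that exposes field metadata.
    interface InfoSource {
        InfosSketch getInfos();
    }

    // Allocates a new empty object on every call, so identity differs per call.
    static final class FreshEmptySource implements InfoSource {
        @Override
        public InfosSketch getInfos() {
            return new InfosSketch(new String[0]);
        }
    }

    // Always hands back the shared constant, so identity is stable.
    static final class SharedEmptySource implements InfoSource {
        @Override
        public InfosSketch getInfos() {
            return InfosSketch.EMPTY;
        }
    }

    // Same idea as the check in #1: two calls must yield one and the same instance.
    static boolean returnsSameInstance(InfoSource source) {
        return source.getInfos() == source.getInfos();
    }

    public static void main(String[] args) {
        System.out.println("fresh:  " + returnsSameInstance(new FreshEmptySource()));
        System.out.println("shared: " + returnsSameInstance(new SharedEmptySource()));
    }
}
```

In this sketch the shared constant passes the identity comparison and the freshly allocated empty object does not, even though both hold zero fields.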
> *3.* Fixes a few LeafReader implementations that were not caching FieldInfos
> Specifically:
>  * *MemoryIndex.MemoryIndexReader* - The constructor was already looping over the fields so it seemed natural to just create the FieldInfos at that time
>  * *SlowCompositeReaderWrapper* - This was the one causing me trouble.  I've moved the caching of FieldInfos from SolrIndexSearcher to SlowCompositeReaderWrapper.
>  * *CollapsingQParserPlugin.ReaderWrapper* - getFieldInfos() is immediately called twice after this is constructed
>  * *ExpandComponent.ReaderWrapper* - getFieldInfos() is immediately called twice after this is constructed
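
The fixes listed above share one pattern: compute the FieldInfos once, up front, and return that same reference on every call. A rough sketch of the pattern follows. It uses hypothetical stand-in types (`FieldInfosCachingSketch`, `InfosSketch`, `ReaderSketch`, `SimpleLeaf`, `UncachedWrapper`, `CachedWrapper`) rather than Lucene's classes, and the merge logic is an assumption made up for illustration, not what SlowCompositeReaderWrapper actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustration only: all names below are hypothetical stand-ins, NOT Lucene's API.
final class FieldInfosCachingSketch {

    // Stand-in for per-reader field metadata.
    static final class InfosSketch {
        final List<String> fieldNames;

        InfosSketch(List<String> fieldNames) {
            this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
        }
    }

    // Stand-in for the reader contract.
    interface ReaderSketch {
        InfosSketch getFieldInfos();
    }

    // A leaf that builds its metadata once in the constructor.
    static final class SimpleLeaf implements ReaderSketch {
        private final InfosSketch fieldInfos;

        SimpleLeaf(String... names) {
            this.fieldInfos = new InfosSketch(Arrays.asList(names));
        }

        @Override
        public InfosSketch getFieldInfos() {
            return fieldInfos;
        }
    }

    // Assumed merge for illustration: union of field names in first-seen order.
    static InfosSketch merge(List<ReaderSketch> leaves) {
        List<String> names = new ArrayList<>();
        for (ReaderSketch leaf : leaves) {
            for (String name : leaf.getFieldInfos().fieldNames) {
                if (!names.contains(name)) {
                    names.add(name);
                }
            }
        }
        return new InfosSketch(names);
    }

    // Problematic shape: repeats the merge and allocates a new object per call.
    static final class UncachedWrapper implements ReaderSketch {
        private final List<ReaderSketch> leaves;

        UncachedWrapper(List<ReaderSketch> leaves) {
            this.leaves = leaves;
        }

        @Override
        public InfosSketch getFieldInfos() {
            return merge(leaves);
        }
    }

    // Cached shape: merge once in the constructor, then return the stored instance.
    static final class CachedWrapper implements ReaderSketch {
        private final InfosSketch fieldInfos;

        CachedWrapper(List<ReaderSketch> leaves) {
            this.fieldInfos = merge(leaves);
        }

        @Override
        public InfosSketch getFieldInfos() {
            return fieldInfos;
        }
    }

    public static void main(String[] args) {
        List<ReaderSketch> leaves = Arrays.<ReaderSketch>asList(
            new SimpleLeaf("id", "title"), new SimpleLeaf("id", "price"));
        ReaderSketch uncached = new UncachedWrapper(leaves);
        ReaderSketch cached = new CachedWrapper(leaves);
        System.out.println("uncached same instance: "
            + (uncached.getFieldInfos() == uncached.getFieldInfos()));
        System.out.println("cached same instance: "
            + (cached.getFieldInfos() == cached.getFieldInfos()));
    }
}
```

Computing in the constructor suits wrappers whose getFieldInfos() is called right after construction, as noted for the two ReaderWrapper classes. A lazily initialized field would be another option, at the cost of handling thread safety.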
>  
> *4.* Minor Solr tweak to avoid calling SolrIndexSearcher.getSlowAtomicReader in FacetFieldProcessorByHashDV.  This change is now optional since SlowCompositeReaderWrapper caches FieldInfos.
>  
> As suggested by [~dsmiley] this takes the place of SOLR-12878 since it touches some Lucene code.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

