lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow
Date Wed, 16 May 2018 21:53:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478170#comment-16478170 ]

David Smiley commented on SOLR-12366:
-------------------------------------

* Adds a new {{SolrIndexSearcher.getLiveDocsBits()}} method that works like {{LeafReader.getLiveDocs}}.
I don't actually like the name of this method; IMO it ought to be simply {{getLiveDocs}},
but that conflicts with an existing method that I think ought to be named something like {{getLiveDocSet}}.
Since these are internal methods, I think we could just rename them, though I'm okay with limiting the rename to master.
* Affects SimpleFacets.getFacetTermEnumCounts (classic faceting), FacetFieldProcessorByEnumTermsStream
(JSON facets), UnInvertedField, GraphTermsQParser, JoinQParser, and SolrIndexSearcher.getFirstMatch.
* In GraphTermsQParser I also noticed that the fallback logic for a non-SolrIndexSearcher was broken,
as it didn't check for a null liveDocs. Will we ever even reach this code? Anyway, I
replaced those many lines with something simpler.
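A minimal sketch of the null-liveDocs pitfall described in the GraphTermsQParser item: per the {{LeafReader.getLiveDocs}} contract, a null return means there are no deleted documents, so a caller must treat every doc as live instead of dereferencing the result. The {{Bits}} interface below is a hypothetical stand-in for Lucene's so the example compiles without Lucene; {{LiveDocsSketch}}, {{fromBitSet}} and {{countLiveMatches}} are illustrative names, not Solr code.

```java
import java.util.BitSet;

// Hypothetical stand-in for org.apache.lucene.util.Bits so this sketch
// compiles without Lucene on the classpath.
interface Bits {
    boolean get(int index);
    int length();
}

public class LiveDocsSketch {

    // Wraps a java.util.BitSet as Bits; a set bit means the doc is live.
    static Bits fromBitSet(BitSet live, int maxDoc) {
        return new Bits() {
            @Override public boolean get(int index) { return live.get(index); }
            @Override public int length() { return maxDoc; }
        };
    }

    // Counts docs in [0, maxDoc) that are in 'matches' and not deleted.
    // A null liveDocs means "no deletions", so every doc counts as live
    // rather than triggering a NullPointerException.
    static int countLiveMatches(BitSet matches, Bits liveDocs, int maxDoc) {
        int count = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0 && doc < maxDoc;
             doc = matches.nextSetBit(doc + 1)) {
            if (liveDocs == null || liveDocs.get(doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(1);
        matches.set(3);
        matches.set(4);

        BitSet live = new BitSet();
        live.set(0, 5);  // docs 0..4 live
        live.clear(3);   // doc 3 deleted

        System.out.println(countLiveMatches(matches, fromBitSet(live, 5), 5)); // 2
        System.out.println(countLiveMatches(matches, null, 5));                 // 3
    }
}
```

The null check is cheap and keeps the "no deletions" fast path, which is why the {{LeafReader.getLiveDocs}} contract uses null rather than an all-true Bits.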

IMO some callers of {{SolrIndexSearcher.getSlowAtomicReader}} should switch to {{MultiFields}}
to avoid the temptation of holding a LeafReader that has many slow methods. I made this change
in SimpleFacets.getFacetTermEnumCounts. Doing the same elsewhere could be a follow-up issue.

IMO {{SolrIndexSearcher.getFirstMatch}} should be removed in favor of {{lookupId}} so there's
less code to maintain. Admittedly the latter is more verbose to use, but we could add a utility
method for callers who don't care about the segment ordinal and only want the global ID.
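A sketch of the kind of utility method suggested for {{lookupId}} callers. It assumes {{lookupId}} returns a long with the leaf (segment) ordinal in the high 32 bits, the segment-local doc ID in the low 32 bits, and -1 when nothing is found; verify that against the actual javadoc before relying on it. {{LookupIdSketch}}, {{toGlobalDocId}} and {{pack}} are hypothetical names, and the {{docBases}} array stands in for each leaf's docBase.

```java
public class LookupIdSketch {

    // Hypothetical helper: converts a packed lookup result into a global doc id.
    // Assumed layout: leaf ordinal in the high 32 bits, segment-local doc id in
    // the low 32 bits, -1 meaning "not found". docBases[i] is leaf i's docBase.
    static int toGlobalDocId(long packed, int[] docBases) {
        if (packed == -1L) {
            return -1;
        }
        int leafOrd = (int) (packed >>> 32);
        int segDoc = (int) packed; // low 32 bits
        return docBases[leafOrd] + segDoc;
    }

    // Builds a value in the same assumed layout, for demonstration only.
    static long pack(int leafOrd, int segDoc) {
        return (((long) leafOrd) << 32) | (segDoc & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // three segments
        System.out.println(toGlobalDocId(pack(1, 7), docBases)); // 107
        System.out.println(toGlobalDocId(pack(2, 0), docBases)); // 250
        System.out.println(toGlobalDocId(-1L, docBases));        // -1
    }
}
```

Callers that need the segment (e.g. to read per-leaf doc values) would keep using the packed form; only callers wanting the global ID would use the helper.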

[~yseeley@gmail.com] could you please review?  This touches stuff you have been involved
with.

 

> Avoid SlowAtomicReader.getLiveDocs -- it's slow
> -----------------------------------------------
>
>                 Key: SOLR-12366
>                 URL: https://issues.apache.org/jira/browse/SOLR-12366
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: search
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: SOLR-12366.patch, SOLR-12366.patch
>
>
> SlowAtomicReader is of course slow, and its getLiveDocs (based on MultiBits) is slow
> as it uses a binary search for each lookup.  There are various places in Solr that use SolrIndexSearcher.getSlowAtomicReader
> and then get the liveDocs.  Most of these places ought to work with SolrIndexSearcher's getLiveDocs
> method.
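To illustrate the per-lookup binary search the issue describes, here is a simplified, hypothetical MultiBits-like view (not Lucene's actual class): each {{get()}} searches the segment start offsets, costing O(log numSegments) per call, which adds up when called once per document. The {{flatten()}} method shows one alternative, copying each segment's bits into a single global bit set in one pass so later lookups are O(1); it is only an illustration and does not claim to be how SolrIndexSearcher implements its live-docs methods.

```java
import java.util.BitSet;

public class MultiBitsSketch {

    // Hypothetical MultiBits-like view over per-segment live docs.
    static final class SegmentedLiveDocs {
        private final BitSet[] subs; // null entry = segment has no deletions
        private final int[] starts;  // starts[i] = docBase of segment i; length subs.length + 1

        SegmentedLiveDocs(BitSet[] subs, int[] starts) {
            this.subs = subs;
            this.starts = starts;
        }

        // Binary-searches 'starts' for the last segment whose docBase <= globalDoc,
        // on every call. Assumes 0 <= globalDoc < maxDoc.
        boolean get(int globalDoc) {
            int lo = 0, hi = subs.length - 1;
            while (lo < hi) {
                int mid = (lo + hi + 1) >>> 1;
                if (starts[mid] <= globalDoc) {
                    lo = mid;
                } else {
                    hi = mid - 1;
                }
            }
            BitSet sub = subs[lo];
            return sub == null || sub.get(globalDoc - starts[lo]);
        }
    }

    // One pass over all segments producing a global live-docs bit set.
    static BitSet flatten(BitSet[] subs, int[] starts) {
        BitSet global = new BitSet(starts[subs.length]);
        for (int i = 0; i < subs.length; i++) {
            int base = starts[i], end = starts[i + 1];
            if (subs[i] == null) {
                global.set(base, end); // no deletions: every doc is live
            } else {
                for (int d = subs[i].nextSetBit(0); d >= 0 && base + d < end;
                     d = subs[i].nextSetBit(d + 1)) {
                    global.set(base + d);
                }
            }
        }
        return global;
    }

    public static void main(String[] args) {
        BitSet s0 = new BitSet(); s0.set(0); s0.set(2); // segment 0: 3 docs, doc 1 deleted
        BitSet s2 = new BitSet(); s2.set(1, 3);         // segment 2: 3 docs, doc 0 deleted
        BitSet[] subs = {s0, null, s2};                 // segment 1: 2 docs, no deletions
        int[] starts = {0, 3, 5, 8};

        SegmentedLiveDocs slow = new SegmentedLiveDocs(subs, starts);
        BitSet fast = flatten(subs, starts);
        for (int doc = 0; doc < starts[subs.length]; doc++) {
            System.out.println(doc + " " + slow.get(doc) + " " + fast.get(doc));
        }
    }
}
```

Both views agree on every doc; the difference is only in how much work each lookup does.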



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


