lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8041) All Fields.terms(fld) impls should be O(1) not O(log(N))
Date Mon, 06 Nov 2017 22:00:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240977#comment-16240977 ]

Robert Muir commented on LUCENE-8041:
-------------------------------------

It doesn't need to be *all* Fields.terms impls. It is enough to optimize the default codec.


TreeMap is a good, simple default; all the various alternative terms dicts can continue to
use it.
But the default codec should optimize for the access behavior that matters: accessing a field
randomly.
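
To illustrate the trade-off described above, here is a minimal sketch in plain Java. It is not Lucene code: FieldIndexSketch and its methods are hypothetical names, and the type parameter T stands in for whatever per-field terms object a codec would hold. It keeps a HashMap for random access by field name (expected constant time, versus the O(log(N)) descent of a TreeMap) and a separately sorted list of names so that field iteration can still be ordered.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-field terms holder; not a Lucene class.
final class FieldIndexSketch<T> {
    // Hash lookup gives expected O(1) access by field name.
    private final Map<String, T> byName = new HashMap<>();
    // Names are sorted once at construction so iteration stays ordered.
    private final List<String> sortedNames;

    FieldIndexSketch(Map<String, T> fields) {
        byName.putAll(fields);
        List<String> names = new ArrayList<>(fields.keySet());
        Collections.sort(names);
        sortedNames = Collections.unmodifiableList(names);
    }

    // Random access: the hot path for searches over many fields.
    // Returns null when the field is unknown.
    T terms(String field) {
        return byName.get(field);
    }

    // Field iteration in sorted order, independent of the hash layout.
    List<String> fieldNames() {
        return sortedNames;
    }

    public static void main(String[] args) {
        Map<String, Integer> fields = new HashMap<>();
        fields.put("title", 1);
        fields.put("body", 2);
        fields.put("author", 3);
        FieldIndexSketch<Integer> index = new FieldIndexSketch<>(fields);
        System.out.println(index.terms("body"));
        System.out.println(index.fieldNames());
    }
}
```

The cost of this layout is one extra list of names per reader; the sort happens once at construction, not on every lookup.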

I don't think we should remove field iteration/Fields unless we remove the ability to change
term vectors "per-doc". It is currently needed (e.g. by CheckIndex) to know what fields were
truly indexed for a specific document with vectors, since that may disagree with FieldInfos.
If we fixed that, then it would truly be unnecessary and FieldInfos would be all we need.


> All Fields.terms(fld) impls should be O(1) not O(log(N))
> --------------------------------------------------------
>
>                 Key: LUCENE-8041
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8041
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: David Smiley
>
> I've seen apps that have a good number of fields -- hundreds.  The O(log(N)) of TreeMap
> definitely shows up in a profiler; sometimes 20% of search time, if I recall.  There are many
> Fields implementations that are impacted... in part because Fields is the base class of
> FieldsProducer.
>
> As an aside, I hope Fields goes away some day; FieldsProducer should be TermsProducer
> and not have an iterator of fields. If DocValuesProducer doesn't have this then why should
> the terms index part of our API have it?  If we did this then the issue here would be a simple
> transition to a HashMap.
>
> Or maybe we can switch to HashMap and relax the definition of Fields.iterator to not
> necessarily be sorted?
>
> Perhaps the fix can be a relatively simple conversion over to LinkedHashMap in many cases
> if we can assume when we initialize these internal maps that we consume them in sorted order
> to begin with.
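
The LinkedHashMap idea at the end of the issue description can be sketched as follows. This is an illustration in plain Java, not the change made in Lucene: SortedInsertSketch, build, and the loader function are hypothetical names. LinkedHashMap iterates in insertion order, so if the field names are inserted already sorted, iteration stays sorted while get() becomes a hash lookup. The sketch verifies the sorted-input assumption instead of trusting it, using String.compareTo as the ordering.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper; not part of Lucene.
final class SortedInsertSketch {
    // Builds a map whose iteration order equals the (sorted) input order.
    // Throws if the names are not strictly increasing, since the sorted
    // iteration guarantee would otherwise silently break.
    static <T> Map<String, T> build(List<String> sortedNames, Function<String, T> loader) {
        Map<String, T> map = new LinkedHashMap<>();
        String previous = null;
        for (String name : sortedNames) {
            if (previous != null && previous.compareTo(name) >= 0) {
                throw new IllegalArgumentException(
                    "field names must be sorted and distinct: " + previous + ", " + name);
            }
            map.put(name, loader.apply(name));
            previous = name;
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build(List.of("author", "body", "title"), String::length);
        // Iteration follows insertion order, which is sorted order here.
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

If the input order cannot be guaranteed, sorting the names before insertion restores the invariant at a one-time O(N log(N)) cost.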



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

