jackrabbit-oak-issues mailing list archives

From "Davide Giannella (JIRA)" <j...@apache.org>
Subject [jira] [Closed] (OAK-7379) Lucene Index: per-column selectivity, assume 5 unique entries
Date Wed, 16 Jan 2019 11:42:02 GMT

     [ https://issues.apache.org/jira/browse/OAK-7379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davide Giannella closed OAK-7379.
---------------------------------

Bulk close for 1.10.0.

> Lucene Index: per-column selectivity, assume 5 unique entries
> -------------------------------------------------------------
>
>                 Key: OAK-7379
>                 URL: https://issues.apache.org/jira/browse/OAK-7379
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene, query
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>              Labels: candidate_oak_1_8
>             Fix For: 1.9.0, 1.10.0
>
>
> Currently, if a query has a property restriction of the form "property = x", and the
> property is indexed in a Lucene property index, the estimated cost of the index is the
> number of documents indexed for that property. This is a very conservative estimate: it
> assumes that all documents have the same value, so the cost of that index is relatively high.
> In almost all cases, there are many distinct values for a property. Rarely, there are only
> a few values, or a skewed distribution where one value accounts for most documents. But in
> almost all cases there are more than 5 distinct values.
> I think it makes sense to assume 5 unique entries by default. It is still conservative
> (the cost of the index is still high), but much better than now.
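
The arithmetic behind the proposal can be illustrated with a minimal sketch. This is
not Oak's actual cost code: the class name, the method names, and the rounding-up to at
least one entry are assumptions made purely for illustration. Only the value 5 and the
"number of documents indexed for that property" come from the issue text.

```java
// Hypothetical illustration only; none of these names exist in Jackrabbit Oak.
final class SelectivityEstimate {

    // Default number of unique entries per property, as proposed in the issue.
    static final int ASSUMED_UNIQUE_ENTRIES = 5;

    // Behaviour described as "currently": every indexed document is assumed to match.
    static long conservativeEstimate(long docsIndexedForProperty) {
        return docsIndexedForProperty;
    }

    // Proposed behaviour: assume the documents are spread evenly over the unique entries.
    // Returning at least 1 for a non-empty index is an assumption of this sketch.
    static long estimateWithSelectivity(long docsIndexedForProperty, int uniqueEntries) {
        if (docsIndexedForProperty <= 0) {
            return 0;
        }
        long estimate = docsIndexedForProperty / Math.max(1, uniqueEntries);
        return Math.max(1, estimate);
    }

    public static void main(String[] args) {
        long docs = 100_000;
        System.out.println(conservativeEstimate(docs));                            // prints 100000
        System.out.println(estimateWithSelectivity(docs, ASSUMED_UNIQUE_ENTRIES)); // prints 20000
    }
}
```

With 100,000 indexed documents, the estimate drops from 100,000 to 20,000 entries, which
is still a high cost but makes the Lucene index more competitive against other indexes.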



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
