lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7110) Add Shape Support to BKD (extend to an R*/X-Tree data structure)
Date Tue, 23 Jan 2018 18:33:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336199#comment-16336199
] 

Robert Muir commented on LUCENE-7110:
-------------------------------------

Then shouldn't we just add Range? It'd like to see it generic like points, e.g. it would support
different dimensions (1D, 2D) ranges etc. I think it should be purely byte based and not bake
in stuff like coordinate systems or any of that: the Query side can handle that, just like
the Point queries handle it for their cases (box, radius, etc).

But I don't think we should conflate/mess with points. They already cover their use case well
and we shouldn't make them more complex.

> Add Shape Support to BKD (extend to an R*/X-Tree data structure)
> ----------------------------------------------------------------
>
>                 Key: LUCENE-7110
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7110
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Nicholas Knize
>            Priority: Major
>
> I've been tinkering with this off and on for a while and its showing some promise so
I'm going to open an issue to (eventually) add this feature to a future release.
> R*/X-Tree is a data structure designed to support Shapes (2D, 3D, nD) where, like the
internal node, the key for each leaf node is the Minimum Bounding Range (MBR - sometimes "incorrectly"
referred to as Minimum Bounding Rectangle) of the shape. Inserting a shape then boils down
to the best way of optimizing the tree structure. This optimization is driven by a set of
criteria for choosing the appropriate internal key (e.g., minimizing overlap between siblings,
maximizing "squareness", minimizing area, maximizing space usage). Query is then (a bit oversimplified)
a two-phase process:
>  * recurse each branch that overlaps with the MBR of the query shape
>  * compute the relation with the leaf node(s) - in higher dimensions (3+) this becomes
an increasingly difficult computational geometry problem
> The current BKD implementation is a special simplified case of an R*/X tree where, for
Point data, it is always guaranteed there will never be overlap between sibling nodes (because
you're using the point data as the keys). By exploiting this property the tree algorithms
(split, merge, etc) are relatively cheap (hence their performance boost over postings based
numerics). By modifying the key data, and extending the tree generation algorithms BKD logic
can be extended to support Shape data using the MBR as the Key and modifying split and merge
based on the criteria needed for optimizing a shape-based data structure.
> The initial implementation (based on limitations of the GeoAPI) will support 2D shapes
only. Once the GeoAPI can performantly handle 3D shapes the change is relatively trivial to
add the third dimension to the tree generation code.
> Like everything else, this feature will be created in sandbox and, once mature, will
graduate (likely to lucene-spatial or lucene-spatial-extras depending on the library needs).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message