lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6477) Add BKD tree for spatial shape query intersecting indexed points
Date Mon, 11 May 2015 15:50:00 GMT


Michael McCandless commented on LUCENE-6477:

I tested performance with the same "bounding boxes around London, UK"
on 60.8M points test from LUCENE-6450:

  Index size disk: 1.2 GB
  Index size heap: 0.75 MB
  Index time: 636.0 seconds (incl. forceMerge)
  Mean query time: .0068 sec
Indexing time is a bit slower (makes heavy use of OfflineSorter, but
should be O(N*log(N) overall).

It's very fast at search time: ~5.7X faster than GeoHashPrefixTree.

It also uses very little heap (0.75 MB) at search time for the inner
nodes of the tree.

It currently allocates FixedBitSet(maxDoc) per query & segment at
search time (like spatial does with RecursivePrefixTreeStrategy, I
think?).  Really it should use BitDocIdSet.Builder, to start sparse
and upgrade to FixedBitSet only if result set will be "big-ish", but
when I do that the query is 2.4X slower, which is frustrating.  I think
we could try using SentinelIntSet for the sparse case, like spatial
does in some cases.

> Add BKD tree for spatial shape query intersecting indexed points
> ----------------------------------------------------------------
>                 Key: LUCENE-6477
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: Trunk, 5.2
>         Attachments: LUCENE-6477.patch
> I'd like to explore using dedicated spatial trees for faster shape
> intersection filters than postings-based implementations.
> I implemented the tree data structure from
> The idea is simple: it builds a full binary tree, partitioning 2D
> space, alternately on lat and then lon, into smaller and smaller
> rectangles until a leaf has <= N (default 1024) points.
> It cannot index shapes (just points), and can then do fast shape
> intersection queries.  Multi-valued fields are supported.
> I only implemented the "point is contained in this bounding box" query
> for now, but I think polygon shape querying should be easy to
> implement using the same approach from LUCENE-6450.
> For indexing, you add BKDPointField (takes lat, lon) to your doc, and
> must set up your Codec use BKDTreeDocValuesFormat for that field.
> This DV format wraps Lucene50DVFormat, but then builds the disk-based
> BKD tree structure on the side.  BKDPointInBBoxQuery then requires this
> DVFormat, and casts it to gain access to the tree.
> I quantize each incoming double lat/lon to 32 bits precision (so 64
> bits per point) = ~9 milli-meter lon precision at the equator, I
> think.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message