lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6191) Spatial 2D faceting (heatmaps)
Date Mon, 19 Jan 2015 20:41:35 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14282976#comment-14282976 ]

David Smiley commented on LUCENE-6191:
--------------------------------------

BTW I took a peek at ElasticSearch's geohash aggregations feature to see how that similar
feature works.  It's quite different.  AFAICT, it's only for point data and works off of
DocValues, and at least presently it always exposes the counts as facets on geohashes (i.e.
frequency-ordered geohash terms with counts).  Its algorithmic complexity is based on the
number of documents matching your search, O(docs), whereas this patch is O(log(terms)) with
a constant factor of how many grid cells you request.
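
To make the complexity contrast concrete, here is a minimal sketch of the "one dictionary
lookup per requested grid cell" idea. It is not the code in the patch and uses no Lucene API:
a java.util.TreeMap stands in for a sorted terms dictionary, and the class name, the A-D
token scheme and the assumption that every point is indexed once per level are all
hypothetical. It also ignores search filters and segments.

```java
import java.util.TreeMap;

// Hypothetical stand-in for a prefix-tree index: term (cell token) -> number of docs.
final class SeekHeatmapSketch {

    // Cell token for a point on a square 360x360 world: one letter (A-D) per level.
    static String token(double x, double y, int level) {
        double minX = -180, minY = -180, size = 360;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < level; i++) {
            size /= 2;
            boolean right = x >= minX + size;
            boolean top = y >= minY + size;
            if (right) minX += size;
            if (top) minY += size;
            sb.append((char) ('A' + (top ? 0 : 2) + (right ? 1 : 0)));
        }
        return sb.toString();
    }

    // Assumption: each point contributes one term at every level from 1 to maxLevel.
    static TreeMap<String, Integer> index(double[][] points, int maxLevel) {
        TreeMap<String, Integer> terms = new TreeMap<>();
        for (double[] p : points) {
            for (int level = 1; level <= maxLevel; level++) {
                terms.merge(token(p[0], p[1], level), 1, Integer::sum);
            }
        }
        return terms;
    }

    // Whole-world heatmap at one level: one O(log(terms)) lookup per cell,
    // independent of how many documents there are.
    static int[][] heatmap(TreeMap<String, Integer> terms, int level) {
        int n = 1 << level;
        double size = 360.0 / n;
        int[][] counts = new int[n][n];
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                double centerX = -180 + (col + 0.5) * size;
                double centerY = -180 + (row + 0.5) * size;
                counts[row][col] = terms.getOrDefault(token(centerX, centerY, level), 0);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] points = {{10, 10}, {20, 20}, {-100, -100}};
        int[][] grid = heatmap(index(points, 2), 1);
        for (int[] row : grid) System.out.println(java.util.Arrays.toString(row));
    }
}
```

The work in heatmap() scales with the number of cells requested, which is the constant factor
mentioned above. A DocValues-based approach would instead loop over every matching document.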

> Spatial 2D faceting (heatmaps)
> ------------------------------
>
>                 Key: LUCENE-6191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6191
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.1
>
>         Attachments: LUCENE-6191__Spatial_heatmap.patch
>
>
> Lucene spatial's PrefixTree (grid) based strategies index data in a way highly amenable
> to faceting on grid cells to compute a so-called _heatmap_. The underlying code in this patch
> uses the PrefixTreeFacetCounter utility class, which was recently refactored out of faceting
> for NumberRangePrefixTree (LUCENE-5735).  At a low level, the terms (== grid cells) are navigated
> per-segment, forward only with TermsEnum.seek, so it's pretty quick and furthermore requires
> no extra caches & no docvalues.  Ideally you should use QuadPrefixTree (or Flex once it
> comes out) to maximize the number of grid levels, which in turn maximizes the fidelity of choices
> when you ask for a grid covering a region.  Conveniently, the provided capability returns
> the data in a 2-D grid of counts, so the caller needn't know a thing about how the data is
> encoded in the prefix tree.  Well almost... at this point they need to provide a grid level,
> but I'll soon provide a means of deriving the grid level based on a min/max cell count.
> I recommend QuadPrefixTree with geo=false so that you can provide a square world-bounds
> (360x360 degrees), which means square grid cells, which are more desirable to display than
> rectangular cells.
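
As a rough illustration of deriving a grid level from a maximum cell count, here is a sketch
under stated assumptions. It is not the patch's implementation: the class and method names are
hypothetical, level L is assumed to split the square 360x360 world into 2^L cells per side,
and the region is assumed to be aligned to cell boundaries (an unaligned region can need one
extra row and column).

```java
// Hypothetical helper: pick the deepest quad-tree level whose grid over a region
// stays within a caller-supplied cell budget.
final class HeatmapLevelSketch {

    // Side length in degrees of a (square) cell at the given level.
    static double cellSize(int level) {
        return 360.0 / (1L << level);
    }

    // Cells needed to cover a widthDeg x heightDeg region at the given level.
    static long cellCount(double widthDeg, double heightDeg, int level) {
        double size = cellSize(level);
        long cols = (long) Math.ceil(widthDeg / size);
        long rows = (long) Math.ceil(heightDeg / size);
        return cols * rows;
    }

    // The cell count never shrinks as the level grows, so stop at the first level over budget.
    static int chooseLevel(double widthDeg, double heightDeg, long maxCells, int maxLevel) {
        int best = 0;
        for (int level = 0; level <= maxLevel; level++) {
            if (cellCount(widthDeg, heightDeg, level) > maxCells) break;
            best = level;
        }
        return best;
    }

    public static void main(String[] args) {
        int level = chooseLevel(90, 45, 100, 20);
        System.out.println("level=" + level + " cells=" + cellCount(90, 45, level));
    }
}
```

Because the world bounds are square, cellSize() is the same in both dimensions, so the cells
come out square as recommended above.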



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

