lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5735) Faceting for DateRangePrefixTree
Date Thu, 05 Feb 2015 05:33:34 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306666#comment-14306666 ]

David Smiley commented on LUCENE-5735:
--------------------------------------

The PrefixTreeFacetCounter utility is good; if it doesn't get committed to 5x as part of this
issue first, it will be committed with the heatmap one.

There's a bug in NumberRangePrefixTreeStrategy.calcFacets in which all cells above the parent
are counted as topLeaves, when really that can only be done if the leaf cell _contains_ the
facet range.  I have a fix in progress that detects this: if the cell doesn't contain
the facet range, I walk the sub-cells and increment the counters on the parent facet cells.
 _There's a rare-ish bug I still need to debug._  So far, these changes are pending
in my local check-out:
* Make TreeCellIterator public (lucene.internal, still) and allow the 'cell' to be a cell
other than the top world cell.  Probably add a reset() constructor-like method to re-use an
instance.
* NRCell has an optimization when getting subCells that has worked fine in the normal code paths
thus far, but the in-progress faceting code has shown it to be faulty.
I removed it, as I don't think it is worth trying to make it work.
* NRCell sometimes can't get subCells if it was initialized from a short length shape/bytes;
it should instead always initialize its array to maxLevels.  Again, this apparently never
happens in normal code paths, but I triggered it in some toy test code.
* Refactor the two main date range tests to share a random calendar utility (RandomCalHelper).
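
The containment check described above for calcFacets can be illustrated with a minimal
sketch. It is not the Lucene code: Cell, count_ancestor_leaves and the integer spans are
hypothetical stand-ins, and each facet bucket is assumed to be one unit wide. A leaf cell
is added to the topLeaves tally only when it contains the whole facet range; otherwise
only the buckets it actually overlaps are incremented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical stand-in for a prefix-tree cell: a half-open span [start, end)."""
    start: int
    end: int

    def contains(self, other: "Cell") -> bool:
        return self.start <= other.start and other.end <= self.end


def count_ancestor_leaves(leaves, facet_range):
    """Tally leaf cells that sit above the facet level.

    leaves: iterable of (Cell, doc_count) pairs.
    facet_range: Cell bounding the buckets; each bucket is one unit wide (assumption).
    Returns (top_leaves, per_bucket), where per_bucket maps bucket start -> count.
    """
    top_leaves = 0
    per_bucket = {b: 0 for b in range(facet_range.start, facet_range.end)}
    for cell, doc_count in leaves:
        if cell.contains(facet_range):
            # The leaf covers every bucket, so it is safe to count it once up top.
            top_leaves += doc_count
        else:
            # Partial overlap: walk only the buckets inside both spans.
            lo = max(cell.start, facet_range.start)
            hi = min(cell.end, facet_range.end)
            for bucket in range(lo, hi):
                per_bucket[bucket] += doc_count
    return top_leaves, per_bucket


facet_range = Cell(10, 20)
leaves = [(Cell(0, 32), 3), (Cell(16, 24), 2), (Cell(30, 40), 1)]
top_leaves, per_bucket = count_ancestor_leaves(leaves, facet_range)
print(top_leaves)      # 3
print(per_bucket[16])  # 2
```

In this sketch the leaf spanning 0..32 contains the facet range and is counted once, the
leaf spanning 16..24 touches only buckets 16 to 19, and the leaf spanning 30..40 lies
outside the range and contributes nothing.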

> Faceting for DateRangePrefixTree
> --------------------------------
>
>                 Key: LUCENE-5735
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5735
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.x
>
>         Attachments: LUCENE-5735.patch, LUCENE-5735.patch, LUCENE-5735__PrefixTreeFacetCounter.patch
>
>
> The newly added DateRangePrefixTree (DRPT) encodes terms in a fashion amenable to faceting
> by meaningful time buckets. The motivation for this feature is to efficiently populate a calendar
> bar chart or [heat-map|http://bl.ocks.org/mbostock/4063318]. It's not hard if you have date
> instances, as many do, but it's challenging for date ranges.
> Internally this is going to iterate over the terms using seek/next with TermsEnum as
> appropriate.  It should be quite efficient; it won't need any special caches. I should be
> able to re-use SPT traversal code in AbstractVisitingPrefixTreeFilter.  If this goes especially
> well, the underlying implementation will be re-usable for geospatial heat-map faceting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

