lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5339) Simplify the facet module APIs
Date Wed, 27 Nov 2013 11:55:40 GMT


Michael McCandless commented on LUCENE-5339:

bq. Been away from the issue for some time and it looks like a major progress, Chapeau à

Thanks Gilad, bit by bit...

bq. LabelAndValue & FacetResult use instanceof checks in their equals method - is that
a must?

Hmm, I'm don't know how to implement .equals without an instanceof check?

bq. FacetResult has a member called childCount - I think it's the number of categories/path/labels
that were encountered. The current jdocs "How many labels were populated under the requested
path" reveals implementation (population). Perhaps exchange populated with encountered?


bq. FloatRange and DoubleRange uses Math.nextUp/Down for infinity as the ranges are always
inclusive. Perhaps these constants for float and double could be static final.

Well, .nextUp and .nextAfter.  But, what constants?  The number is
computed differently for each range (from the provided min and mx)...

bq. TaxonomyFacetSumFloatAssociations and TaxonomyFacetSumValueSource reuse a LOT of code,
can they extend one another? perhaps extract a common super for both?

Well, they differ in the source of the value to aggregate (per doc vs
per ord), but then the other methods are nearly the same except for
the int/float difference... in fact, Fast/TaxoFacetCounts
getTopChildren is also the same.

I'll add a TODO...

bq. In TaxonomyFacets the parents array is saves, I could not see where it's being used (and
I think it's not used even in the older taxonomy-facet implementation).

Ooh, good catch: I removed it.

FacetConfig confuses me a bit, as it's very much aware of the Taxonomy, on another it handles
all the kinds of the facets.
Perhaps could be split up, allowing each FacetField.Type a build() method
of its own, rather than every types' building being done in the same method. It will also
bring a common parent class to all FacetField types, which I also like. As such, the taxonomy
part, with processFacetFields() could be moved to its respective Facet implementation.

I'm on the fence on whether FacetsConfig should hold the
taxonomyWriter, vs you must pass it to the build method ...

I like the idea of moving the build logic into each FacetField impl,
but I want to keep it simple for the app (i.e. the app should not have
to invoke N build methods).

> Simplify the facet module APIs
> ------------------------------
>                 Key: LUCENE-5339
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5339.patch, LUCENE-5339.patch
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder).  I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
>   * Instead of *FacetRequest/Params/Result, you directly instantiate
>     the classes that do facet counting (currently TaxonomyFacetCounts,
>     RangeFacetCounts or SortedSetDVFacetCounts), passing in the
>     SimpleFacetsCollector, and then you interact with those classes to
>     pull labels + values (topN under a path, sparse, specific labels).
>   * At index time, no more FacetIndexingParams/CategoryListParams;
>     instead, you make a new SimpleFacetFields and pass it the field it
>     should store facets + drill downs under.  If you want more than
>     one CLI you create more than one instance of SimpleFacetFields.
>   * I added a simple schema, where you state which dimensions are
>     hierarchical or multi-valued.  From this we decide how to index
>     the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc.  This is just a start ...

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message