lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4748) Add DrillSideways helper class to Lucene facets module
Date Thu, 07 Feb 2013 08:53:13 GMT


Shai Erera commented on LUCENE-4748:

A few comments:

* I think that DrillSideways can take a DrillDownQuery (once we finish with LUCENE-4750)?
** It will eliminate .addDrillDown (and I think it's OK for DDQ to also enforce that all passed
CPs belong to the same dimension)
** Though if we do that, how can we set minShouldMatch on sub-query?
** Maybe DDQ itself shouldn't wrap another Query, but just build a BQ over all CPs ... then
the user will need to do the wrapping, but we can add a utility method.

* In .search(), just set minShouldMatch to 1 if (drillDownQueries.size() == 1)? It reads simpler...
** Also, why do you need to add a fake Query? I understand the rewrite will eliminate BQ and
return TQ, but what's the harm?
** Isn't minShouldMatch=1 in that case similar to TQ?

* In getDimIndex:
** Extract dims.size() to a variable so it's not executed in every loop?
** I think you can drop the if (cp.length > 0)? It doesn't make sense for someone to pass
an empty CP. Also, you can assert on that in .addDrillDown()
*** BTW, I noticed that you test that in DrillSidewaysCollector ctor too.
** I wonder whether making 'dims' a LinkedHashSet would perform better than these contains()
(in .addDrillDown) and get(i) calls. Then you could just do dims.get(fr.cp.components[0]). I didn't
try that in code, so I'm not sure you can get its index...
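
On the index question: java.util.LinkedHashSet has no get() or indexOf(), so it cannot return a position directly. A minimal sketch of one alternative, assuming a LinkedHashMap from dimension name to insertion index is acceptable; the class and method names here are hypothetical and not taken from the patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch: maps each drill-down dimension
// to the position at which it was first added.
class DimIndexSketch {
    private final Map<String, Integer> dims = new LinkedHashMap<>();

    /** Registers a dimension if it is unseen and returns its index. */
    int addDim(String dim) {
        Integer existing = dims.get(dim);
        if (existing != null) {
            return existing;
        }
        int index = dims.size();
        dims.put(dim, index);
        return index;
    }

    /** Returns the index of a dimension, or -1 if it was never added. */
    int getDimIndex(String dim) {
        Integer index = dims.get(dim);
        return index == null ? -1 : index;
    }

    public static void main(String[] args) {
        DimIndexSketch sketch = new DimIndexSketch();
        sketch.addDim("Manufacturer");
        sketch.addDim("Resolution");
        System.out.println(sketch.getDimIndex("Resolution")); // prints 1
    }
}
```

This replaces both the contains() check and the linear scan with a single hash lookup while still preserving insertion order.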

Also, I think we could simplify things if DrillSideways worked like this:

* Either exposed a .getQuery() method, or was itself a Query (like DDQ).
* Either exposed a .getCollector() method (returning DrillSidewaysCollector) or if it was
a Query, you'd just initialize a DrillSidewaysCollector (not a big deal, user-wise).
* The collector's getFacetResults() would do the "merging" work that I see in .search()

Then you:

* Won't need DrillSidewaysResult, which today wraps a List<FacetResult> and TopDocs.
Someone could MultiCollector.wrap(topDocsCollector, sidewaysCollector)? Just like w/ facets?
* Won't need the multitude of search() methods. Again, someone could wrap TopDocsCollector,
CachingCollector, TopFieldCollector...
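
To illustrate the wrapping idea, here is a rough sketch of the fan-out pattern. It uses a hypothetical HitCollector interface as a stand-in, not Lucene's Collector API, and only shows how one search pass can feed both a hit collector and a counting collector:

```java
import java.util.ArrayList;
import java.util.List;

class FanOutSketch {
    /** Hypothetical stand-in for a collector; this is not Lucene's Collector API. */
    interface HitCollector {
        void collect(int doc);
    }

    /** Forwards every hit to each wrapped collector, in order. */
    static HitCollector wrap(HitCollector... collectors) {
        return doc -> {
            for (HitCollector c : collectors) {
                c.collect(doc);
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        int[] count = new int[1];
        // One collector keeps the doc IDs, the other only counts them.
        HitCollector both = wrap(doc -> hits.add(doc), doc -> count[0]++);
        for (int doc : new int[] {3, 7, 9}) {
            both.collect(doc);
        }
        System.out.println(hits + " " + count[0]); // prints [3, 7, 9] 3
    }
}
```

With this shape the caller picks whichever collectors it needs, so no dedicated result class or per-collector search() overload is required.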

In DrillSidewaysCollector ctor:
* if (drillSidewaysRequest == null) -- that means the user asked to drill down on some CPs
for dim X, but did not request counts for it, right?
** Must we throw an exception? Perhaps we can just drop the relevant Query clause? Although
it's unlikely that a user would do that ... so perhaps keep the code for simplicity.
* Instead of doing Collections.singletonList you can just pass the single FacetRequest to
the vararg ctor. If you feel like it, we can optimize FacetSearchParams' vararg ctors to initialize
a singletonList if facetRequests.length == 1.
* exactCount = Math.max(2, dims.size()); -- maybe add a comment why '2'?
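
The vararg optimization suggested above could look roughly like this. ParamsSketch is a hypothetical stand-in that holds strings; it is not FacetSearchParams and does not reflect its actual constructors:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a params class with a vararg constructor.
class ParamsSketch {
    final List<String> requests;

    ParamsSketch(String... requests) {
        // A single request gets the lighter immutable singleton list.
        this.requests = requests.length == 1
            ? Collections.singletonList(requests[0])
            : Arrays.asList(requests);
    }

    public static void main(String[] args) {
        System.out.println(new ParamsSketch("Manufacturer").requests); // prints [Manufacturer]
    }
}
```

Callers then pass the single request directly instead of building the singleton list themselves.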

In DrillSidewaysCollector.setScorer:
* Why does Scorer.getChildren() return a Collection and not List? We used to have that in
IR.listCommits while in practice it was always a List. Can we fix Scorer?
** I looked at all Scorer.getChildren() impls and they either return a List (ArrayList in
most cases) or Collections.singleton (which is a Set). So it's indeed dangerous to assume
it's a List, but I think we should just fix Scorer?
* What do you mean by "// nocommit fragile: need tracker somehow..."? What's tracker?

In DrillSidewaysCollector.collect:
* Can you add some documentation to the 'if-else'?
> Add DrillSideways helper class to Lucene facets module
> ------------------------------------------------------
>                 Key: LUCENE-4748
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>         Attachments: LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch
> This came out of a discussion on the java-user list with subject
> "Faceted search in OR":
> The basic idea is to count "near misses" during collection, ie
> documents that matched the main query and also all except one of the
> drill down filters.
> Drill sideways makes for a very nice faceted search UI because you
> don't "lose" the facet counts after drilling in.  Eg maybe you do a
> search for "cameras", and you see facets for the manufacturer, so you
> drill into "Nikon".
> With drill sideways, even after drilling down, you'll still get the
> counts for all the other brands, where each count tells you how many
> hits you'd get if you changed to a different manufacturer.
> This becomes more fun if you add further drill-downs, eg maybe I next drill
> down into "Resolution=10 megapixels", and then I can see how many 10
> megapixel cameras all other manufacturers offer, and what other resolutions
> Nikon cameras offer.
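
The "near miss" counting described in the issue can be sketched over in-memory documents as follows. This is only an illustration under simplifying assumptions (one value per dimension, one selected value per drill-down, every document already matches the main query); the class and method names are hypothetical and this is not the patch's implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DrillSidewaysSketch {
    /**
     * docs: each document is a dim -> value map, assumed to match the main query.
     * drillDowns: the selected value per drilled dimension.
     * Returns, per drilled dimension, value -> number of docs passing all other drill-downs.
     */
    static Map<String, Map<String, Integer>> sidewaysCounts(
            List<Map<String, String>> docs, Map<String, String> drillDowns) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (String dim : drillDowns.keySet()) {
            counts.put(dim, new TreeMap<>());
        }
        for (Map<String, String> doc : docs) {
            String failedDim = null;
            int failures = 0;
            for (Map.Entry<String, String> dd : drillDowns.entrySet()) {
                if (!dd.getValue().equals(doc.get(dd.getKey()))) {
                    failures++;
                    failedDim = dd.getKey();
                }
            }
            if (failures == 0) {
                // A true hit counts toward every drilled dimension.
                for (String dim : drillDowns.keySet()) {
                    counts.get(dim).merge(doc.get(dim), 1, Integer::sum);
                }
            } else if (failures == 1 && doc.get(failedDim) != null) {
                // A near miss counts only toward the one dimension it failed.
                counts.get(failedDim).merge(doc.get(failedDim), 1, Integer::sum);
            }
            // Documents failing two or more drill-downs contribute nothing.
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
            Map.of("Manufacturer", "Nikon", "Resolution", "10"),
            Map.of("Manufacturer", "Canon", "Resolution", "10"),
            Map.of("Manufacturer", "Nikon", "Resolution", "12"),
            Map.of("Manufacturer", "Canon", "Resolution", "12"));
        Map<String, String> drillDowns = Map.of("Manufacturer", "Nikon", "Resolution", "10");
        // prints {Manufacturer={Canon=1, Nikon=1}, Resolution={10=1, 12=1}}
        System.out.println(sidewaysCounts(docs, drillDowns));
    }
}
```

In the example, the Canon 10-megapixel camera fails only the Manufacturer drill-down, so it shows up in the Manufacturer counts even though it is not a hit; the Canon 12-megapixel camera fails both and is ignored.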
