lucene-dev mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Solr faceting vs. Lucene faceting
Date Thu, 13 Dec 2012 12:03:55 GMT
Hi Shai,

On Thu, Dec 13, 2012 at 12:21 PM, Shai Erera <serera@gmail.com> wrote:
> As I said, if someone volunteers to do some work on the Solr side, I will
> gladly participate in that effort.
> I just don't even know where to start w/ Solr :).

The entry point for Solr facets is
org.apache.solr.request.SimpleFacets.getFacetCounts (called from
FacetComponent).

> One thing that would be really great is if we can build an adapter (I think
> someone mentioned that word here)
> which supports basic facets capabilities, so that we can at least benchmark
> Solr's current
> implementation vs the implementation w/ the module.

Comparing both impls would be great, but an adapter might be hard to
write given how Lucene faceting differs from Solr faceting. The Lucene
module requires users to decide at indexing time what to facet on and
how, whereas Solr does everything at search time using FieldCache and
UnInvertedField, meaning that you can compute facets on any indexed
field (there is even an open issue about computing facet counts based
on arbitrary functions [1]). So Lucene faceting would probably require
an additional field property in the schema to let Solr know that it
should add category paths to documents. (Please correct me if anything
I wrote here is wrong.)
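
To illustrate the difference, here is a toy sketch in plain Java. It
does not use any Lucene or Solr classes: FacetingContrast, ToyTaxonomy,
countByOrdinal and countByValue are hypothetical names made up for this
example, and the real implementations are far more involved. It only
shows where the work happens in each approach.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these classes exist in Lucene or Solr.
final class FacetingContrast {

    // Index-time style: category paths are mapped to int ordinals while
    // documents are added, so the facet fields must be chosen up front.
    static final class ToyTaxonomy {
        private final Map<String, Integer> ordinals = new HashMap<>();
        private final List<String> paths = new ArrayList<>();

        // Returns the existing ordinal for a path, or assigns the next one.
        int addCategory(String path) {
            Integer ord = ordinals.get(path);
            if (ord == null) {
                ord = paths.size();
                ordinals.put(path, ord);
                paths.add(path);
            }
            return ord;
        }

        String getPath(int ordinal) {
            return paths.get(ordinal);
        }

        int size() {
            return paths.size();
        }
    }

    // Counts facets from the ordinals that were recorded per document
    // at indexing time: one array increment per (doc, category) pair.
    static int[] countByOrdinal(int[][] docOrdinals, int[] matchingDocs, int taxonomySize) {
        int[] counts = new int[taxonomySize];
        for (int doc : matchingDocs) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // Search-time style: nothing is prepared at indexing time; the value
    // of any indexed field is looked up for each matching document.
    static Map<String, Integer> countByValue(String[] fieldValues, int[] matchingDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : matchingDocs) {
            counts.merge(fieldValues[doc], 1, Integer::sum);
        }
        return counts;
    }
}
```

In this model an adapter would have to call something like addCategory
while documents are being indexed, which is why a schema-level hint
seems necessary on the Solr side.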

I have a few questions regarding the faceting module:
 - Do you have any rough idea of how speed and memory usage vary
depending on the number of docs to collect, the number of distinct
field values, etc.?
 - TaxonomyReader seems to use ints as ordinals for category paths.
Does this mean that the faceting module can't handle more than 2B
distinct category paths? Is it fixable? (Or maybe it doesn't make
sense to handle such large numbers of distinct values?)
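
For reference, the arithmetic behind the "2B" figure, as a small
plain-Java sketch. OrdinalLimit and its methods are hypothetical names
for this example and are not part of the faceting module; the long
variant is only an illustration of what a wider ordinal would allow,
not a claim about how the module could be changed.

```java
// Toy illustration only: not part of Lucene's faceting module.
final class OrdinalLimit {

    // Largest value a Java int can hold: 2^31 - 1 = 2,147,483,647.
    static final int MAX_INT_ORDINAL = Integer.MAX_VALUE;

    // Returns the next int ordinal, throwing ArithmeticException
    // instead of silently wrapping around to a negative value.
    static int nextOrdinal(int current) {
        return Math.addExact(current, 1);
    }

    // A hypothetical long ordinal keeps counting past the int limit,
    // at the cost of 8 bytes per stored ordinal instead of 4.
    static long nextLongOrdinal(long current) {
        return Math.addExact(current, 1L);
    }
}
```

Note that Java arrays are also indexed by int, so any per-ordinal
counting array would hit the same ceiling even with wider ordinals.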

 [1] https://issues.apache.org/jira/browse/SOLR-1581

--
Adrien
