lucene-dev mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Solr faceting vs. Lucene faceting
Date Thu, 13 Dec 2012 21:44:11 GMT
Hi Shai,

Thanks for your answers!

On Thu, Dec 13, 2012 at 5:05 PM, Shai Erera <serera@gmail.com> wrote:
>> the lucene module requires users to decide at indexing time what and how
>> to facet
>> whereas Solr does everything at searching time
>
> True, that's one difference between the two implementations today, even
> though I think that we can create a specialized path (under LUCENE-4619) for
> really simple, non-hierarchical cases.
> I don't know if and how Solr can handle a field value
> Sport/Basketball/NBA/... -- i.e., how is the hierarchy broken?

Solr doesn't break hierarchies. Its closest concept is pivot faceting
(https://issues.apache.org/jira/browse/SOLR-2894), available since 4.0,
which allows you to compute hierarchical facets on the fly. For
example, you can compute brand counts per category (if both brand and
category are indexed).
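
To illustrate what "brand counts per category" means, here is a toy
sketch in plain Python. It is not Solr code: pivot_counts and the sample
documents are made up for illustration, and documents are modelled as
simple dicts. In Solr the equivalent request would use the facet.pivot
parameter (for example facet.pivot=category,brand).

```python
from collections import Counter, defaultdict


def pivot_counts(docs, outer, inner):
    """Count values of `inner` grouped by the value of `outer`.

    Hypothetical helper: `docs` is an iterable of dicts standing in for
    the documents that matched a query.
    """
    pivots = defaultdict(Counter)
    for doc in docs:
        # Skip documents that are missing either field.
        if outer in doc and inner in doc:
            pivots[doc[outer]][doc[inner]] += 1
    # Convert to plain dicts so the result is easy to inspect.
    return {key: dict(counter) for key, counter in pivots.items()}


# Made-up sample data: four matching documents.
docs = [
    {"category": "shoes", "brand": "acme"},
    {"category": "shoes", "brand": "acme"},
    {"category": "shoes", "brand": "zeta"},
    {"category": "hats", "brand": "acme"},
]
result = pivot_counts(docs, "category", "brand")
```

Nothing about the category/brand nesting is decided at indexing time
here; the caller picks the two fields when asking for the counts.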

> Making a decision at search time that you'd like to facet on a field ...
> well I think that not doing that is what allows us to do efficient faceted
> search, off-disk or in-memory, support really large indexes and taxonomies
> and be NRT.

Maybe it would be less efficient (or not?), but I think this kind of
flexibility can be great for some applications (I'm thinking of
analytics right now, but there are probably many other use-cases). To
me, the main issues with Solr faceting right now are that it consumes a
lot of memory and is not NRT-friendly because of uninversion time. But
I think this can be fixed by using doc values (because they can be
stored on disk and don't need to be uninverted) instead of the field
cache. I would really love for the faceting module to become flexible
enough to handle both index-time and search-time facets, so
that Solr could become a consumer of this API instead of implementing
its own faceting logic.
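
To make the uninversion point concrete, here is a toy sketch in plain
Python. It is not Lucene or Solr code: uninvert and facet_counts are
hypothetical names, the postings are a simple dict, and the field is
assumed to be single-valued. The idea is that a field cache has to run
something like uninvert() over the inverted index before it can count,
whereas doc values would already hold the per-document column, written
at index time, so only the counting loop remains.

```python
def uninvert(postings, num_docs):
    """Build a doc -> term ordinal column from term -> doc ids postings.

    Hypothetical stand-in for the work a field cache does at search time.
    Documents without a value get ordinal -1.
    """
    terms = sorted(postings)
    doc_to_ord = [-1] * num_docs
    for ordinal, term in enumerate(terms):
        for doc in postings[term]:
            doc_to_ord[doc] = ordinal
    return terms, doc_to_ord


def facet_counts(terms, doc_to_ord, matching_docs):
    """Count facet values by reading one ordinal per matching document."""
    counts = [0] * len(terms)
    for doc in matching_docs:
        ordinal = doc_to_ord[doc]
        if ordinal >= 0:
            counts[ordinal] += 1
    return {terms[i]: c for i, c in enumerate(counts) if c > 0}


# Made-up index: 4 documents, doc 3 has no value for the field.
postings = {"nba": [0, 2], "nfl": [1]}
terms, doc_to_ord = uninvert(postings, num_docs=4)
all_counts = facet_counts(terms, doc_to_ord, [0, 1, 2, 3])
some_counts = facet_counts(terms, doc_to_ord, [2, 3])
```

The cost of uninvert() grows with the whole index, while facet_counts()
only touches the matching documents, which is why skipping the first
step matters for NRT.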

> So I think that if anyone would want to really manage taxonomies of that
> size, we'd need to discuss and maybe get back to the drawing board :).

One use-case I'm thinking of is finding the top terms of documents
that match an arbitrary query. This can be very useful to help you
better understand your data, but in this case the number of distinct
values is the size of your term dictionary.
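
A toy sketch of that use-case in plain Python, assuming per-document
term lists are available (top_terms and the sample data are made up for
illustration, not an existing API). It counts, for each term, how many
matching documents contain it, and keeps the k most frequent.

```python
import heapq
from collections import Counter


def top_terms(doc_terms, matching_docs, k):
    """Return the k terms that occur in the most matching documents.

    Hypothetical helper: `doc_terms` maps a doc id to its list of terms.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for doc in matching_docs:
        # Use a set so a term counts once per document.
        counts.update(set(doc_terms[doc]))
    return heapq.nsmallest(k, counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample data: three documents and their terms.
doc_terms = {
    0: ["lucene", "facet", "facet"],
    1: ["solr", "facet"],
    2: ["lucene", "index"],
}
top_all = top_terms(doc_terms, [0, 1, 2], k=2)
top_one = top_terms(doc_terms, [1], k=1)
```

The counter can grow to one entry per distinct term, which is the
"size of your term dictionary" concern above.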

-- 
Adrien
