lucene-dev mailing list archives

From Toke Eskildsen
Subject Re: New facet module
Date Mon, 11 Jul 2011 07:04:53 GMT
On Sat, 2011-07-09 at 05:44 +0200, Shai Erera wrote:
> The taxonomy is global to the index, but I think it will be
> interesting to explore per-segment taxonomy, and how it can be used to
> improve indexing or search perf (hopefully both).

I have struggled with this for some time and still haven't found a real
solution. Distributed faceting, with segment-based faceting as a special
case, is hard to do without a central taxonomy.

The new faceting module is explicit about the central taxonomy. My
experiments with
computes it at index open time. None of them work very well, if at all,
for a real distributed environment.

The problem is the same for flat faceting but is magnified with
hierarchical faceting: When the sorting order of facet elements is
popularity based, computing the correct counts for a top-X might
involve comparing the whole result from each part.

A pathological case for flat faceting is
Part 1: A1(2), A2(2)... An(2)
Part 2: B1(3), B2(2), B3(2)... Bn(2), An(1)
where the correct top 3 answer is An(3), B1(3), A2(2), which requires
the full part results to get to the An(2) and An(1) as they are the last
entries in their respective parts.
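
The pathological case above can be reproduced with a minimal Python
sketch. The function names and the choice of n = 5 are assumptions made
for illustration only; none of this is code from the facet module. Ties
are broken by list order, so which 2-count element lands in third place
is arbitrary.

```python
from collections import Counter

def build_parts(n):
    # Part 1: A1(2), A2(2) ... An(2)
    part1 = [(f"A{i}", 2) for i in range(1, n + 1)]
    # Part 2: B1(3), B2(2) ... Bn(2), An(1)
    part2 = [("B1", 3)] + [(f"B{i}", 2) for i in range(2, n + 1)] + [(f"A{n}", 1)]
    return part1, part2

def top_k(entries, k):
    # Stable sort by descending count; ties keep their list order.
    return sorted(entries, key=lambda e: -e[1])[:k]

def merge(parts):
    # Sum the counts of identical terms across all parts.
    total = Counter()
    for entries in parts:
        for term, count in entries:
            total[term] += count
    return list(total.items())

def naive_top_k(parts, k):
    # Each part sends only its own top k, which are then merged.
    return top_k(merge([top_k(p, k) for p in parts]), k)

def exact_top_k(parts, k):
    # Each part sends everything; correct, but transfers the full result.
    return top_k(merge(parts), k)

parts = build_parts(5)
naive = naive_top_k(parts, 3)
exact = exact_top_k(parts, 3)
```

With these inputs the naive merge never sees A5 at all, while the exact
merge ranks it first with a count of 3.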

For real world use, we can do clever counting so that we only return
what is necessary, but it does not change the worst case. To ensure that
we don't hit any million-entry merge situations, we must cheat and
impose a cutoff point.
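
One possible form of such a cutoff is sketched below in Python. This is
an illustration under assumptions, not what the facet module does: each
part returns only its top `cutoff` entries plus the count of the last
entry it returned, and the merger uses that count as an upper bound for
anything the part left out. All names here are hypothetical.

```python
def part_response(entries, cutoff):
    # cutoff is assumed to be >= 1.
    ranked = sorted(entries, key=lambda e: -e[1])
    returned = dict(ranked[:cutoff])
    # A term omitted by this part has at most the count of the last
    # returned entry. If nothing was cut off, an absent term has count 0.
    floor = ranked[cutoff - 1][1] if len(ranked) > cutoff else 0
    return returned, floor

def merge_with_bounds(responses):
    terms = set()
    for returned, _ in responses:
        terms.update(returned)
    bounds = {}
    for term in terms:
        low = sum(returned.get(term, 0) for returned, _ in responses)
        high = sum(returned.get(term, floor) for returned, floor in responses)
        bounds[term] = (low, high)
    # Upper bound for a term that no part returned.
    unseen_high = sum(floor for _, floor in responses)
    return bounds, unseen_high

def top_k_is_certain(bounds, unseen_high, k):
    # True when no term outside the top k (ranked by lower bound) could
    # still overtake the k-th entry. Counts inside the top k may still
    # be inexact when a lower bound differs from its upper bound.
    ranked = sorted(bounds.items(), key=lambda item: -item[1][0])
    if len(ranked) < k:
        return unseen_high == 0
    kth_low = ranked[k - 1][1][0]
    outsiders = [high for _, (_, high) in ranked[k:]] + [unseen_high]
    return kth_low >= max(outsiders)
```

When the check fails, the merger has to either raise the cutoff, ask
the parts for exact counts of the uncertain terms, or accept a result
that might be wrong. The worst case stays the same.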

With a multi-level faceting result (state/town/street expanded to top 5
elements on all levels) we must resolve quite a lot of elements to
ensure a high chance of getting the right elements with the right
counts. We can avoid this by drilling down one level at a time, but that
is just replacing bulk transfers with multiple requests: 1*5*5 is the
unrealistically low minimum for the address case.
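
One reading of the 1*5*5 figure is the number of requests needed at the
deepest level when drilling down one level at a time. The Python sketch
below works through that arithmetic, assuming that each request expands
the children of exactly one parent and that no over-requesting is done.
The function name is hypothetical.

```python
def drill_down_requests(levels, top):
    # Level 0 needs 1 request, and each level below needs one request
    # per element kept on the level above it.
    return [top ** depth for depth in range(levels)]

per_level = drill_down_requests(levels=3, top=5)  # state, town, street
deepest = per_level[-1]
total = sum(per_level)
```

Under these assumptions the street level alone takes 1*5*5 = 25
requests, and the three levels together take 1 + 5 + 25 = 31.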

- Toke
