lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Indexstructure design
Date Sun, 08 Jun 2008 15:49:56 GMT
well, why not just include a field that identifies each document's class?
Then, to search over
all classes, just don't mention the class field.

When you *do* want to restrict by class, include "AND class:blah" in your
query.

This assumes you don't have a huge number of classes and it wouldn't be a
problem
to have a clause like AND (class:cl1 OR class:cl2 OR class:cl3......). That
said, a
clause containing 100s of class:blah isn't a problem.

And if you want to get really fancy, create a Filter for each class and you
can just
combine the filters appropriately when you want to query and restrict to
classes.

Limiting the results to the top 10/20 per class is tricker, but see
HitCollectors
for a way to intervene in the hit selection process. You could keep a simple
map
that counted by class and reject each doc that came through the hitcollector
class after it reached your limit. Beware of doing a Reader.get() on each
hit, you'd have to do some work with TermDocs/TermEnum.

All that said, I haven't really kept up on the faceting that's been talked
about
in the archive, so you may want to look at the searchable mail archive and
see what's up with that.

How big is your index? That is, how many documents and how many fields
(approx) are you talking about here? That'll influence whether you would
be better off keeping them all in one index or splitting them up. But if you
can keep them in a single index, your maintenance will be *much* easier.

Best
Erick

On Sat, Jun 7, 2008 at 10:34 AM, Sascha Fahl <sascha.fahl@googlemail.com>
wrote:

> Hi,
> I am quite new to the lucene scene and I need your help :-)
> There are several document classes. Lets say documents from class A, B, C,
> D and E. What I need is the following:
>
> 1) I want to search over all classes together. So the query should hit
> results from all different classes - ideally it is possible
> to limit the results from each class to lets say 10 or 20 results per
> class.
>
> 2) I want to search over all classes seperately.
>
> My first idea was to have one index per class and use a MultiReader for
> searching over all classes and use an IndexReader for
> searching over the classes seperately. Right now I have 3 questions:
>        1. Is that a good idea?
>        2. Is there a way to identify the index a result comes from?
>        3. Is it possible to limit the results to a number of hits from each
> index?
>
>
> Thank you,
> Sascha
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message