lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Multiple Documents sharing a common boost
Date Tue, 21 Aug 2007 16:00:41 GMT
Ahhh, I was assuming you didn't need to look at all clusters.
Oops.

That said, the question is really whether this is "good enough"
compared to re-indexing, and only some tests will determine that.
I was surprised at how quickly a *large* number of ORs was
processed by Lucene.
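For illustration, here is a minimal sketch of the compound query under discussion. It is written as a query-syntax string that could be handed to Lucene's QueryParser. The field name `cluster_id`, the boost values, and the class and method names are hypothetical. The 1024 limit reflects Lucene's default `BooleanQuery.getMaxClauseCount()`, so a few hundred cluster clauses should fit without raising it.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch (hypothetical names): builds a Lucene query-syntax string that
 * requires the user's original query and adds one boosted optional clause
 * per cluster. The field name "cluster_id" is an assumption.
 */
public class ClusterBoostQuery {

    // Lucene's default BooleanQuery max clause count; limits clauses per BooleanQuery.
    static final int MAX_CLAUSES = 1024;

    static String build(String originalQuery, Map<Integer, Double> clusterBoosts) {
        if (clusterBoosts.size() > MAX_CLAUSES) {
            throw new IllegalArgumentException("too many clusters for one BooleanQuery");
        }
        StringBuilder clusters = new StringBuilder();
        for (Map.Entry<Integer, Double> e : clusterBoosts.entrySet()) {
            if (clusters.length() > 0) {
                clusters.append(' ');
            }
            // Each clause is optional (SHOULD); its ^boost scales that clause's score.
            clusters.append("cluster_id:").append(e.getKey())
                    .append('^').append(e.getValue());
        }
        // '+' makes both groups required: the original query AND at least one cluster clause.
        return "+(" + originalQuery + ") +(" + clusters + ")";
    }

    public static void main(String[] args) {
        Map<Integer, Double> boosts = new TreeMap<>();  // hypothetical cluster scores
        boosts.put(0, 1.5);
        boosts.put(1, 0.8);
        boosts.put(2, 2.0);
        System.out.println(build("lucene boost", boosts));
    }
}
```

The `+` prefixes make a document match only if it satisfies the original query and falls in at least one listed cluster; the per-clause boosts then scale its score. Whether this is fast enough for a given cluster count is what a quick benchmark should settle.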

You could also think about implementing a HitCollector that
boosts the raw score of each document based on its
cluster ID, but be careful not to read the full document in
the HitCollector (you shouldn't have to, though: either build
a map early or get creative with filters).
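A minimal, standalone sketch of that idea follows. It mirrors the shape of Lucene 2.x's `HitCollector.collect(int doc, float score)` callback but does not depend on Lucene, so the class name, the `int[]` doc-to-cluster array, and the boost map are assumptions for illustration. In a real searcher the array would be built once per reader (for example from `FieldCache.DEFAULT.getInts(reader, "cluster_id")`, assuming a single-valued, untokenized integer field), so no stored document is loaded per hit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Standalone sketch of a HitCollector-style callback that multiplies each
 * raw score by the current boost of the document's cluster. It has no
 * Lucene dependency; the names here are hypothetical.
 */
public class ClusterBoostCollector {

    /** A collected hit: internal doc number plus boosted score. */
    public static final class Hit {
        public final int doc;
        public final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // docId -> clusterId, built once per reader, never from stored fields.
    private final int[] clusterOfDoc;
    // clusterId -> current cluster score, refreshed whenever the scores change.
    private final Map<Integer, Float> clusterBoost;
    private final List<Hit> hits = new ArrayList<>();

    public ClusterBoostCollector(int[] clusterOfDoc, Map<Integer, Float> clusterBoost) {
        this.clusterOfDoc = clusterOfDoc;
        this.clusterBoost = clusterBoost;
    }

    /** Called once per matching doc: one array lookup and one map lookup, no document loads. */
    public void collect(int doc, float score) {
        float boost = clusterBoost.getOrDefault(clusterOfDoc[doc], 1.0f);  // unknown cluster: no boost
        hits.add(new Hit(doc, score * boost));
    }

    /** All collected hits, sorted by boosted score, highest first. */
    public List<Hit> topHits() {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Float.compare(b.score, a.score));
        return sorted;
    }

    public static void main(String[] args) {
        int[] clusterOfDoc = {0, 1, 1, 2};                  // hypothetical: 4 docs, 3 clusters
        Map<Integer, Float> boosts = Map.of(0, 0.5f, 1, 2.0f, 2, 1.0f);
        ClusterBoostCollector c = new ClusterBoostCollector(clusterOfDoc, boosts);
        float[] raw = {0.9f, 0.4f, 0.3f, 0.7f};             // hypothetical raw scores
        for (int doc = 0; doc < raw.length; doc++) {
            c.collect(doc, raw[doc]);
        }
        for (Hit h : c.topHits()) {
            System.out.println(h.doc + " " + h.score);
        }
    }
}
```

Keeping every hit in a list is only reasonable for modest result sets; for large ones a bounded priority queue holding the top N would be the usual choice.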

You might find useful information by searching the mail
archive for "faceting", since this seems like a similar
topic.

But I wouldn't go anywhere with anything custom until and
unless I'd satisfied myself that the simple approach of
letting Lucene handle a large set of OR clauses wasn't
performant. Several very bright people have put significant
effort into performance; I'd see if they've already done
the hard part <G>.....

Erick

On 8/21/07, Raghu Ram <raghuram.nadiminti@gmail.com> wrote:
>
> Do you mean that we should build a compound query by ANDing the
> original query with a query like
>
> ( (cluster_id=0)^boost_cluster0 OR (cluster_id=1)^boost_cluster1 OR ... )
>
> But isn't this inefficient, considering that the number of clusters is
> in the hundreds?
>
> On 8/21/07, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > One solution is to keep meta-data in your index. Remember that
> > documents do not all have to have the same fields. So you could
> > index a document with a single field
> > "metadatanotafieldinanyotherdoc" that contains, say, a list of
> > all of your clusters and their boosts. Read this document in at
> > startup time and cache it away in your server. Thereafter, you have
> > a set of boosts that can be applied at query time.
> >
> > Of course, this is useless if you want to boost at index time.
> > But I know of no way to change the boost of a document
> > without deleting and re-adding it with the new boost.
> >
> > Best
> > Erick
> >
> > On 8/21/07, Raghu Ram <raghuram.nadiminti@gmail.com> wrote:
> > >
> > > Is it possible to have multiple documents share a common boost?
> > >
> > > An example scenario is as follows. The documents are grouped into
> > > a set of clusters, each with a unique clusterId, and every document
> > > has a cluster ID field that associates it with its cluster. Each
> > > cluster has a property called the cluster score, and each document
> > > has to be boosted by its cluster's score. The number of clusters is
> > > very small compared to the number of documents (around 100
> > > clusters). The cluster score is updated continually, so it can't be
> > > stored as each document's boost: we would end up updating every
> > > document's boost daily, which seems infeasible. We are trying to
> > > find a more efficient solution.
> > >
> > > Thank you.
> > >
> >
>
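The metadata-document approach from the earlier reply can be sketched as follows. All cluster boosts are serialized into one field value on a single special document, read once at startup, and cached as a map. The `id:boost` text format and the class name are hypothetical; in Lucene the string would be stored in a field such as `metadatanotafieldinanyotherdoc` on that one document. Updating the scores then means replacing only that document, not every document in the index.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch of the "metadata document" idea: every cluster boost lives in one
 * stored field value on a single special document. The "id:boost id:boost"
 * text format is a hypothetical choice, not a Lucene convention.
 */
public class ClusterBoostMetadata {

    /** Serializes cluster boosts into one field value, e.g. "0:1.5 1:0.8". */
    static String encode(Map<Integer, Float> boosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Float> e : new TreeMap<>(boosts).entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /** Parses the stored field value back into a clusterId -> boost map for caching. */
    static Map<Integer, Float> decode(String value) {
        Map<Integer, Float> boosts = new TreeMap<>();
        if (value == null || value.trim().isEmpty()) {
            return boosts;
        }
        for (String pair : value.trim().split("\\s+")) {
            int colon = pair.indexOf(':');
            boosts.put(Integer.parseInt(pair.substring(0, colon)),
                       Float.parseFloat(pair.substring(colon + 1)));
        }
        return boosts;
    }

    public static void main(String[] args) {
        Map<Integer, Float> boosts = new TreeMap<>();  // hypothetical cluster scores
        boosts.put(0, 1.5f);
        boosts.put(1, 0.8f);
        String stored = encode(boosts);  // value to store on the metadata document
        System.out.println(stored);
        System.out.println(decode(stored));
    }
}
```

Because `Float.toString` produces a string that `Float.parseFloat` maps back to the same float, the cached map matches what was written.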
