lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: Getting a score of a specific document
Date Sun, 17 May 2009 21:00:40 GMT
I'm still unclear what you want the statistics *for*. "statistics"
are pretty meaningless as far as I understand. The whole point
of scoring is to use various "statistics" to *rank* documents *for
a specific query*. You cannot, for instance, compare scores
between different queries in any meaningful way.

If you're saying that you want the documents for your query to be
ranked *relative to each other*, but restricted to only the documents
you care about, then I think you need a HitCollector
because a Filter (last I knew) doesn't score documents therefore
won't order them.

But asking if the statistics reflect the whole index just isn't making any
sense to me. If you're asking that question I suspect that there's
something about your problem space I don't understand and
you're not explaining simply enough for me to grasp <G>.

So forget a Filter because you'll get the documents back in
(probably, but my memory is weak some days) document ID
order. Implement a HitCollector whose collect method only
sets bits for docs in your list. See:
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/HitCollector.html#collect(int,%20float)

Best
Erick


On Sun, May 17, 2009 at 3:57 AM, liat oren <oren.liat@gmail.com> wrote:

> Yes, this is what I need - I don't need to get the scores for the documents
> that were filtered.
> The statistics I ment are idf(t) for example.
> I want these to include the whole index of course.
> It will include this info of all the index, right?
>
> if I have a list of ids that the query should look at, which Filter should
> I
> use?
>
> Thanks a lot,
> Liat
>
> 2009/5/14 Erick Erickson <erickerickson@gmail.com>
>
> > Hmmm, come to think of it, if you pass the Filter to the search I*think*
> > you
> > don't get scores for that clause, but you may want to
> > check it out...
> >
> > So I think you should think about implementing a HitCollector
> > and collect only the documents you care about.
> >
> > This is really very little extra work since all the documents have
> > to be evaluated anyway.
> >
> > I'm not sure what you mean by statistics for the whole index. I suspect
> > you're wondering if the scores reflect all the documents. But you don't
> > care because scores are not relevant between different queries, and
> > if they are calculated only within the query you're running, all the
> > documents returned have scores that rank them relative to each other.
> >
> > Best
> > Erick
> >
> > On Thu, May 14, 2009 at 9:16 AM, liat oren <oren.liat@gmail.com> wrote:
> >
> > > Yes, I have a pre-defined list of documents that I care about.
> > > Then I can do the search on these, but it will take the statictics of
> the
> > > whole index, right?
> > >
> > >
> > >
> > >
> > > 2009/5/14 Erick Erickson <erickerickson@gmail.com>
> > >
> > > > I don't know if I'm understanding what you want, but if you havea
> > > > pre-defined list of documents, couldn't you form a Filter? Then
> > > > your results would only be the documents you care about.
> > > >
> > > > If this is irrelevant, perhaps you could explain a bit more about
> > > > the problem you're trying to solve.
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On Thu, May 14, 2009 at 5:03 AM, liat oren <oren.liat@gmail.com>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a big index and I want to get for a specific search only the
> > > > grades
> > > > > of a list of documents.
> > > > > Is there a better way to get this score than looping on all the
> > > reasults
> > > > > set?
> > > > >
> > > > > Thanks,
> > > > > Liat
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message