lucene-java-user mailing list archives

From Andres Taylor <andres.tay...@neotechnology.com>
Subject Re: Index statistics
Date Wed, 06 Jul 2011 11:42:03 GMT
Thanks. It was what I expected, but it's nice to have it confirmed.

On Tue, Jul 5, 2011 at 9:39 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> This API doesn't exist today.
>
> Lucene has long needed query impls to be able to do this, so that we can
> properly plan/optimize how the query is run.  EG an AND query would
> use this to pick the more restrictive clause to drive the
> intersection.
>
> For TermQuery you could just call IndexReader.docFreq?  (It doesn't take
> deletions into account, so it'll always be an upper bound.)
>
> For other queries... you could pull the scorer, iterate over some
> number of docs, and then "guesstimate" based on what docID you got up
> to vs how many docs you asked for, how many matches there would be for
> the full index?  This would assume matches are uniformly distributed
> throughout the index (eg, that docs are indexed in random order) which
> is definitely not the case typically in practice.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Tue, Jul 5, 2011 at 2:19 PM, Andres Taylor
> <andres.taylor@neotechnology.com> wrote:
> > Hi there,
> >
> > I work with Neo4j <http://neo4j.org/>, a NoSQL graph
> > database tightly coupled with Lucene. I am now working on an optimizing
> > execution engine. To do this well, I would like to know more about the
> > existing Lucene indices. Ideally, I'd like to be able to ask a Lucene index
> > how many hits a query might give me, before I actually run the query. The
> > answer will probably just be an estimation, but that's fine.
> >
> > Is this possible today?
> >
> > Best regards,
> >
> > Andrés
> >
>

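The extrapolation Mike describes for non-term queries could be sketched roughly as below. This is not Lucene API: the class name HitCountEstimate, the method estimate, and the use of a sorted int array as a stand-in for iterating a scorer are all assumptions made for illustration. The idea is to take the first sampleSize matches, note the last docID reached, and scale the observed match density up to the whole index.

```java
/**
 * Hypothetical sketch of the "iterate some docs, then extrapolate" estimate.
 * An ascending array of matching docIDs stands in for a real scorer.
 */
final class HitCountEstimate {

    private HitCountEstimate() {}

    /**
     * @param matchingDocIds ascending docIDs that match the query (scorer stand-in)
     * @param maxDoc         number of docIDs in the index (docIDs run 0..maxDoc-1)
     * @param sampleSize     how many matches to look at before extrapolating
     * @return estimated total number of matches in the index
     */
    static long estimate(int[] matchingDocIds, int maxDoc, int sampleSize) {
        if (maxDoc <= 0 || sampleSize <= 0) {
            return 0;
        }
        int seen = Math.min(sampleSize, matchingDocIds.length);
        if (seen < sampleSize) {
            // The matches ran out before the sample was full, so the count is exact.
            return seen;
        }
        int lastDocId = matchingDocIds[seen - 1];
        // 'seen' matches were found among docIDs 0..lastDocId.
        double density = (double) seen / (lastDocId + 1);
        // Assumes matches are uniformly spread over docIDs, which is often untrue.
        long estimated = Math.round(density * maxDoc);
        return Math.min(estimated, (long) maxDoc);
    }
}
```

With matches on every 10th docID in a 1000-doc index and a sample of 10, this sketch lands near the true count of 100. With 10 matches clustered at docIDs 0..9 it estimates 1000, which illustrates the caveat above about docs not being indexed in random order.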