lucene-dev mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Making TopDocCollector a bit more consumable
Date Mon, 24 Aug 2009 15:31:32 GMT
I guess we can add something like this:

   *
   * @param docsScoredInOrder
   *          specifies if documents will be scored in doc ID order by the
   *          query. If you're not sure in advance, you can do the following:
   *          <pre>
   *          boolean docsScoredInOrder = !q.weight(searcher).scoresDocsOutOfOrder();
   *          TopScoreDocCollector tsdc = TopScoreDocCollector.create(numHits, docsScoredInOrder);
   *          </pre>
   *
   * @see Weight#scoresDocsOutOfOrder()

I'm not even sure if the code example is needed ...
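
For anyone who wants to see the decision in isolation, here is a rough sketch of what that javadoc snippet boils down to. It deliberately uses stand-in types, not the real Lucene classes: CollectorChoice, its nested Weight and Collector interfaces, and createFor are all hypothetical names made up for illustration. Only the create(numHits, docsScoredInOrder) shape and the scoresDocsOutOfOrder() check are taken from this thread.

```java
// Stand-in types only; none of these are the real Lucene classes.
final class CollectorChoice {

    /** Hypothetical stand-in for Weight: says whether docs may arrive out of doc ID order. */
    interface Weight {
        boolean scoresDocsOutOfOrder();
    }

    /** Hypothetical stand-in for a top-N collector; name() just reports which variant was picked. */
    interface Collector {
        String name();
    }

    /** Mirrors the shape of create(numHits, docsScoredInOrder) quoted in this thread. */
    static Collector create(int numHits, boolean docsScoredInOrder) {
        if (docsScoredInOrder) {
            return () -> "in-order(" + numHits + ")";
        }
        return () -> "out-of-order(" + numHits + ")";
    }

    /** Asks the weight instead of guessing, as the proposed javadoc suggests. */
    static Collector createFor(Weight weight, int numHits) {
        boolean docsScoredInOrder = !weight.scoresDocsOutOfOrder();
        return create(numHits, docsScoredInOrder);
    }

    public static void main(String[] args) {
        Weight inOrderQuery = () -> false;   // scorer emits docs in doc ID order
        Weight outOfOrderQuery = () -> true; // scorer may emit docs out of order
        System.out.println(createFor(inOrderQuery, 10).name());
        System.out.println(createFor(outOfOrderQuery, 10).name());
    }
}
```

The point of the sketch is only that the caller never has to know the answer up front: the flag is derived from the query's weight, and the factory picks the matching collector.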

Do you want to add it to TSDC and TFC, or shall I open an issue for that?

Shai

On Mon, Aug 24, 2009 at 6:00 PM, Mark Miller <markrmiller@gmail.com> wrote:

> Thanks Shai. That all makes sense to me.
>
> bq. Perhaps we should add to the javadocs something like "you can call
> query.weight().scoresDocsOutOfOrder to instantiate the optimal TFC/TSDC"?
>
> I guess this is all I would argue for as well - basically a bit more
> informative javadoc for scoreOutOfOrder:
>
> TopScoreDocCollector:
>
>   * Creates a new {@link TopScoreDocCollector} given the number of hits to
>   * collect and whether documents are scored in order by the input
>   * {@link Scorer} to {@link #setScorer(Scorer)}.
>
> Shai Erera wrote:
> > I think we've had a similar discussion on this issue (as part of the
> > JIRA issue), and the reason for not defaulting to anything was
> > back-compat.
> >
> > For example, we know that not tracking doc scores is better when you
> > simply sort by a field. But we can't have a default that says "don't
> > track doc scores", since if people will use it - they might break. On
> > the other hand, defaulting in 2.9 to track doc scores is not good
> > either, because we want to stop tracking scores when you sort ...
> >
> > So the outcome was that the "easy" search methods on Searcher pick the
> > best defaults for you (and we've documented that in 3.0 those methods
> > will stop tracking scores etc.) and if you choose to instantiate your
> > own TopFieldCollector, then you probably know what you're doing, and
> > therefore defaults are not that important there.
> >
> > I guess back-compat wise we can say that in 2.9 there is a "create"
> > method which picks certain defaults and will change in 3.0. But I
> > think the bigger question is if someone instantiates TFC, does he do
> > it because he wants to override Lucene's Searcher defaults? I guess
> > the answer is not a definite YES (because I can think of cases where I
> > instantiate TFC for other purposes than overriding Lucene's defaults),
> > but is it perhaps MOST LIKELY?
> >
> > The one parameter which I think may confuse people is w/
> > docsScoredInOrder - that is only relevant if I use my own Scorer,
> > which I think is a very advanced thing. And if I need to instantiate
> > TFC or TSDC, I may not know what to pass there ... But here there is
> > no good default either, because it really depends on the query that is
> > run. Perhaps we should add to the javadocs something like "you can
> > call query.weight().scoresDocsOutOfOrder to instantiate the optimal
> > TFC/TSDC"?
> >
> > Shai
> >
> > On Mon, Aug 24, 2009 at 5:40 PM, Mark Miller <markrmiller@gmail.com> wrote:
> >
> >     I was just going to add actually:
> >
> >     Yes you can just use the other Searcher methods. Perhaps that's just
> >     fine. I don't think this is a large issue.
> >
> >     But you could also use void search(Weight weight, Filter filter,
> >     Collector collector).
> >
> >     I've created my own TopDocs collectors for a handful of reasons in
> >     the past.
> >
> >     So I don't think this is a huge deal, but if you used the TopDoc
> >     collectors in the past, you just had to pass sort/numDocs. Now that
> >     they are deprecated, if you happened to be using them, you go over
> >     to the new classes (after finding the new static factories) and are
> >     likely not sure what options to pick. Why not allow the same params
> >     and pick defaults that always work? People that want to eke out
> >     speed can tweak the longer param list.
> >
> >     I agree - it's not a huge deal - I guess it is more advanced use -
> >     but it was much easier to follow and use with the deprecated
> >     versions. It's gotten quite a bit more confusing.
> >
> >     I'd still want to be able to play around with Collectors without
> >     being an expert.
> >
> >     Just an idea though - I don't think it's 100% necessary. When I see
> >     advanced options that are more for optimization though,
> >     I like to have defaults so that I don't have to understand everything
> >     perfectly before I use it.
> >
> >     - Mark
> >
> >     Yonik Seeley wrote:
> >     > But creating the collector is expert use, right?
> >     > The normal use would be from Searcher:
> >     > TopDocs search(Query query, int n)
> >     > TopDocs search(Query query, Filter filter, int n)
> >     >
> >     >
> >     > -Yonik
> >     > http://www.lucidimagination.com
> >     >
> >     >
> >     >
> >     > On Mon, Aug 24, 2009 at 10:15 AM, Mark Miller <markrmiller@gmail.com> wrote:
> >     >
> >     >> Hey all,
> >     >>
> >     >> Hits, which used to be the non expert search API, has been
> >     >> deprecated - so TopDocs is now essentially the non expert search
> >     >> API. But when you go to use it you are greeted with:
> >     >>
> >     >>  public static TopFieldCollector create(Sort sort, int numHits,
> >     >>      boolean fillFields, boolean trackDocScores, boolean trackMaxScore,
> >     >>      boolean docsScoredInOrder)
> >     >>
> >     >> and
> >     >>
> >     >>  public static TopScoreDocCollector create(int numHits,
> >     >>      boolean docsScoredInOrder) {
> >     >>
> >     >>    if (docsScoredInOrder) {
> >     >>      return new InOrderTopScoreDocCollector(numHits);
> >     >>    } else {
> >     >>      return new OutOfOrderTopScoreDocCollector(numHits);
> >     >>    }
> >     >>
> >     >>  }
> >     >>
> >     >> Whoa! Think of the poor noobies ;)
> >     >>
> >     >> I don't know if I want my docs scored in order. Seriously, I don't.
> >     >> It sounds nice though. And fill fields? Please do, I guess :)
> >     >>
> >     >> What do you think about having versions that default to something
> >     >> reasonable? Then you would just have to give numHits and sort, or
> >     >> just numHits.
> >     >>
> >     >> This API now has a dual role IMO - expert and non expert.
> >     >>
> >     >> --
> >     >> - Mark
> >     >>
> >     >> http://www.lucidimagination.com
> >     >>
> >     >>
> >     >>
> >     >>
> >     >>
> >     ---------------------------------------------------------------------
> >     >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     <mailto:java-dev-unsubscribe@lucene.apache.org>
> >     >> For additional commands, e-mail:
> >     java-dev-help@lucene.apache.org
> >     <mailto:java-dev-help@lucene.apache.org>
> >     >>
> >     >>
> >     >>
> >     >
> >     >
> >     >
> >     >
> >
> >
> >     --
> >     - Mark
> >
> >     http://www.lucidimagination.com
> >
> >
> >
> >
> >
> >
>
>
> --
> - Mark
>
> http://www.lucidimagination.com
>
>
>
>
>
>
