In-Reply-To: <4A92AB14.5010903@gmail.com>
References: <4A92A09C.5030308@gmail.com> <4A92A64C.7030508@gmail.com> <786fde50908240752p198a98e7y4c2bdbf874c9adef@mail.gmail.com> <4A92AB14.5010903@gmail.com>
Date: Mon, 24 Aug 2009 18:31:32 +0300
Message-ID: <786fde50908240831s6b023d1ag65088602b10cf686@mail.gmail.com>
Subject: Re: Making TopDocCollector a bit more consumable
From: Shai Erera
To: java-dev@lucene.apache.org

I guess we can add something like this:

   *
   * @param docsScoredInOrder
   *          specifies if documents will be scored in doc ID order by the
   *          query. If you're not sure in advance, you can do the following:
   *          <pre>
   *          boolean docsScoredInOrder = !q.weight(searcher).scoresDocsOutOfOrder();
   *          TopScoreDocCollector tsdc = TopScoreDocCollector.create(numHits, docsScoredInOrder);
   *          </pre>
   *
   * @see Weight#scoresDocsOutOfOrder()

I'm not even sure if the code example is needed ...

Do you want to add it to TSDC (TopScoreDocCollector) and TFC
(TopFieldCollector), or shall I open an issue for that?

Shai

On Mon, Aug 24, 2009 at 6:00 PM, Mark Miller wrote:

> Thanks Shai. That all makes sense to me.
>
> bq. Perhaps we should add to the javadocs something like "you can call
> query.weight().scoresDocsOutOfOrder to instantiate the optimal TFC/TSDC"?
>
> I guess this is all I would argue for as well - basically a bit more
> informative javadoc for scoreOutOfOrder:
>
> TopScoreDocCollector:
>
>   * Creates a new {@link TopScoreDocCollector} given the number of hits to
>   * collect and whether documents are scored in order by the input
>   * {@link Scorer} to {@link #setScorer(Scorer)}.
>
> Shai Erera wrote:
> > I think we've had a similar discussion on this issue (as part of the
> > JIRA issue), and the reason for not defaulting to anything was
> > back-compat.
> >
> > For example, we know that not tracking doc scores is better when you
> > simply sort by a field. But we can't have a default that says "don't
> > track doc scores", since if people use it, their code might break. On
> > the other hand, defaulting in 2.9 to track doc scores is not good
> > either, because we want to stop tracking scores when you sort ...
> >
> > So the outcome was that the "easy" search methods on Searcher pick the
> > best defaults for you (and we've documented that in 3.0 those methods
> > will stop tracking scores etc.) and if you choose to instantiate your
> > own TopFieldCollector, then you probably know what you're doing, and
> > therefore defaults are not that important there.
> >
> > I guess back-compat wise we can say that in 2.9 there is a "create"
> > method which picks certain defaults and will change in 3.0. But I
> > think the bigger question is if someone instantiates TFC, does he do
> > it because he wants to override Lucene's Searcher defaults?
> > I guess the answer is not a definite YES (because I can think of cases
> > where I instantiate TFC for other purposes than overriding Lucene's
> > defaults), but is it perhaps MOST LIKELY?
> >
> > The one parameter which I think may confuse people is
> > docsScoredInOrder - that is only relevant if I use my own Scorer,
> > which I think is a very advanced thing. And if I need to instantiate
> > TFC or TSDC, I may not know what to pass there ... But here there is
> > no good default either, because it really depends on the query that is
> > run. Perhaps we should add to the javadocs something like "you can
> > call query.weight().scoresDocsOutOfOrder to instantiate the optimal
> > TFC/TSDC"?
> >
> > Shai
> >
> > On Mon, Aug 24, 2009 at 5:40 PM, Mark Miller wrote:
> >
> >     I was just going to add actually:
> >
> >     Yes, you can just use the other Searcher methods. Perhaps that's
> >     just fine. I don't think this is a large issue.
> >
> >     But you could also use void search(Weight weight, Filter filter,
> >     Collector collector).
> >
> >     I've created my own TopDocs collectors for a handful of reasons in
> >     the past.
> >
> >     So I don't think this is a huge deal, but if you used the TopDoc
> >     collectors in the past, you just had to pass sort/numDocs. Now that
> >     they are deprecated, if you happened to be using them, you go over
> >     to the new classes (after finding the new static factories) and are
> >     likely not sure what options to pick. Why not allow the same params
> >     and pick defaults that always work? People that want to eke out
> >     speed can tweak the longer param list.
> >
> >     I agree - it's not a huge deal - I guess it is more advanced use -
> >     but it was much easier to follow and use with the deprecated
> >     versions. It's gotten quite a bit more confusing.
> >
> >     I'd still want to be able to play around with Collectors without
> >     being an expert.
> >
> >     Just an idea though - I don't think it's 100% necessary.
> >     When I see advanced options that are more for optimization though,
> >     I like to have defaults so that I don't have to understand
> >     everything perfectly before I use it.
> >
> >     - Mark
> >
> >     Yonik Seeley wrote:
> >     > But creating the collector is expert use, right?
> >     > The normal use would be from Searcher:
> >     >   TopDocs search(Query query, int n)
> >     >   TopDocs search(Query query, Filter filter, int n)
> >     >
> >     > -Yonik
> >     > http://www.lucidimagination.com
> >     >
> >     > On Mon, Aug 24, 2009 at 10:15 AM, Mark Miller wrote:
> >     >
> >     >> Hey all,
> >     >>
> >     >> Hits, which used to be the non-expert search API, has been
> >     >> deprecated - so TopDocs is now essentially the non-expert search
> >     >> API. But when you go to use it you are greeted with:
> >     >>
> >     >>  public static TopFieldCollector create(Sort sort, int numHits,
> >     >>      boolean fillFields, boolean trackDocScores, boolean trackMaxScore,
> >     >>      boolean docsScoredInOrder)
> >     >>
> >     >> and
> >     >>
> >     >>  public static TopScoreDocCollector create(int numHits,
> >     >>      boolean docsScoredInOrder) {
> >     >>
> >     >>    if (docsScoredInOrder) {
> >     >>      return new InOrderTopScoreDocCollector(numHits);
> >     >>    } else {
> >     >>      return new OutOfOrderTopScoreDocCollector(numHits);
> >     >>    }
> >     >>
> >     >>  }
> >     >>
> >     >> Whoa! Think of the poor noobies ;)
> >     >>
> >     >> I don't know if I want my docs scored in order. Seriously, I
> >     >> don't. It sounds nice though. And fill fields? Please do, I guess :)
> >     >>
> >     >> What do you think about having versions that default to something
> >     >> reasonable? Then you just have to give sort and numHits, or just
> >     >> numHits.
> >     >>
> >     >> This API now has a dual role IMO - expert and non-expert.
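
Mark's request for versions that "default to something reasonable" can be illustrated with a small sketch. Everything in it is hypothetical: `DefaultingCollectors`, `TopHits` and `Hit` are stand-ins written for this note, not Lucene classes, and the tie-breaking rule (the lower doc ID wins on equal scores) is an assumption. The only point is that a one-argument `create(numHits)` can delegate to the two-argument form with `docsScoredInOrder = false`, because a collector that compares doc IDs on ties stays correct whatever order hits arrive in, while the in-order shortcut is only safe when doc IDs really do increase.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical stand-ins written for illustration; these are not Lucene classes.
class DefaultingCollectors {

    /** One collected hit: a doc ID and its score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Orders the worst hit first: lowest score, then (assumed tie-break) highest doc ID. */
    static final Comparator<Hit> WORST_FIRST = (a, b) -> {
        int byScore = Float.compare(a.score, b.score);
        return byScore != 0 ? byScore : Integer.compare(b.doc, a.doc);
    };

    /** Keeps the numHits best hits seen so far. */
    static final class TopHits {
        private final int numHits;
        private final boolean docsScoredInOrder;
        private final PriorityQueue<Hit> queue = new PriorityQueue<>(WORST_FIRST);

        TopHits(int numHits, boolean docsScoredInOrder) {
            if (numHits <= 0) {
                throw new IllegalArgumentException("numHits must be positive");
            }
            this.numHits = numHits;
            this.docsScoredInOrder = docsScoredInOrder;
        }

        void collect(int doc, float score) {
            if (queue.size() < numHits) {
                queue.add(new Hit(doc, score));
                return;
            }
            Hit worst = queue.peek();
            int byScore = Float.compare(score, worst.score);
            // In-order shortcut: a tie never wins, since a later doc is assumed to have a higher ID.
            // Out-of-order: a tie wins only if the new doc ID is lower than the worst one kept.
            boolean competitive = byScore > 0
                    || (!docsScoredInOrder && byScore == 0 && doc < worst.doc);
            if (competitive) {
                queue.poll();
                queue.add(new Hit(doc, score));
            }
        }

        /** Returns the collected doc IDs, best hit first. */
        int[] topDocs() {
            Hit[] hits = queue.toArray(new Hit[0]);
            Arrays.sort(hits, WORST_FIRST.reversed());
            int[] docs = new int[hits.length];
            for (int i = 0; i < hits.length; i++) {
                docs[i] = hits[i].doc;
            }
            return docs;
        }
    }

    /** Expert form, same shape as the two-argument create in the thread. */
    static TopHits create(int numHits, boolean docsScoredInOrder) {
        return new TopHits(numHits, docsScoredInOrder);
    }

    /** Convenience form: picks the variant that is correct for any arrival order. */
    static TopHits create(int numHits) {
        return create(numHits, false);
    }
}
```

With this shape a newcomer calls `DefaultingCollectors.create(10)` and gets correct results, while callers who know their hits arrive in doc ID order can still pass `true` to skip the tie check.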
> >     >>
> >     >> --
> >     >> - Mark
> >     >>
> >     >> http://www.lucidimagination.com
> >     >>
> >     >> ---------------------------------------------------------------------
> >     >> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> >     >> For additional commands, e-mail: java-dev-help@lucene.apache.org
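
As a footnote to the javadoc proposed at the top of this message, the decision it describes can be sketched in a few self-contained lines. The names here are hypothetical stand-ins, not the Lucene types: `OrderHint` plays the role of the single `scoresDocsOutOfOrder()` method on `Weight`, and `Variant` stands for the two collector implementations that `create(numHits, docsScoredInOrder)` chooses between. It is a rough illustration under those assumptions, not the library's code.

```java
// Hypothetical stand-ins for illustration only; not the Lucene types.
class CollectorChoiceSketch {

    /** Stand-in for the single Weight method the proposed javadoc relies on. */
    interface OrderHint {
        boolean scoresDocsOutOfOrder();
    }

    /** The two collector variants a create(numHits, docsScoredInOrder) factory can return. */
    enum Variant { IN_ORDER, OUT_OF_ORDER }

    /** Mirrors the javadoc snippet: negate the hint, then branch on the resulting flag. */
    static Variant choose(OrderHint weight) {
        boolean docsScoredInOrder = !weight.scoresDocsOutOfOrder();
        return docsScoredInOrder ? Variant.IN_ORDER : Variant.OUT_OF_ORDER;
    }
}
```

The negation is the part that is easy to get backwards: the hint reports "out of order", while the factory parameter asks "in order", so the caller has to flip it exactly once.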