From: Ivan Provalov
Date: Thu, 11 Feb 2010 07:47:43 -0800 (PST)
Subject: Re: TREC Data and Topic-Specific Index
To: java-user@lucene.apache.org

Thank you, Robert.

--- On Wed, 2/10/10, Robert Muir wrote:

> From: Robert Muir
> Subject: Re: TREC Data and Topic-Specific Index
> To: java-user@lucene.apache.org
> Date: Wednesday, February 10, 2010, 9:23 AM
>
> Hi, so you mean around 15% and 24% respectively?
> I think you could fairly say either of these is an improvement over
> your baseline of 0.141.
>
> What I mean by a large difference is this: while I think it's safe to
> say that using either of these methods improves over your baseline, I
> am not sure you can conclude that either improvement is better than
> the other.
>
> You can apply various statistical tests to try to figure this out, but
> because you didn't participate in the pool with these runs, you would
> have to be careful about drawing conclusions as to which similarity is
> best, as there is some bias and error involved.
>
> On Wed, Feb 10, 2010 at 9:14 AM, Ivan Provalov wrote:
>
> > Robert,
> >
> > Thank you for your reply. What would be considered a large
> > difference? We started applying the Sweet Spot Similarity. It gives
> > us an improvement of 0.163 - 0.141 = 0.022 MAP so far.
> > LnbLtcSimilarity gets us more improvement: 0.175 - 0.141 = 0.034.
> >
> > Thanks,
> >
> > Ivan
> >
> > --- On Sun, 2/7/10, Robert Muir wrote:
> >
> > > From: Robert Muir
> > > Subject: Re: TREC Data and Topic-Specific Index
> > > To: java-user@lucene.apache.org
> > > Date: Sunday, February 7, 2010, 10:59 PM
> > >
> > > You should do (a), and pretend you know nothing about the
> > > relevance judgements up front.
> > >
> > > It is true you might make some change to your search engine and
> > > wonder, how is it fair that I am bringing back possibly relevant
> > > docs that were never judged (and thus scored implicitly as
> > > non-relevant)?
> > > i.e. the test collection is biased against you because you did not
> > > participate in the pooling process.
> > >
> > > If you are concerned about this, you should still use (a), but
> > > perhaps look at other measures such as bpref
> > > (http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf).
> > >
> > > Personally, I simply prefer to stick with MAP. And with all
> > > measures, whether you look at bpref or MAP, my advice is to
> > > consider only large differences when evaluating some potential
> > > improvement!
> > >
> > > On Sun, Feb 7, 2010 at 6:49 PM, Ivan Provalov wrote:
> > >
> > > > Robert,
> > > >
> > > > We are using TREC-3 data and Ad Hoc topics 151-200. The
> > > > relevance judgments list contains 97,319 entries, of which
> > > > 68,559 are unique document ids. The TIPSTER collection which
> > > > was used in TREC-3 is around 750,000 documents.
> > > >
> > > > Should we (a) index the entire 750,000 document collection, or
> > > > (b) the document collection of the 68,559 unique documents
> > > > listed in the qrels, or (c) should we limit our index to each
> > > > specific topic (about 2,000 docs), i.e. to the documents listed
> > > > for a particular topic in the qrels?
> > > >
> > > > Thanks,
> > > >
> > > > Ivan
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > > --
> > > Robert Muir
> > > rcmuir@gmail.com
>
> --
> Robert Muir
> rcmuir@gmail.com
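
For reference, here is a rough sketch of the arithmetic and of one of the "various statistical tests" mentioned in the thread. It is only an illustration, not something taken from Lucene or trec_eval: the function names are made up for this example, and the paired sign-flip randomization test is just one possible choice of test. It expects per-topic average precision values for two runs over the same topics (for example topics 151-200), which the thread itself does not list.

```python
import random


def relative_improvement(baseline, score):
    """Relative gain of `score` over `baseline`, e.g. 0.156 for about 15.6%."""
    return (score - baseline) / baseline


def paired_randomization_test(ap_a, ap_b, trials=10000, seed=0):
    """Two-sided sign-flip randomization test on per-topic AP differences.

    ap_a and ap_b hold average precision for the same topics, in the same
    order, for two runs. Returns an estimated p-value for the hypothesis
    that the two runs have the same mean AP (MAP).
    """
    if len(ap_a) != len(ap_b) or not ap_a:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [b - a for a, b in zip(ap_a, ap_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / n) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


# MAP figures quoted in the thread: baseline 0.141, Sweet Spot 0.163,
# LnbLtcSimilarity 0.175.
print(f"{relative_improvement(0.141, 0.163):.1%}")  # 15.6%
print(f"{relative_improvement(0.141, 0.175):.1%}")  # 24.1%
```

To compare the two similarities against each other, one would pass the per-topic AP lists of both runs to paired_randomization_test and treat a small p-value with the caution Robert describes, since neither run contributed to the judgment pool.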
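
To make the point about unjudged documents concrete, here is a small sketch comparing average precision with bpref for a single topic. It follows my reading of the bpref definition in the Buckley and Voorhees paper linked in the thread (trec_eval's variant differs in some details), and the function names, document ids and judgments are all made up for illustration. AP treats an unjudged document as non-relevant, while bpref skips it entirely.

```python
def average_precision(ranking, qrels):
    """AP for one topic; documents missing from qrels count as non-relevant."""
    num_relevant = sum(1 for rel in qrels.values() if rel > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if qrels.get(doc_id, 0) > 0:
            hits += 1
            total += hits / rank
    return total / num_relevant


def bpref(ranking, qrels):
    """bpref for one topic; documents missing from qrels are ignored.

    Each retrieved relevant document is penalized by the number of judged
    non-relevant documents ranked above it, capped at R, the number of
    judged relevant documents.
    """
    num_relevant = sum(1 for rel in qrels.values() if rel > 0)
    if num_relevant == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc_id in ranking:
        if doc_id not in qrels:
            continue  # unjudged: neither rewarded nor penalized
        if qrels[doc_id] > 0:
            total += 1.0 - min(nonrel_seen, num_relevant) / num_relevant
        else:
            nonrel_seen += 1
    return total / num_relevant


# Hypothetical judgments: three relevant docs, one judged non-relevant.
qrels = {"d1": 1, "d2": 1, "d3": 1, "n1": 0}
run_with_unjudged = ["d1", "u1", "n1", "d2"]  # "u1" was never judged
run_without = ["d1", "n1", "d2"]

for run in (run_with_unjudged, run_without):
    print(average_precision(run, qrels), bpref(run, qrels))
```

In this toy case the unjudged document lowers AP but leaves bpref unchanged, which is the bias against runs that did not participate in pooling.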