From: Robert Muir
Date: Wed, 10 Feb 2010 09:23:57 -0500
Message-ID: <8f0ad1f31002100623h7516140p4c2e6f469a6d5794@mail.gmail.com>
Subject: Re: TREC Data and Topic-Specific Index
To: java-user@lucene.apache.org

Hi,

So you mean relative improvements of around 15% and 24%, respectively? I think you could fairly say either of these is an improvement over your baseline of 0.141.

What I mean by a large difference is this: while I think it's safe to say that either method improves over your baseline, I am not sure you can conclude that one improvement is better than the other. You can apply various statistical tests to try to figure this out, but because you didn't participate in the pool with these runs, you would have to be careful about drawing conclusions as to which similarity is best, as there is some bias and error involved.

On Wed, Feb 10, 2010 at 9:14 AM, Ivan Provalov wrote:
> Robert,
>
> Thank you for your reply. What would be considered a large difference? We
> started applying the Sweet Spot Similarity. It gives us an improvement of
> 0.163-0.141=0.022 MAP so far. LnbLtcSimilarity gets us more improvement:
> 0.175-0.141=0.034.
>
> Thanks,
>
> Ivan
>
> --- On Sun, 2/7/10, Robert Muir wrote:
>
> > From: Robert Muir
> > Subject: Re: TREC Data and Topic-Specific Index
> > To: java-user@lucene.apache.org
> > Date: Sunday, February 7, 2010, 10:59 PM
> >
> > You should do (a), and pretend you know nothing about the relevance
> > judgements up front.
> >
> > It is true you might make some change to your search engine and wonder:
> > how is it fair that I am bringing back possibly relevant docs that were
> > never judged (and thus implicitly scored as non-relevant)? That is, the
> > test collection is biased against you because you did not participate
> > in the pooling process.
> >
> > If you are concerned about this, you should still use (a), but perhaps
> > look at other measures such as bpref:
> > http://comminfo.rutgers.edu/~muresan/IR/Docs/Articles/sigirBuckley2004.pdf
> >
> > Personally, I simply prefer to stick with MAP. And with all measures,
> > whether you look at bpref or MAP, my advice is to consider only large
> > differences when evaluating some potential improvement!
> >
> > On Sun, Feb 7, 2010 at 6:49 PM, Ivan Provalov wrote:
> >
> > > Robert,
> > >
> > > We are using TREC-3 data and Ad Hoc topics 151-200. The relevance
> > > judgments list contains 97,319 entries, of which 68,559 are unique
> > > document ids. The TIPSTER collection used in TREC-3 contains around
> > > 750,000 documents.
> > >
> > > Should we (a) index the entire 750,000-document collection, (b) index
> > > only the 68,559 unique documents listed in the qrels, or (c) limit our
> > > index to each specific topic (about 2,000 docs), i.e. to the documents
> > > listed for a particular topic in the qrels?
> > >
> > > Thanks,
> > >
> > > Ivan
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> > --
> > Robert Muir
> > rcmuir@gmail.com

--
Robert Muir
rcmuir@gmail.com
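The percentages in Robert's reply come from dividing each MAP gain by the 0.141 baseline (0.022/0.141 is about 15.6%, 0.034/0.141 about 24.1%). For the "various statistical tests" he mentions, one common option is a paired randomization (permutation) test over per-topic average precision; a paired t-test or a sign test are alternatives. The Python sketch below is illustrative only: the function names are hypothetical, it assumes you already have per-topic AP values for both runs over the same topics (for example from trec_eval's per-query output), and it does nothing about the pooling bias, since unjudged documents still count as non-relevant in every AP value.

```python
"""Relative MAP improvement and a paired randomization test on per-topic AP.

Illustrative sketch: function names are hypothetical, and the per-topic AP
lists you pass in must come from your own runs over the same topics.
"""
import random


def relative_improvement(run_map: float, baseline_map: float) -> float:
    """Fractional gain of run_map over baseline_map (0.156 means +15.6%)."""
    return (run_map - baseline_map) / baseline_map


def paired_randomization_test(ap_a, ap_b, trials=10000, seed=0):
    """Two-sided p-value for the mean per-topic AP difference of two runs.

    Under the null hypothesis the two systems are interchangeable on every
    topic, so the sign of each per-topic difference may be flipped at random.
    """
    if len(ap_a) != len(ap_b) or not ap_a:
        raise ValueError("runs must cover the same non-empty topic set")
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    at_least_as_extreme = 0
    for _ in range(trials):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / len(diffs) >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    baseline = 0.141
    for name, score in [("SweetSpotSimilarity", 0.163), ("LnbLtcSimilarity", 0.175)]:
        print(f"{name}: +{relative_improvement(score, baseline):.1%}")
```

To compare the two similarities against each other, you would pass their per-topic AP lists for topics 151-200 to paired_randomization_test. With only 50 topics and unjudged documents in the new runs, a small p-value is suggestive rather than conclusive, which is the caution in the reply.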
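The pooling bias described in the quoted February 7 message can be seen on a toy topic. Average precision treats an unjudged document as non-relevant, while bpref skips unjudged documents and only penalizes relevant documents ranked below judged non-relevant ones. This Python sketch uses made-up document ids, and the function names are hypothetical; the bpref formula follows the common formulation from the Buckley and Voorhees paper linked in the thread (to my knowledge the one trec_eval uses), so check it against your trec_eval version before comparing numbers. MAP is then the mean of average_precision over all topics.

```python
"""Average precision vs. bpref when a run retrieves unjudged documents.

Single-topic sketch with made-up document ids. The bpref formula is an
assumed common formulation; verify it against your trec_eval version.
"""


def average_precision(ranking, relevant):
    """AP for one topic: unjudged documents count as non-relevant."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / len(relevant)


def bpref(ranking, relevant, nonrelevant):
    """bpref for one topic: unjudged documents are skipped entirely."""
    r, n = len(relevant), len(nonrelevant)
    if r == 0:
        return 0.0
    nonrel_so_far, score = 0, 0.0
    for doc in ranking:
        if doc in relevant:
            if nonrel_so_far == 0:
                score += 1.0
            else:
                # nonrel_so_far > 0 implies n > 0, so min(r, n) is positive.
                score += 1.0 - min(nonrel_so_far, r) / min(r, n)
        elif doc in nonrelevant:
            nonrel_so_far += 1
        # Any other document is unjudged and ignored by bpref.
    return score / r


if __name__ == "__main__":
    relevant = {"d1", "d3"}
    nonrelevant = {"d2", "d4"}
    for ranking in (["d1", "d3"], ["dX", "d1", "d3"], ["d2", "d1", "d3"]):
        print(ranking,
              f"AP={average_precision(ranking, relevant):.3f}",
              f"bpref={bpref(ranking, relevant, nonrelevant):.3f}")
```

Retrieving the unjudged document dX ahead of the relevant ones lowers AP from 1.000 to 0.583 but leaves bpref at 1.000, whereas the judged non-relevant document d2 lowers both (AP 0.583, bpref 0.500).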
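Ivan's counts (97,319 qrels entries but only 68,559 unique document ids) arise because many documents were judged for more than one topic. A short Python sketch, assuming the usual four-column qrels layout (topic, iteration, docno, relevance) and using made-up sample lines and a hypothetical function name, shows how to get those counts and the per-topic judged sets behind option (c). Given Robert's advice to use option (a), such per-topic sets are for inspecting the pools, not for deciding what to index.

```python
"""Summarize a TREC qrels file: judgments, unique documents, per-topic pools.

Sketch assuming the four-column layout "topic iteration docno relevance";
the sample lines below are made up.
"""
from collections import defaultdict


def summarize_qrels(lines):
    """Return (number of judgments, set of unique docnos, topic -> docnos)."""
    judgments = 0
    unique_docs = set()
    per_topic = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"unexpected qrels line: {line!r}")
        topic, _iteration, docno, _relevance = fields
        judgments += 1
        unique_docs.add(docno)
        per_topic[topic].add(docno)
    return judgments, unique_docs, dict(per_topic)


if __name__ == "__main__":
    sample = [
        "151 0 DOC-A 1",
        "151 0 DOC-B 0",
        "152 0 DOC-A 0",  # DOC-A is judged for two topics
        "152 0 DOC-C 1",
    ]
    total, docs, pools = summarize_qrels(sample)
    print(total, len(docs), {t: len(d) for t, d in sorted(pools.items())})
```

Reading a real qrels file would just mean passing an open file object as lines; the gap between the judgment count and the unique-document count grows with the overlap between topics.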