From: Lance Norskog
Date: Mon, 15 Nov 2010 21:56:23 -0800
To: java-user@lucene.apache.org
Subject: Re: What is the best Analyzer and Parser for this type of question?

First, to understand what your query looks like, go to admin/analysis.jsp (the Solr analysis page). It shows what happens to your query text as it passes through the analyzer.

Then run the query with debugQuery=true. This appends a long block to the end of the XML response that describes, in painful detail, exactly how each document was scored.

After all that, you might still have a problem with terms like PrnP getting chopped up in weird ways. I don't know how people handle this in chemistry/bio search.

Lance

Ahmet Arslan wrote:
>> Example of Question:
>> - What is the role of PrnP in mad cow disease?
>
> First, do not send the question to the search engine as-is. Formulate
> the query manually: remove 'what', 'is', 'the', 'of', '?', etc.
>
> For example, I would convert this question into:
>
> "mad cow"^5 "cow disease"^3 "mad cow disease"^15 "role PrnP"~5^2 "role mad cow disease"~45 mad^0.1 role^0.5 cow disease PrnP^10
>
>> I am running against 11,638 documents and the result is 10,410
>> docs for this question (very low precision)
>
> Use OR as the default operator, and collect and evaluate only the top
> 1000 documents.
>
> Instead of Porter, you can also try KStem:
> http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
>
> Try the different length normalization described in the paper below.
> Its Lucene query example (SpanNear) may also inspire you.
> http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
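
The two pieces of advice in this thread (strip the question words and boost phrases, then inspect the scoring with debugQuery=true) can be sketched together in a few lines. This is only a rough illustration, not what either poster actually ran: the stop list, the boost and slop values, the function names, and the base URL http://localhost:8983/solr/select are all assumptions made for the example. The request parameters q, q.op, rows and debugQuery are standard Solr ones.

```python
import re
from urllib.parse import urlencode

# Hypothetical stop list: the words named in the quoted advice, plus "in".
STOP_WORDS = {"what", "is", "the", "of", "in"}


def content_terms(question):
    """Split a question into terms, dropping punctuation and stop words."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def boosted_query(question, phrase_boost=5, slop=5):
    """Build a query-parser string: bare terms plus boosted phrases.

    phrase_boost and slop are illustrative defaults, not tuned values.
    """
    terms = content_terms(question)
    clauses = list(terms)  # bare terms, combined by the default operator
    # Adjacent pairs as exact phrases, boosted above the single terms.
    for left, right in zip(terms, terms[1:]):
        clauses.append(f'"{left} {right}"^{phrase_boost}')
    # All terms as one sloppy phrase, favouring documents where they sit close together.
    if len(terms) > 2:
        clauses.append(f'"{" ".join(terms)}"~{slop}')
    return " ".join(clauses)


def debug_url(query, base="http://localhost:8983/solr/select"):
    """Build a request URL; the base URL is an assumed local Solr instance."""
    params = {"q": query, "q.op": "OR", "rows": 1000, "debugQuery": "true"}
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    q = boosted_query("What is the role of PrnP in mad cow disease?")
    print(q)
    print(debug_url(q))
```

A generated query like this is only a starting point. The hand-written query quoted above weights "mad cow disease" and PrnP far more heavily than the generic terms, and a uniform boost cannot reproduce that kind of domain judgment.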