Subject: Proximity and Percentage match search in Lucene
From: Radha Sreedharan
To: java-user@lucene.apache.org
Date: Sat, 25 Apr 2009 15:18:12 +0530

What I need is the following:

If my document field is (ab, bc, cd, ef) and the search tokens are (ab, bc, cd), then:

1) I should get a hit even if not all of the search tokens are present.
2) If the tokens are found, they should be found within a distance x of each other (proximity search).
3) I need the percentage match of the search tokens with the document field.

Currently this is my query:

1) I form all possible permutations of the search tokens.
2) I build a SpanNearQuery for each permutation.
3) I combine the SpanNearQuery instances in a DisjunctionMaxQuery.

This is how I compute the % match:

% match = (score from running the query on the document field) /
          (score from running the query on a document field created out of the search tokens)

The numerator gives me the actual score of the search tokens run against the field. The denominator gives me the best possible (maximum) score for the current search tokens.

For this example, with document field (ab, bc, cd, ef) and search tokens (ab, bc, cd), I expect a % match of around 90%.

However, I get a match of only around 50% without a boost. Using a boost in fact reduces my percentage.

I even overrode the queryNorm method to return 1, but the percentage still did not increase.
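To make the requirement concrete, the following is a minimal sketch in plain Java, with no Lucene classes, of the kind of number being asked for. It is only an illustration under stated assumptions: the class ProximityPercentMatch, the method percentMatch, and the definition of distance (a plain difference in token positions) are hypothetical and are not Lucene API, and the distance may not correspond exactly to how SpanNearQuery counts slop. It counts distinct search tokens inside a window of the field; it does not reproduce the score ratio described above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class ProximityPercentMatch {

    /**
     * Returns the largest percentage of distinct search tokens that occur
     * together in some window of the field whose first and last positions
     * are at most maxDistance apart (assumed definition of "distance x").
     */
    static double percentMatch(List<String> fieldTokens,
                               List<String> searchTokens,
                               int maxDistance) {
        Set<String> wanted = new HashSet<>(searchTokens);
        if (wanted.isEmpty()) {
            return 0.0; // nothing to match against
        }
        int best = 0;
        // Slide a window over the field and count distinct search tokens in it.
        for (int start = 0; start < fieldTokens.size(); start++) {
            int end = Math.min(fieldTokens.size() - 1, start + maxDistance);
            Set<String> seen = new HashSet<>();
            for (int i = start; i <= end; i++) {
                String token = fieldTokens.get(i);
                if (wanted.contains(token)) {
                    seen.add(token);
                }
            }
            best = Math.max(best, seen.size());
        }
        return 100.0 * best / wanted.size();
    }

    public static void main(String[] args) {
        List<String> field = Arrays.asList("ab", "bc", "cd", "ef");
        // All three search tokens sit within two positions of each other.
        System.out.println(percentMatch(field, Arrays.asList("ab", "bc", "cd"), 2));
        // Only two of the three search tokens occur in the field at all.
        System.out.println(percentMatch(field, Arrays.asList("ab", "cd", "zz"), 2));
    }
}
```

Under this simplified definition the example field and search tokens give 100%, because every search token falls inside one window; a partial match such as (ab, cd, zz) still produces a hit at about 66.7%. The extra token ef does not lower the result here, which is one way this differs from a ratio of scores.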

Is there any way of implementing this using the current set of implementation classes in Lucene, without making complex changes to the structure itself (which is what I gather has to be done, from the previous replies)?

Can anyone suggest an alternative way of implementing this requirement using the existing classes in Lucene, not necessarily the ones I have used?

Regards,
Radha