Subject: Re: best practice: 1.4 billions documents
From: Yonik Seeley
To: java-user@lucene.apache.org, Uwe Schindler
Date: Fri, 26 Nov 2010 09:27:49 -0500

On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote:
> (Fuzzy scores on
> MultiSearcher and Solr are totally wrong because each shard uses another
> rewritten query).

Hmmm, really? I thought that fuzzy scoring should just rely on edit distance?
Oh wait, I think I see: it's because we can use a hard cutoff on the number of expansions rather than an edit-distance cutoff. If we used the latter, everything should be fine?

I would classify the fuzzy issue as "working as designed". Either that, or classify FuzzyQuery as broken. A cutoff based on the number of terms will yield strange results even on a single index. Consider this scenario: it's possible to add more docs to a single index and have the same fuzzy query return fewer docs than it did before!

-Yonik
http://www.lucidimagination.com
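The single-index scenario can be reproduced with a small, hypothetical model in plain Java. This is not Lucene's API or FuzzyQuery's actual rewrite code: `maxTerms`, the alphabetical tie-break, and the "one term per document" index are all assumptions made for illustration. It compares an edit-distance-only cutoff with a cutoff that keeps only the `maxTerms` closest terms.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of fuzzy-query term expansion, NOT Lucene's implementation.
// Each "document" is represented by a single indexed term.
class FuzzyCutoffDemo {

    // Levenshtein distance: insert, delete and substitute each cost 1.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Edit-distance cutoff only: keep every distinct term within maxEdits of the query.
    static Set<String> expandByDistance(String query, Collection<String> index, int maxEdits) {
        Set<String> out = new TreeSet<>();
        for (String t : index) if (editDistance(query, t) <= maxEdits) out.add(t);
        return out;
    }

    // Term-count cutoff (hypothetical maxTerms): of the terms within maxEdits, keep only
    // the maxTerms closest ones; ties are broken alphabetically (an assumption).
    static Set<String> expandByCount(String query, Collection<String> index, int maxEdits, int maxTerms) {
        List<String> candidates = new ArrayList<>(expandByDistance(query, index, maxEdits));
        candidates.sort(Comparator.comparingInt((String t) -> editDistance(query, t))
                .thenComparing(Comparator.<String>naturalOrder()));
        return new TreeSet<>(candidates.subList(0, Math.min(maxTerms, candidates.size())));
    }

    // Number of documents whose term is among the expanded terms.
    static long matchingDocs(Set<String> expanded, List<String> index) {
        return index.stream().filter(expanded::contains).count();
    }

    static void report(String label, List<String> index) {
        System.out.printf("%s: count cutoff -> %d docs, distance cutoff -> %d docs%n", label,
                matchingDocs(expandByCount("apple", index, 2, 2), index),
                matchingDocs(expandByDistance("apple", index, 2), index));
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(List.of("apply", "maple", "maple", "maple", "grape"));
        report("before", index);
        index.add("apple"); // add one more document to the same index
        report("after", index);
    }
}
```

Running `main` prints `before: count cutoff -> 4 docs, distance cutoff -> 4 docs` and then `after: count cutoff -> 2 docs, distance cutoff -> 5 docs`. The new exact-match term "apple" takes one of the two expansion slots and pushes out "maple", which three documents contain, so the count-capped query matches fewer documents after the index grew. With the distance-only cutoff, the set of expanded terms can only grow as documents are added.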