Return-Path: X-Original-To: apmail-lucene-java-user-archive@www.apache.org Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5112792C6 for ; Tue, 28 Feb 2012 14:36:26 +0000 (UTC) Received: (qmail 83940 invoked by uid 500); 28 Feb 2012 14:36:24 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 83847 invoked by uid 500); 28 Feb 2012 14:36:24 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 83822 invoked by uid 99); 28 Feb 2012 14:36:23 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Feb 2012 14:36:23 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [74.125.82.176] (HELO mail-we0-f176.google.com) (74.125.82.176) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Feb 2012 14:36:18 +0000 Received: by werc1 with SMTP id c1so2609178wer.35 for ; Tue, 28 Feb 2012 06:35:57 -0800 (PST) Received-SPF: pass (google.com: domain of lucene@mikemccandless.com designates 10.180.95.105 as permitted sender) client-ip=10.180.95.105; Authentication-Results: mr.google.com; spf=pass (google.com: domain of lucene@mikemccandless.com designates 10.180.95.105 as permitted sender) smtp.mail=lucene@mikemccandless.com Received: from mr.google.com ([10.180.95.105]) by 10.180.95.105 with SMTP id dj9mr39205396wib.18.1330439757255 (num_hops = 1); Tue, 28 Feb 2012 06:35:57 -0800 (PST) Received: by 10.180.95.105 with SMTP id dj9mr31060334wib.18.1330439757151; Tue, 28 Feb 2012 06:35:57 -0800 (PST) MIME-Version: 1.0 Received: by 10.216.167.196 with HTTP; Tue, 28 Feb 2012 06:35:37 -0800 (PST) In-Reply-To: References: From: Michael McCandless Date: Tue, 28 Feb 2012 09:35:37 -0500 Message-ID: Subject: Re: Building FST-like automaton queries To: java-user@lucene.apache.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable X-Gm-Message-State: ALoCoQkRhWk91IF0ZzlycgdFlfYG0Q6l9K+8pW9sIedor+F6YMJmMhOsyue+0IfOeyeFc8rXbCCn X-Virus-Checked: Checked by ClamAV on apache.org On Tue, Feb 28, 2012 at 8:42 AM, Alan Woodward wrote: > > On 28 Feb 2012, at 13:31, Michael McCandless wrote: > >> Neat :) =A0It's like a FuzzyQuery w/ a custom (binary?) cost matrix for >> the insert/delete/transposition changes... >> >> Is the number of edits smallish? =A0Ie you're not concerned about >> combinatoric explosion of step 1? > > We're only allowing expansions within an edit distance of 1, which should= keep the numbers of terms down. Ahh, ok. So even if the term has two occurrences of cl, only one of them is allowed to substitute d? >> For steps 2 and 3 you shouldn't use FST at all. =A0Instead, for 2) use >> BasicAutomata.makeString(String) on each of your expanded terms, then >> BasicOperations.union on all of those automata to make a single >> automaton accepting all your expanded terms, then likely call >> .determinize() on the resulting automaton (maybe also .minimize() but >> I think that may not help). =A0Then pass that automaton to AQ. > > Excellent, thanks for your help. =A0I'll give that a go. You might also try the DaciukMihovAutomatonBuilder class (it's in lucene/test-framework): it builds a minimal deterministic automaton from sorted terms, very quickly... you'd just have to pre-sort your terms. Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org