From: karl wettin
Subject: Re: Contextual suggestions
Date: Mon, 3 Apr 2006 07:34:38 +0200
To: java-dev@lucene.apache.org

On 31 Mar 2006, at 06:54, karl wettin wrote:

> I've been working a bit with the spell checker. It does a pretty
> good job when it comes to finding a simple typo.
>
> I was thinking it would be nice if I could turn "heros light and
> magic" to "did you mean: heroes of might and magic?".
>
> My strategy is to combine Markov, A* and Levenshtein.
>
> Any comments on this?

Questions? Nothing? Not even a go-go-go? I would really like to
discuss it with someone before I spend too much time on it.

This is what it is: a simple Markov chain is similar to n-grams, but
at the word level rather than the character level. A* is a classic
pathfinding algorithm, widely used in games, that finds the cheapest
path through a graph or grid. I assume you all know Levenshtein from
FuzzyQuery.

I have slept on this a bit and think it might not work on a big
corpus. One probably has to limit it to one Markov chain per context
of some kind, say per category or so.

Perhaps there is some other forum more focused on text analysis that
you would like to recommend to me?
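To make the combination described above more concrete, here is a minimal sketch of how the three pieces could fit together. It is not code from the spell checker or from this thread. The class name, the method names and all cost weights are hypothetical. The Markov chain is reduced to "seen / not seen" word transitions instead of probabilities, and the search is a plain uniform-cost search (that is, A* with a zero heuristic), so a real heuristic would still have to be added to call it A*.

```java
import java.util.*;

// Hypothetical sketch: word-level Markov chain + Levenshtein + cheapest-path search.
class ContextualSuggester {
    private static final String START = "<s>";
    // Illustrative weights only; they are assumptions, not tuned values.
    private static final int MAX_EDITS = 2;   // max Levenshtein distance per word
    private static final int INSERT_COST = 2; // cost of inserting a missing word
    private static final int UNSEEN_COST = 3; // cost of a transition the chain never saw

    // Word-level chain: word -> words observed directly after it.
    private final Map<String, Set<String>> next = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();

    public void train(String phrase) {
        String prev = START;
        for (String w : phrase.toLowerCase().split("\\s+")) {
            vocabulary.add(w);
            next.computeIfAbsent(prev, k -> new HashSet<>()).add(w);
            prev = w;
        }
    }

    // Classic dynamic-programming edit distance, keeping two rows at a time.
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            prev = cur;
        }
        return prev[b.length()];
    }

    // Search state: how many query words are consumed and which word was emitted last.
    private static final class Node {
        final int pos; final String last; final int cost; final Node parent;
        Node(int pos, String last, int cost, Node parent) {
            this.pos = pos; this.last = last; this.cost = cost; this.parent = parent;
        }
    }

    public String suggest(String query) {
        String[] q = query.toLowerCase().split("\\s+");
        PriorityQueue<Node> open = new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.cost));
        Set<String> closed = new HashSet<>();
        open.add(new Node(0, START, 0, null));
        while (!open.isEmpty()) {
            Node n = open.poll();
            if (n.pos == q.length) return render(n);          // cheapest complete path
            if (!closed.add(n.pos + "|" + n.last)) continue;  // state already expanded
            Set<String> followers = next.getOrDefault(n.last, Collections.emptySet());
            // Move 1: replace the next query word with a similar known word.
            for (String w : vocabulary) {
                int d = levenshtein(q[n.pos], w);
                if (d <= MAX_EDITS) {
                    int step = d + (followers.contains(w) ? 0 : UNSEEN_COST);
                    open.add(new Node(n.pos + 1, w, n.cost + step, n));
                }
            }
            // Move 2: insert a word the chain expects here, consuming no query word.
            for (String w : followers) {
                open.add(new Node(n.pos, w, n.cost + INSERT_COST, n));
            }
        }
        return query; // no path found: fall back to the original query
    }

    private static String render(Node n) {
        Deque<String> words = new ArrayDeque<>();
        for (Node c = n; c.parent != null; c = c.parent) words.addFirst(c.last);
        return String.join(" ", words);
    }
}
```

Trained on just the two phrases "heroes of might and magic" and "lord of the rings", `suggest("heros light and magic")` returns "heroes of might and magic": "heros" and "light" are each one edit away from a known word, and inserting "of" is cheaper than accepting the unseen transition from "heroes" to "might". The sketch scans the whole vocabulary for every state, which is where the concern about large corpora shows up; restricting each chain to one context or category would shrink both the vocabulary and the transition table.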