From: lucene-cvs@jakarta.apache.org
To: lucene-cvs@jakarta.apache.org
Subject: [Jakarta Lucene Wiki] New: SpellChecker
Date: Mon, 11 Oct 2004 17:21:52 -0000

Date: 2004-10-11T10:21:52
Editor: NicolasMaisonneuve
Wiki: Jakarta Lucene Wiki
Page: SpellChecker
URL: http://wiki.apache.org/jakarta-lucene/SpellChecker

no comment

New Page:

SpellChecker

A spell checker suggests a list of words close to a misspelled word. This implementation uses the n-gram technique and the Levenshtein distance.

An index (the dictionary) containing all the possible words must be created; it is a Lucene index. The structure of this index is (for 3-4 grams):

word:
gram3:
gram4:
3start:
4start:
3end:
4end:
transposition:

It is independent of the user index, so we can add words belonging to several fields of several indexes, for example, or, why not, words from a file containing a word list.

source:

SpellChecker spellChecker = new SpellChecker();

The suggestSimilar method returns a list of suggested words sorted by Levenshtein distance and, optionally, by the popularity of the word in a specific field of a user index. In addition, this list can be restricted to words present in a specific field of a user index.

See the test case code for an example.

Download the file from [http://issues.apache.org/bugzilla/show_bug.cgi?id=31617]
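
To make the index layout and the ranking step more concrete, here is a minimal, standalone sketch in plain Java. It is not the code attached to the Bugzilla issue and does not use Lucene. The class name NGramSketch and its methods are hypothetical, and it assumes that 3start/4start hold the first n-gram of a word and 3end/4end the last one. The transposition field is left out because this page does not say what it contains.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and field semantics are assumptions.
class NGramSketch {

    // All contiguous substrings of length n, in order of appearance.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            out.add(word.substring(i, i + n));
        }
        return out;
    }

    // Field name -> values for one dictionary word, assuming the layout
    // word / gramN / Nstart / Nend for N = 3 and 4.
    static Map<String, List<String>> dictionaryFields(String word) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        fields.put("word", Collections.singletonList(word));
        for (int n = 3; n <= 4; n++) {
            List<String> g = grams(word, n);
            if (g.isEmpty()) {
                continue; // word is shorter than n characters
            }
            fields.put("gram" + n, g);
            fields.put(n + "start", Collections.singletonList(g.get(0)));
            fields.put(n + "end", Collections.singletonList(g.get(g.size() - 1)));
        }
        return fields;
    }

    // Dynamic-programming Levenshtein distance (insert, delete, substitute),
    // keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Fields that would be stored for the dictionary word "lucene".
        System.out.println(dictionaryFields("lucene"));
        // Distance used to rank a candidate against the misspelled input.
        System.out.println(levenshtein("lucen", "lucene"));
    }
}
```

In a setup like this, the n-gram fields would serve to find candidate words that share fragments with the misspelled input, and the Levenshtein distance would then order those candidates, which matches the sort order described for suggestSimilar.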