Date: Thu, 1 Dec 2011 21:46:40 +0000 (UTC)
From: "Dawid Weiss (Resolved) (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Resolved] (SOLR-2888) FSTSuggester refactoring: utf8 storage, external sorts (OOM prevention), code cleanups
[ https://issues.apache.org/jira/browse/SOLR-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-2888.
-------------------------------

    Resolution: Fixed

In trunk.

> FSTSuggester refactoring: utf8 storage, external sorts (OOM prevention), code cleanups
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-2888
>                 URL: https://issues.apache.org/jira/browse/SOLR-2888
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>             Fix For: 4.0
>
>         Attachments: SOLR-2888.patch, SOLR-2888.patch, SOLR-2888.patch
>
>
> This issue addresses several problems:
> - UTF-16 was previously used to store and look up terms; UTF-8 is used now.
> - Construction would run out of memory (OOM) with a large number of terms because the entries had to be sorted. Sorter APIs have been added, along with an implementation of external (on-disk) sorting (Robert Muir).
> - The FSTLookup class has been split and refactored into FSTCompletion and FSTCompletionBuilder; FSTCompletionLookup remains a facade that connects all the pieces and implements the Lookup interface. For large inputs, use FSTCompletionBuilder directly (and pre-bucket your input weights).
> - Automatic bucketing in FSTCompletionLookup has changed from linear min/max discretization to dividing the weights into ranges after all values have been sorted. Empirically, this handles a wide variety of weight distributions quite well. If you need something very specific, use FSTCompletionBuilder directly (providing the buckets), construct the automaton, and then load it with FSTCompletionLookup.
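External sorting avoids holding every entry in memory: the input is cut into runs small enough to sort in memory, each sorted run is written to disk, and the runs are then merged. The Java sketch below illustrates that general technique only; it is not the Sorter API added in this issue, and the class and method names are hypothetical. It assumes terms contain no line breaks (each run file stores one term per line) and orders them with String.compareTo, i.e. UTF-16 code-unit order, which can differ from UTF-8 byte order for characters outside the Basic Multilingual Plane.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an external (on-disk) merge sort; NOT the Lucene/Solr Sorter API.
// Assumes terms contain no line breaks, because each run file stores one term per line.
public class ExternalSortSketch {

    // Sorts the input into `output`, holding at most maxRunSize terms in memory while building runs.
    public static void sort(Iterator<String> input, int maxRunSize, Path output) throws IOException {
        List<Path> runs = new ArrayList<>();
        List<String> buffer = new ArrayList<>();
        while (input.hasNext()) {
            buffer.add(input.next());
            if (buffer.size() >= maxRunSize) {
                runs.add(writeRun(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {
            runs.add(writeRun(buffer));
        }
        mergeRuns(runs, output);
    }

    // Sorts one in-memory chunk and spills it to a temporary UTF-8 file (a "run").
    private static Path writeRun(List<String> chunk) throws IOException {
        Collections.sort(chunk); // UTF-16 code-unit order (an assumption of this sketch)
        Path run = Files.createTempFile("sort-run-", ".txt");
        Files.write(run, chunk, StandardCharsets.UTF_8);
        return run;
    }

    // k-way merge: the queue holds only the current head term of each run.
    private static void mergeRuns(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> queue = new PriorityQueue<>((a, b) -> a.term.compareTo(b.term));
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                String first = reader.readLine();
                if (first != null) queue.add(new Head(first, reader));
            }
            while (!queue.isEmpty()) {
                Head smallest = queue.poll();
                out.write(smallest.term);
                out.newLine();
                String next = smallest.reader.readLine();
                if (next != null) queue.add(new Head(next, smallest.reader));
            }
        } finally {
            for (BufferedReader reader : readers) reader.close();
            for (Path run : runs) Files.deleteIfExists(run);
        }
    }

    // The smallest not-yet-merged term of one run, plus the reader it came from.
    private static final class Head {
        final String term;
        final BufferedReader reader;

        Head(String term, BufferedReader reader) {
            this.term = term;
            this.reader = reader;
        }
    }
}
```

During the merge, memory use grows with the number of runs rather than with the number of terms; with very many runs, implementations commonly merge in several passes instead of opening every run at once.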
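The change in automatic bucketing can be illustrated by comparing the two schemes on a skewed set of weights. This is a hedged Java sketch, not the FSTCompletionLookup code: the class name, the method names and the choice to keep equal weights in the same bucket are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration of two ways to discretize suggestion weights into a fixed number of buckets.
// Neither method is taken from FSTCompletionLookup; names and signatures are assumptions.
public class WeightBucketingSketch {

    // Linear min/max discretization: split [min, max] into equal-width buckets.
    public static int[] linearBuckets(long[] weights, int buckets) {
        long min = Arrays.stream(weights).min().orElse(0);
        long max = Arrays.stream(weights).max().orElse(0);
        int[] result = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            result[i] = (max == min)
                    ? 0
                    : (int) ((double) (weights[i] - min) / (max - min) * (buckets - 1));
        }
        return result;
    }

    // Rank-based discretization: sort the weights, then cut the sorted order into
    // equally populated ranges. Equal weights share a bucket (assumption of this sketch).
    public static int[] rankBuckets(long[] weights, int buckets) {
        int n = weights.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Long.compare(weights[a], weights[b]));

        int[] result = new int[n];
        for (int rank = 0; rank < n; rank++) {
            int idx = order[rank];
            if (rank > 0 && weights[idx] == weights[order[rank - 1]]) {
                result[idx] = result[order[rank - 1]]; // tie: reuse the previous bucket
            } else {
                result[idx] = (int) ((long) rank * buckets / n); // always in [0, buckets)
            }
        }
        return result;
    }
}
```

With the weights {1, 2, 3, 4, 5, 6, 7, 1000} and four buckets, linearBuckets returns {0, 0, 0, 0, 0, 0, 0, 3}, so a single outlier collapses almost every entry into one bucket, while rankBuckets returns {0, 0, 1, 1, 2, 2, 3, 3}.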