Date: Thu, 1 Dec 2011 10:55:40 +0000 (UTC)
From: "Dawid Weiss (Commented) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <1995997237.30430.1322736940816.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1200860193.17932.1320943011589.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (SOLR-2888) FSTSuggester refactoring: utf8 storage, external sorts (OOM prevention), code cleanups

[
https://issues.apache.org/jira/browse/SOLR-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13160806#comment-13160806 ]

Dawid Weiss commented on SOLR-2888:
-----------------------------------

What do you mean by "complementary to itself"? As for closing, sure, I can propagate it up the stack.

> FSTSuggester refactoring: utf8 storage, external sorts (OOM prevention), code cleanups
> --------------------------------------------------------------------------------------
>
>                 Key: SOLR-2888
>                 URL: https://issues.apache.org/jira/browse/SOLR-2888
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>             Fix For: 4.0
>
>         Attachments: SOLR-2888.patch, SOLR-2888.patch
>
>
> This issue incorporates several problems:
> - utf16 was used previously to store and look up terms; now it is utf8.
> - Construction would OOM with a large number of terms because of the need to sort entries. Sorter APIs have been added, along with an implementation of external (on-disk) sorting (Robert Muir).
> - The FSTLookup class has been split and refactored into FSTCompletion and FSTCompletionBuilder. FSTCompletionLookup remains a facade that connects all the pieces and implements the Lookup interface. For large inputs, use FSTCompletionBuilder directly (and pre-bucket your input weights).
> - Automatic bucketing in FSTCompletionLookup has been changed from linear min/max discretization to dividing into ranges after all values have been sorted. Empirically, this handles all potential distributions quite well. If somebody needs something very specific, use FSTCompletionBuilder directly (providing buckets), construct the automaton, and then load it with FSTCompletionLookup.
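
To illustrate the bucketing change described in the issue, here is a minimal sketch contrasting linear min/max discretization with dividing sorted values into ranges. It is not the code from the patch: the class WeightBucketing and both method names are hypothetical, and the sorted-range variant assumes equal-sized ranges by rank, which the issue text does not confirm.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Solr or Lucene. */
final class WeightBucketing {

    /** Linear min/max discretization: equal-width slices of [min, max]. */
    static int[] linearBuckets(long[] weights, int buckets) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long w : weights) {
            min = Math.min(min, w);
            max = Math.max(max, w);
        }
        double range = (double) max - (double) min;
        int[] out = new int[weights.length];
        for (int i = 0; i < weights.length; i++) {
            // The maximum weight maps to exactly `buckets`, so clamp it to the last bucket.
            out[i] = range == 0
                ? 0
                : (int) Math.min(buckets - 1, (long) ((weights[i] - min) / range * buckets));
        }
        return out;
    }

    /** Sorted-range bucketing: rank entries by weight, then split the ranks into equal-sized ranges. */
    static int[] sortedRangeBuckets(long[] weights, int buckets) {
        Integer[] order = new Integer[weights.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort entry indices by ascending weight (stable for equal weights).
        Arrays.sort(order, (a, b) -> Long.compare(weights[a], weights[b]));
        int[] out = new int[weights.length];
        for (int rank = 0; rank < order.length; rank++) {
            out[order[rank]] = (int) ((long) rank * buckets / order.length);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] skewed = {1, 2, 3, 4, 1000};
        System.out.println(Arrays.toString(linearBuckets(skewed, 4)));
        System.out.println(Arrays.toString(sortedRangeBuckets(skewed, 4)));
    }
}
```

With a skewed input such as {1, 2, 3, 4, 1000}, the linear variant puts every small weight into bucket 0, while the rank-based variant spreads entries across the buckets. Note that in this sketch equal weights can land in adjacent buckets, which a real implementation may want to avoid.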