Date: Tue, 13 Nov 2012 14:54:13 +0000 (UTC)
From: "Simon Willnauer (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <1574492504.108096.1352818453711.JavaMail.jiratomcat@arcas>
In-Reply-To: <1280041799.108004.1352817612270.JavaMail.jiratomcat@arcas>
Subject: [jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

     [ https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4556:
------------------------------------

    Attachment: LUCENE-4556.patch

here is a patch ...scary™

> FuzzyTermsEnum creates tons of objects
> --------------------------------------
>
>                 Key: LUCENE-4556
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4556
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search, modules/spellchecker
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Critical
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4556.patch
>
>
> I ran into this problem in production using the DirectSpellchecker. The number of objects created by the spellchecker shoots through the roof very quickly. We ran about 130 queries and ended up with more than 2M transitions / states. We spent 50% of the time in GC just because of transitions. Other parts of the system behave just fine here.
> I talked quickly to Robert and gave a POC a shot, providing a LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case and build an array-based structure converted into UTF-8 directly instead of going through the object-based APIs. This involved quite a few changes, but they are all package-private at this point. I have a patch that still has a fair number of nocommits, but it shows that this is possible and IMO worth the trouble to make this really usable in production. All tests pass with the patch - it's a start....
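
The issue contrasts per-state and per-transition objects with an array-based structure that works on UTF-8 bytes directly. As a rough illustration of that representation only, here is a minimal sketch of a byte-level DFA whose transitions live in one flat int array, so matching allocates nothing per step. It does not build a Levenshtein automaton, and PackedDfa and all of its methods are hypothetical names chosen for this example; they are not taken from the patch or from Lucene's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration; not a class from the LUCENE-4556 patch or from Lucene. */
final class PackedDfa {
    private static final int ALPHABET = 256; // one column per possible byte value
    private static final int DEAD = -1;      // marks "no transition"

    private final int[] next;       // next[state * ALPHABET + byte] = target state
    private final boolean[] accept; // accept[state] is true for final states

    PackedDfa(int numStates) {
        next = new int[numStates * ALPHABET];
        Arrays.fill(next, DEAD);
        accept = new boolean[numStates];
    }

    void addTransition(int from, int label, int to) {
        next[from * ALPHABET + label] = to;
    }

    void setAccept(int state) {
        accept[state] = true;
    }

    /** Builds a DFA that accepts exactly the given byte sequence (state 0 is the start). */
    static PackedDfa forExactBytes(byte[] term) {
        PackedDfa dfa = new PackedDfa(term.length + 1);
        for (int i = 0; i < term.length; i++) {
            dfa.addTransition(i, term[i] & 0xFF, i + 1);
        }
        dfa.setAccept(term.length);
        return dfa;
    }

    /** Runs the DFA over raw bytes using only array lookups; no objects are created. */
    boolean run(byte[] input) {
        int state = 0;
        for (byte b : input) {
            state = next[state * ALPHABET + (b & 0xFF)];
            if (state == DEAD) {
                return false;
            }
        }
        return accept[state];
    }
}

final class PackedDfaDemo {
    public static void main(String[] args) {
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
        PackedDfa dfa = PackedDfa.forExactBytes(term);
        System.out.println(dfa.run(term));
        System.out.println(dfa.run("lucent".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The dense table costs 256 ints per state, which is wasteful for large automata; a real implementation would more likely compress the label ranges. The point of the sketch is only that the whole automaton is a couple of primitive arrays rather than millions of small objects for the garbage collector to trace.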