Date: Tue, 14 Feb 2006 13:42:53 -0800 (PST)
From: Tatu Saloranta
Subject: Field constructor, avoiding String.intern()
To: java-dev@lucene.apache.org

After profiling in-memory indexing, I noticed that calls to String.intern() ranked surprisingly high, especially the one from the Field constructor. This is understandable given the overhead of String.intern() (it is a native, synchronized method, and the overhead is incurred even if the String is already interned), and the fact that it gets called essentially once per document/field combination.

Now, it would be quite easy (in theory) to improve things so that most intern() calls could be avoided, transparently to the calling application. For example, each IndexWriter could use a simple HashMap to cache interned Strings. This approach is more than twice as fast as calling intern() directly. One could also use a per-thread cache or a global one; all of these would probably be faster than calling intern() every time.

However, the Field constructor hard-codes the call to intern(), so a new constructor would be needed to indicate that the field name is already known to be interned. There would also need to be a way to enable the new, optional functionality.

Has anyone tried this approach to see whether the speedup is worth the hassle? In my case it would probably be around 2-3%, assuming the profiler's figure of 5% for intern() is accurate.

-+ Tatu +-
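A minimal sketch of the caching idea, assuming a cache owned by a single IndexWriter (or a single thread). InternCache, ThreadLocalInternCache and SimpleField are hypothetical names for illustration, not Lucene classes; SimpleField merely stands in for Field, and its three-argument constructor models the proposed "name is already interned" constructor, not an existing Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical per-writer cache of interned field names (not Lucene API).
 * Not thread-safe: meant to be owned by one IndexWriter or one thread.
 */
final class InternCache {
    private final Map<String, String> cache = new HashMap<String, String>();

    /** Returns the canonical (interned) instance equal to name. */
    String intern(String name) {
        String cached = cache.get(name);
        if (cached == null) {
            // Slow path: only the first occurrence of each distinct
            // name pays for the native String.intern() call.
            cached = name.intern();
            cache.put(cached, cached);
        }
        return cached;
    }
}

/** Hypothetical per-thread variant: each thread gets its own unsynchronized cache. */
final class ThreadLocalInternCache {
    private static final ThreadLocal<InternCache> CACHE =
            ThreadLocal.withInitial(InternCache::new);

    static String intern(String name) {
        return CACHE.get().intern(name);
    }
}

/** Hypothetical stand-in for Field, showing the proposed extra constructor. */
final class SimpleField {
    final String name;
    final String value;

    /** Mirrors the current behavior: always calls String.intern(). */
    SimpleField(String name, String value) {
        this(name, value, false);
    }

    /** Skips intern() when the caller guarantees that name is already interned. */
    SimpleField(String name, String value, boolean nameInterned) {
        this.name = nameInterned ? name : name.intern();
        this.value = value;
    }
}
```

The cache still hands back the interned instance rather than an arbitrary equal String, so code that compares field names by reference keeps working; the saving comes from replacing the native call with a HashMap lookup for names already seen. A plain HashMap is used because the cache is assumed to be confined to one writer or one thread; a global cache would need synchronization or a concurrent map.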