From: Karl Wettin
To: java-user@lucene.apache.org
Subject: Re: InstantiatedIndex questions
Date: Tue, 6 Oct 2009 19:51:44 +0200

On 6 Oct 2009, at 18:54, David Causse wrote:

David, your timing couldn't be better. Just the other day I proposed that we deprecate InstantiatedIndexWriter. The reasons for this all come down to me being a bit lazy. Your mail makes me reconsider.

https://issues.apache.org/jira/browse/LUCENE-1948

> At indexing time InstantiatedIndex is behind RAMDirectory, but the time

Would you mind running some benchmarks for me using your corpora?

The issue suggests that people use the InstantiatedIndex(IndexReader) constructor to create the index rather than InstantiatedIndexWriter. Is it much slower for you to build the index with RAMDirectory/IndexWriter and then pass an IndexReader to InstantiatedIndex?
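Since the real question is which build path is faster on your own corpora, here is a minimal sketch of a timing harness in plain Java. It deliberately assumes nothing about the Lucene API: the class name BuildBenchmark, the method medianMillis and the dummyBuild workloads are all hypothetical placeholders. You would swap in real builds, for example "IndexWriter on a RAMDirectory, then new InstantiatedIndex(reader)" versus "InstantiatedIndexWriter". It does a few untimed warm-up runs and then reports the median of several timed runs, which is only a rough way of smoothing out JIT and GC noise, not a rigorous benchmark.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class BuildBenchmark {

    // Written by the placeholder workload so its work has an observable side effect.
    private static volatile int sink;

    /**
     * Runs every named strategy a few times untimed (warm-up), then times it
     * `runs` times and records the median wall-clock duration in milliseconds.
     */
    static Map<String, Double> medianMillis(Map<String, Runnable> strategies, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        Map<String, Double> medians = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> entry : strategies.entrySet()) {
            Runnable task = entry.getValue();
            for (int i = 0; i < warmups; i++) {
                task.run(); // give the JIT a chance to compile the hot paths
            }
            long[] samples = new long[runs];
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                task.run();
                samples[i] = System.nanoTime() - start;
            }
            Arrays.sort(samples);
            // Middle element; for an even number of runs this is the upper median.
            medians.put(entry.getKey(), samples[runs / 2] / 1_000_000.0);
        }
        return medians;
    }

    /** Hypothetical placeholder workload: fills a HashMap with n entries. */
    private static void dummyBuild(int n) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        sink = map.size();
    }

    public static void main(String[] args) {
        Map<String, Runnable> strategies = new LinkedHashMap<>();
        // Replace these with real index builds over the same corpus.
        strategies.put("strategy A (placeholder)", () -> dummyBuild(100_000));
        strategies.put("strategy B (placeholder)", () -> dummyBuild(200_000));
        medianMillis(strategies, 3, 5)
            .forEach((name, ms) -> System.out.printf("%s: %.2f ms%n", name, ms));
    }
}
```

Using the median rather than the mean keeps a single slow run (for instance one that hits a full GC) from dominating the comparison.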
This is what the package-level javadocs say about InstantiatedIndexWriter:

"Hardly any effort has been put in to optimizing the InstantiatedIndexWriter, only minimizing the amount of time needed to write-lock the index has been considered."

I'm sure there are ways to speed it up; I just never managed to find the time to look into it. I never really used IIW (InstantiatedIndexWriter) myself.

It might be worth mentioning that when InstantiatedIndexWriter#commit returns, it has yielded an optimized "single segment" index. This is not quite how a Directory/IndexWriter behaves.

> gained on queries makes it better (from what I see it can be twice as fast).
>
> InstantiatedIndex will be our default volatile mini index store for our next production release.

Very cool!!

> We would have other uses for this index, but the lack of addIndexes support makes it impossible for us to use it in those situations, so we continue to use RAMDirectory there.

Have you considered using multiple InstantiatedIndex instances combined with a MultiReader? That would pretty much be the same thing, except that the store wouldn't be quite as optimized. It would definitely use more RAM than a single merged index. You could of course also pass this MultiReader to a new InstantiatedIndex.

I have no real idea of the difference in speed and RAM consumption between these approaches, so you should benchmark all of them.

> Do you think we could reach RAMDirectory indexing times by tweaking the initial capacity of the java.util collections you use?

Maybe, but I think the gain would be relatively small. Don't take my word for it, though; benchmark it.

Using the InstantiatedIndex(IndexReader) constructor will create collections of close to optimal size. As for InstantiatedIndexWriter, I think it's pretty much only the transient collections in #commit where presizing would help you. My guess is that you should experiment with the dirtyTerms and termsByText attributes.
Count the number of terms in your complete index and see how much it speeds things up to create the collections with that size from the start.
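To make the "count the terms, then presize" experiment concrete, here is a minimal sketch in plain Java. It does not touch InstantiatedIndexWriter itself: the term-keyed map is only a hypothetical stand-in for collections such as dirtyTerms or termsByText, whose actual types and construction sites are not asserted here. The class TermMapSizing and its helpers capacityFor and termFrequencies are hypothetical names. The sketch relies only on documented java.util.HashMap behaviour: with the default load factor of 0.75, a map rehashes once its size exceeds capacity * 0.75, so asking for ceil(expectedSize / 0.75) buckets up front avoids every intermediate resize.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class TermMapSizing {

    /** Default HashMap load factor; a map rehashes once size > capacity * 0.75. */
    private static final double LOAD_FACTOR = 0.75;

    /** Initial capacity that lets a HashMap hold expectedSize entries without rehashing. */
    static int capacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize must be >= 0");
        }
        return (int) Math.ceil(expectedSize / LOAD_FACTOR);
    }

    /**
     * Counts token frequencies in a map presized for the expected number of
     * distinct terms (hypothetical stand-in for a collection keyed by term text).
     */
    static Map<String, Integer> termFrequencies(String[] tokens, int expectedDistinctTerms) {
        Map<String, Integer> freqs = new HashMap<>(capacityFor(expectedDistinctTerms));
        for (String token : tokens) {
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        String[] tokens = {"lucene", "index", "lucene", "reader"};
        // In practice the distinct-term count would come from a full pass over
        // the complete index; here it is simply counted from the sample tokens.
        int distinct = (int) Arrays.stream(tokens).distinct().count();
        System.out.println(termFrequencies(tokens, distinct));
    }
}
```

Whether presizing like this makes a measurable difference to #commit is exactly what a benchmark on a real corpus would have to show; the sketch only demonstrates how to compute the size.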