Date: Thu, 13 Jul 2006 17:07:03 -0400
From: "Yonik Seeley"
To: java-dev@lucene.apache.org
Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

On 7/12/06, Ning Li wrote:
> > If it can be done in a separate class, using public APIs (or at least
> > with a minimum of protected access), without a loss in performance,
> > then that's the way to go IMO.
>
> This is exactly what I'm asking. Can it be done using public APIs
> without a loss in performance or functionality?

The answer to that is the crux of the matter :-)

Solr's implementation is here:
http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/update/DirectUpdateHandler2.java?view=markup

As I said, instead of keeping track of the maxSegment (equiv to the
docid of the in-memory segments), Solr keeps track of the number of
documents not to delete. So, a delete sets the count to 0, an
overwriting add sets the count to "1", and a non-overwriting add
increases the count.

I'll do some speculation on how Solr would compare with NewIndexModifier:

For a very small batch of updates to an existing index:
 - should be little or no difference... they both do the same amount of work.

Building a complete index, without any deletes:
 - no difference

Building a complete index, with deletes:
 - there will be some differences... Currently, Solr only does the real
   deletes on a commit call (but this could be changed). That means
   that NewIndexModifier will be doing deletes more often (every
   maxBufferedDocs).
The benefit to more frequent deletes when doing a complete index build
is that some of them will be on a smaller index... deletes very early
on in the process will be faster than those later on when the index is
larger. The downside to more frequent deletes is that more
IndexReaders are opened and closed.

For a large batch of updates (deletes and adds) to an existing index:
 - probably Solr would be faster due to a single delete phase.

For the default Lucene maxBufferedDocs, I would guess Solr's method
would probably be faster in the majority of cases. As maxBufferedDocs
increased, that advantage would lessen.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
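
To make the counting scheme described earlier in this message concrete (a delete sets the count to 0, an overwriting add sets it to 1, a non-overwriting add increases it), here is a minimal sketch in plain Java. It is not Solr's code and uses no Lucene APIs: the class name, the method names, and the toy "index" (an append-only list of ids) are all hypothetical. It also assumes that a non-overwriting add only bumps a count that already exists, since with no pending entry there is nothing to delete for that id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical): the "index" is an append-only list of document ids. */
class PendingDeleteSketch {
    private final List<String> index = new ArrayList<>();
    // id -> how many of the newest documents with that id to keep at commit
    private final Map<String, Integer> keepCount = new HashMap<>();

    /** A delete sets the count to 0: keep nothing for this id. */
    void delete(String id) {
        keepCount.put(id, 0);
    }

    /** An overwriting add sets the count to 1: keep only the newest copy. */
    void addOverwriting(String id) {
        index.add(id);
        keepCount.put(id, 1);
    }

    /** A non-overwriting add increases an existing count (assumption: no entry means no pending deletes). */
    void addNonOverwriting(String id) {
        index.add(id);
        keepCount.computeIfPresent(id, (key, n) -> n + 1);
    }

    /** Single delete phase: walk newest to oldest, keeping at most keepCount copies per tracked id. */
    void commit() {
        Map<String, Integer> remaining = new HashMap<>(keepCount);
        List<String> survivors = new ArrayList<>();
        for (int i = index.size() - 1; i >= 0; i--) {
            String id = index.get(i);
            Integer left = remaining.get(id);
            if (left == null) {
                survivors.add(id);            // untracked id: nothing to delete
            } else if (left > 0) {
                survivors.add(id);            // one of the newest copies to keep
                remaining.put(id, left - 1);
            }                                 // otherwise: dropped by the delete phase
        }
        Collections.reverse(survivors);       // restore oldest-to-newest order
        index.clear();
        index.addAll(survivors);
        keepCount.clear();
    }

    /** Number of documents currently in the toy index with the given id. */
    long count(String id) {
        return index.stream().filter(id::equals).count();
    }
}
```

In this sketch all deletions are deferred to commit(), which mirrors the "single delete phase" point above; applying the same pass every maxBufferedDocs adds instead would correspond to the more frequent deletes attributed to NewIndexModifier.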