From: Martijn v Groningen
To: dev@lucene.apache.org
Date: Thu, 16 Jun 2011 18:59:20 +0200
Subject: Re: Indexing slower in trunk

@Uwe
Solr does support expunging deletes (the expungeDeletes attribute of commit):
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22
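A minimal sketch of such a commit sent from Java follows. It assumes a Solr XML update handler reachable at a URL you supply (for example `http://localhost:8983/solr/update`, a hypothetical address). It also assumes Java 11+ for `java.net.http`. The class and method names are illustrative. The message itself is the `<commit/>` update command with `expungeDeletes="true"` set.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch: send Solr a commit that also expunges deletes.
// Assumption: the target is a Solr XML update handler, e.g. http://localhost:8983/solr/update
// (hypothetical address; use your own). Requires Java 11+ for java.net.http.
class ExpungeDeletesCommit {

    // Builds the XML update message; expungeDeletes is an optional attribute of <commit/>.
    static String commitMessage(boolean expungeDeletes) {
        return expungeDeletes ? "<commit expungeDeletes=\"true\"/>" : "<commit/>";
    }

    // POSTs the message to the update handler and returns the HTTP status code.
    static int send(String updateUrl, String message) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(message))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String message = commitMessage(true);
        System.out.println(message);
        // Only talk to a server when an update URL is given on the command line.
        if (args.length > 0) {
            System.out.println("HTTP " + send(args[0], message));
        }
    }
}
```

Without arguments the program only prints the message. If you pass the update URL as the single argument, it actually posts the commit.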

On 16 June 2011 18:05, Erick Erickson <erickerickson@gmail.com> wrote:
OK, after more tests I'm pretty sure that my personal machine
that I'm testing on is just resource-constrained, leading to the
results I mentioned before. After all, I'm running my Solr
instance, the indexing program, etc. on a MacBook
with 1 CPU and 2 cores. The indexing program is parsing the
XML.

On a proper setup, where the indexing machine is separate
from the machine(s) feeding the indexing process, I suspect this would
be a different story. Hmmmm, I may try that sometime too...

Best
Erick

On Tue, Jun 14, 2011 at 9:25 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> For simply removing deletes, there is also IW.expungeDeletes() (i.e.
> IndexWriter.expungeDeletes()), which is less intensive than optimize! Not sure
> if Solr supports this, too, but as far as I know there is an issue open.
>
> Also please note: as soon as a segment is selected for merging (the merge
> policy may also select it because of the number of deletes it contains), the
> merge reclaims all the resources held by its deleted documents; that's what
> merging does. So expunging deletes once per week is a good idea if your index
> consists of very old and large segments that are rarely merged anymore and
> lots of documents are deleted from them.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
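Uwe makes two points above: expungeDeletes touches less data than optimize, and any merge reclaims the space of deleted documents. A toy model can illustrate both. This is not Lucene's merge code. Each segment is reduced to a document count and a deleted count. "optimize" merges everything into one segment. The toy "expungeDeletes" merges only the segments that have deletions; Lucene's real merge policies apply their own selection rules. All names and numbers are hypothetical, and records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, NOT Lucene's merge code: a segment is reduced to its total document
// count (maxDoc, deleted documents included) and its number of deleted documents.
class ToyMergeModel {

    record Segment(int maxDoc, int deleted) {
        int live() { return maxDoc - deleted; }
        boolean hasDeletions() { return deleted > 0; }
    }

    // Outcome of a maintenance step: the new segment list and how many docs had to be read.
    record Result(List<Segment> segments, int docsRead) {}

    // A merge copies only live documents, so the merged segment has no deletions left.
    static Segment merge(List<Segment> inputs) {
        int live = 0;
        for (Segment s : inputs) live += s.live();
        return new Segment(live, 0);
    }

    // optimize-style: every segment takes part in one big merge.
    static Result optimize(List<Segment> segments) {
        int read = 0;
        for (Segment s : segments) read += s.maxDoc();
        return new Result(List.of(merge(segments)), read);
    }

    // Toy expungeDeletes: merge only the segments that have deletions; leave the rest alone.
    static Result expungeDeletes(List<Segment> segments) {
        List<Segment> kept = new ArrayList<>();
        List<Segment> withDeletes = new ArrayList<>();
        for (Segment s : segments) {
            if (s.hasDeletions()) withDeletes.add(s); else kept.add(s);
        }
        int read = 0;
        for (Segment s : withDeletes) read += s.maxDoc();
        if (!withDeletes.isEmpty()) kept.add(merge(withDeletes));
        return new Result(kept, read);
    }

    public static void main(String[] args) {
        // Hypothetical index: a large old segment with many deletes, a clean one, a small one.
        List<Segment> index = List.of(
                new Segment(1_000_000, 200_000),
                new Segment(500_000, 0),
                new Segment(10_000, 1_000));
        Result opt = optimize(index);
        Result exp = expungeDeletes(index);
        System.out.println("optimize:       read " + opt.docsRead() + " docs -> " + opt.segments());
        System.out.println("expungeDeletes: read " + exp.docsRead() + " docs -> " + exp.segments());
    }
}
```

Consider the sample index. Both strategies end with 1,309,000 live documents and no deletions. In this toy, optimize reads all 1,510,000 documents. The toy expungeDeletes reads 1,010,000 and leaves the clean 500,000-document segment untouched.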
>
>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: Tuesday, June 14, 2011 3:19 PM
>> To: dev@lucene.apache.org
>> Subject: Re: Indexing slower in trunk
>>
>> Optimization used to have a very noticeable impact on search speed prior to
>> some index format changes from quite a while ago.
>>
>> At this point the effect is much less noticeable, but the thing optimize does
>> do is reclaim resources from deleted documents. If you have lots of
>> deletions, it's a good idea to periodically optimize, but in that case it's often
>> done pretty infrequently (once a day/week/month) rather than as part of any
>> ongoing indexing process.
>>
>> Best
>> Erick
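Erick's advice to reclaim space only when there are lots of deletions could be turned into a simple check, run by the infrequent maintenance job. The sketch below uses a hypothetical policy that is not taken from Lucene or Solr. It relies on the two counts a Lucene `IndexReader` reports: `maxDoc()`, which includes deleted documents, and `numDocs()`, which counts only live ones. The 10% threshold is an arbitrary example value.

```java
// Hypothetical maintenance policy (not from Lucene or Solr): run the infrequent
// optimize/expungeDeletes job only when deleted documents take up enough of the index.
class MaintenanceCheck {

    // maxDoc counts deleted documents too; numDocs counts only live ones
    // (the same meaning as Lucene's IndexReader.maxDoc() and numDocs()).
    static double deletedRatio(int maxDoc, int numDocs) {
        if (maxDoc <= 0) return 0.0; // empty index: nothing to reclaim
        return (double) (maxDoc - numDocs) / maxDoc;
    }

    static boolean shouldReclaim(int maxDoc, int numDocs, double threshold) {
        return deletedRatio(maxDoc, numDocs) >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.10; // hypothetical: reclaim once 10% of the index is deleted docs
        System.out.println(shouldReclaim(1_510_000, 1_309_000, threshold)); // about 13% deleted
        System.out.println(shouldReclaim(100_000, 99_000, threshold));      // 1% deleted
    }
}
```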
>>
>> 2011/6/14 Yury Kats <yurykats@yahoo.com>:
>> > On 6/14/2011 4:28 AM, Uwe Schindler wrote:
>> >> indexing and optimizing was only a
>> >> good idea pre Lucene-2.9, now it's mostly obsolete)
>> >
>> > Could you please elaborate on this? Is optimizing obsolete in general
>> > or after indexing new documents? Is it obsolete after deletions? And
>> > what is "mostly"?
>> >
>> > Thanks!

--
Kind regards,

Martijn van Groningen