From: Mark Miller
Date: Mon, 10 Aug 2009 10:48:12 -0400
To: java-dev@lucene.apache.org
Subject: Re: indexing_slowdown_with_latest_lucene_udpate

Robert Muir wrote:
> This is real and not just for very short docs.

Yes, you still pay the cost for longer docs, but it becomes less important the longer the docs are, since it makes up a smaller share of the total work. Load a ton of one-term docs and it might be 50-60% slower; add a bunch of articles and it might be closer to 15-20% (I don't know the exact numbers, but the longer I made the docs, the smaller the % slowdown, obviously). Still a good hit, but a short-doc test magnifies the problem. It affects things no matter what, but when you don't do much tokenizing or normalizing, the cost of the reflection/TokenStream init dominates.

- Mark
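
The amortization argument in this message can be illustrated with a toy cost model. This is only a sketch under stated assumptions: the class name, the method, and all cost figures are hypothetical and in arbitrary units, not measurements from Lucene. It assumes each document pays one fixed setup cost (standing in for the reflection/TokenStream init) plus a per-token cost, and reports how much that fixed cost adds relative to the tokenizing work.

```java
public class FixedCostModel {

    /**
     * Relative increase in per-document indexing time, in percent, caused by
     * adding a fixed per-document cost on top of the per-token work.
     * All costs are hypothetical and in arbitrary units.
     */
    static double slowdownPercent(int tokens, double perTokenCost, double addedFixedCost) {
        if (tokens <= 0 || perTokenCost <= 0) {
            throw new IllegalArgumentException("tokens and perTokenCost must be positive");
        }
        double before = tokens * perTokenCost;   // time spent tokenizing and normalizing
        double after = before + addedFixedCost;  // same work plus the fixed init cost
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        // Assumed figures: 2 units per token, 1 unit of fixed init per document.
        for (int tokens : new int[] {1, 10, 100, 1000}) {
            System.out.printf("%d tokens per doc: %.2f%% slower%n",
                    tokens, slowdownPercent(tokens, 2.0, 1.0));
        }
    }
}
```

The fixed cost is identical in every row; only its share of the total shrinks as the document grows, which is why a short-doc benchmark exaggerates the slowdown.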