Date: Wed, 12 Oct 2005 10:08:19 -0400
From: Yonik Seeley
To: java-user@lucene.apache.org
Subject: Re: "docMap" array in SegmentMergeInfo

Thanks for the trace Peter, and great catch!
It certainly does look like avoiding the construction of the docMap for a
MultiTermEnum will be a significant optimization.

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 10/12/05, Peter Keegan wrote:
>
> Here is one stack trace:
>
> Full thread dump Java HotSpot(TM) Client VM (1.5.0_03-b07 mixed mode):
>
> "Thread-6" prio=5 tid=0x6cf7a7f0 nid=0x59e50 waiting for monitor entry
> [0x6d2cf000..0x6d2cfd6c]
> at org.apache.lucene.index.SegmentReader.isDeleted(SegmentReader.java:241)
> - waiting to lock <0x04e40278> (a org.apache.lucene.index.SegmentReader)
> at org.apache.lucene.index.SegmentMergeInfo.<init>(SegmentMergeInfo.java:43)
> at org.apache.lucene.index.MultiTermEnum.<init>(MultiReader.java:277)
> at org.apache.lucene.index.MultiReader.terms(MultiReader.java:186)
> at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:75)
> at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:243)
> at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:166)
> at org.apache.lucene.search.Query.weight(Query.java:84)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:158)
> at org.apache.lucene.search.Searcher.search(Searcher.java:67)
> at org.apache.lucene.search.QueryFilter.bits(QueryFilter.java:62)
> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:121)
> at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
> at org.apache.lucene.search.Hits.<init>(Hits.java:51)
> at org.apache.lucene.search.Searcher.search(Searcher.java:49)
>
> I've also seen it happen during sorting from:
>
> FieldSortedHitQueue.comparatorAuto ->
> FieldCacheImpl.getAuto() ->
> MultiReader.terms() ->
> MultiTermEnum.init() ->
> SegmentMergeInfo.init() ->
> SegmentReader.isDeleted()
>
> Peter
>
> On 10/11/05, Yonik Seeley wrote:
> >
> > > We've been using this in production for a while and it fixed the
> > > extremely slow searches when there are deleted documents.
> >
> > Who was the caller of isDeleted()? There may be an opportunity for an easy
> > optimization to grab the BitVector and reuse it instead of repeatedly
> > calling isDeleted() on the IndexReader.
> >
> > -Yonik
> > Now hiring -- http://tinyurl.com/7m67g
> >
>
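
For anyone following the thread, here is a rough sketch of the two ideas
discussed: skip building the docMap until something actually asks for it, and
when it is built, take the reader's lock once to copy the deletion bits rather
than calling a synchronized isDeleted() for every document. This is not the
Lucene source and not a proposed patch. SketchSegment, LazyMergeInfo, and
LazyDocMapDemo are hypothetical names, java.util.BitSet stands in for Lucene's
BitVector, and the use of -1 to mark a deleted document is an assumption made
for the illustration.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-in for a segment reader; not Lucene's SegmentReader. */
final class SketchSegment {
    private final int maxDoc;
    private final BitSet deleted; // stand-in for Lucene's BitVector

    SketchSegment(int maxDoc, int... deletedDocs) {
        this.maxDoc = maxDoc;
        this.deleted = new BitSet(maxDoc);
        for (int d : deletedDocs) {
            deleted.set(d);
        }
    }

    int maxDoc() {
        return maxDoc;
    }

    /** Per-document check: acquires the monitor on every call. */
    synchronized boolean isDeleted(int doc) {
        return deleted.get(doc);
    }

    /** Acquires the monitor once and returns a private copy of the bits. */
    synchronized BitSet deletedSnapshot() {
        return (BitSet) deleted.clone();
    }
}

/** Hypothetical per-segment merge state that builds its docMap on demand. */
final class LazyMergeInfo {
    private final SketchSegment segment;
    private int[] docMap; // stays null until getDocMap() is first called

    LazyMergeInfo(SketchSegment segment) {
        this.segment = segment;
    }

    boolean docMapBuilt() {
        return docMap != null;
    }

    /**
     * Maps old document numbers to new ones with deletions squeezed out.
     * Assumption for this sketch: -1 marks a deleted document.
     * Not thread-safe; intended for use by a single thread.
     */
    int[] getDocMap() {
        if (docMap == null) {
            BitSet bits = segment.deletedSnapshot(); // one lock acquisition
            int[] map = new int[segment.maxDoc()];
            int next = 0;
            for (int i = 0; i < map.length; i++) {
                map[i] = bits.get(i) ? -1 : next++;
            }
            docMap = map;
        }
        return docMap;
    }
}

final class LazyDocMapDemo {
    public static void main(String[] args) {
        SketchSegment seg = new SketchSegment(6, 1, 4); // docs 1 and 4 deleted
        LazyMergeInfo info = new LazyMergeInfo(seg);
        // A caller that only enumerates terms never triggers the build.
        System.out.println(info.docMapBuilt());
        System.out.println(Arrays.toString(info.getDocMap()));
    }
}
```

With this shape, a code path that only walks terms (as in the RangeQuery.rewrite
trace above) would never touch the deletion bits at all, and a path that does
need the map contends for the reader's monitor once per segment instead of once
per document.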