From: Michael McCandless
To: java-user@lucene.apache.org
Subject: Re: Appropriate disk optimization for large index?
Date: Mon, 18 Aug 2008 17:23:36 -0400

mattspitz wrote:

> Are the index files synced on writer.close()?

No, they aren't.  Not until 2.4 (trunk).

> Thank you so much for your help.  I think the seek time is the issue,
> especially considering the high merge factor and the fact that the
> segments are scattered all over the disk.

You're welcome!

I agree: optimizing seek time seems likely to be the biggest win.

> Will a faster disk cache access affect the optimization and merging?  I
> don't really have a sense for what of the segments are kept in memory
> during a merge.  It doesn't make sense to me that Lucene would pull all
> of the segments into memory to merge them, but I don't really know how.

Segments aren't kept in memory during merging... it's more like a
cursor that sweeps through each of the files for the 50 segments being
merged.  Lucene does buffer its reads, so we read a chunk into RAM and
then pull bits off that chunk.  And the OS does readahead.  But otherwise
it's all on disk and we make a single sweep through each of the segments
to be merged.

So I wouldn't expect the disk cache's performance to impact Lucene,
during merging or flushing.

Mike
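
For readers who want to picture the "cursor that sweeps through each of the files" idea, here is a minimal sketch. It is not Lucene's merge code. It assumes each "segment" is simply a text file of lines already in sorted order, and the names SweepMergeSketch, Cursor and merge are hypothetical, chosen only for this illustration. Each input is read through a buffered reader, so only one small chunk per file sits in RAM at a time, and every file is read front to back exactly once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only: a k-way merge of sorted text files, NOT Lucene's segment merger.
final class SweepMergeSketch {

    /** A forward-only cursor over one sorted input file (hypothetical helper). */
    private static final class Cursor implements Comparable<Cursor> {
        final BufferedReader reader; // buffered: reads a chunk, then hands out lines
        String current;              // the line the cursor currently points at

        Cursor(Path file) throws IOException {
            reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
            current = reader.readLine();
        }

        /** Moves to the next line; returns false once the file is exhausted. */
        boolean advance() throws IOException {
            current = reader.readLine();
            return current != null;
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    /** Merges already-sorted inputs into one sorted output in a single sweep per file. */
    static void merge(List<Path> sortedInputs, Path output) throws IOException {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        List<Cursor> opened = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path input : sortedInputs) {
                Cursor cursor = new Cursor(input);
                opened.add(cursor);
                if (cursor.current != null) {
                    queue.add(cursor); // skip empty inputs
                }
            }
            while (!queue.isEmpty()) {
                Cursor smallest = queue.poll(); // cursor holding the smallest line
                out.write(smallest.current);
                out.newLine();
                if (smallest.advance()) {
                    queue.add(smallest);        // re-queue at its new position
                }
            }
        } finally {
            for (Cursor cursor : opened) {
                cursor.reader.close();
            }
        }
    }

    /** Usage: java SweepMergeSketch in1.txt in2.txt ... out.txt */
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: SweepMergeSketch <sorted input>... <output>");
            return;
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 0; i < args.length - 1; i++) {
            inputs.add(Paths.get(args[i]));
        }
        merge(inputs, Paths.get(args[args.length - 1]));
    }
}
```

In this toy version the memory held is one read buffer per open file, regardless of file size. That is the property described in the message above: the inputs stay on disk and are streamed through once, so the cost is governed by sequential reads and by seeks between the open files rather than by how much of the data happens to be cached.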