From: Doug Cutting <cutting@apache.org>
Date: Tue, 26 Sep 2006 09:31:19 -0700
To: hadoop-dev@lucene.apache.org
Subject: Re: Forcing all blocks to be present "locally"
Andrzej Bialecki wrote:
> What I need is a way to _temporarily_ make them localized to a
> particular machine, just for performance reasons, and without
> having to copy them out of DFS ...

Your current solution is to copy them to the local FS. This is unacceptable because it is (a) slow and (b) uses too much space.

Re (a): I don't see how the solution you suggest could be any faster. If anything it could be slower, since, in addition to copying a file's blocks locally, one would also need to make room for those blocks by relocating blocks that belong to other nodes. Ideally those would be blocks of other index files, but it might not work out that well.

Re (b): Since you'll be copying indexes out to local storage, could you reduce their replication count from three to two? That would free up some space on each node (about the right amount, in fact). If the DFS copy becomes incomplete, you'd have to manually re-create the index or copy back a local version, which is not quite as convenient as having DFS handle your disk failures; but with a replication of two this should still be rare.

Disks are awfully big these days. I'm surprised your disks are so full that an index small enough to be searched quickly by a single node takes up a significant fraction of a disk. Or are you copying the entire segment locally?

I would hope that DFS would be fast enough for summary and cache requests. With a cluster of 10 nodes and ten hits displayed per page, each node should only need to handle one summary request per query. Cache requests are rarer still.

Doug
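[A back-of-the-envelope check of the space argument in (b). The index and cluster sizes below are hypothetical, chosen only for illustration; the thread gives no actual numbers. It assumes DFS blocks are spread evenly across nodes.]

```python
# Hypothetical figures, not from the thread.
index_size = 100.0  # GB, total index data stored in DFS
num_nodes = 10      # nodes in the cluster

# DFS space used per node at replication 3 vs. replication 2,
# assuming blocks are distributed evenly.
dfs_per_node_r3 = 3 * index_size / num_nodes  # 30 GB
dfs_per_node_r2 = 2 * index_size / num_nodes  # 20 GB

# Dropping one replica frees this much DFS space on each node:
freed = dfs_per_node_r3 - dfs_per_node_r2     # 10 GB

# Each node's local (non-DFS) copy of its shard of the index:
local_copy = index_size / num_nodes           # 10 GB

# "About the right amount": the freed replica space on each node
# roughly covers that node's new local copy.
print(freed, local_copy)
```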
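[The load estimate in the last paragraph can be checked with the same kind of arithmetic. This sketch uses only the figures given in the email, and assumes hits are spread evenly across the nodes' index shards.]

```python
# Figures from the email: a 10-node cluster, ten hits shown per page.
num_nodes = 10
hits_per_page = 10

# Each displayed hit needs one summary fetch; with hits spread evenly
# across shards, each node serves per query:
summaries_per_node = hits_per_page / num_nodes
print(summaries_per_node)  # one summary request per query per node
```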