Subject: Re: HBase random access in HDFS and block indices
From: Stack
To: user@hbase.apache.org
Date: Fri, 29 Oct 2010 09:59:45 -0700

On Mon, Oct 18, 2010 at 9:30 PM, Matt Corgan wrote:
> I was envisioning the HFiles being opened and closed more often, but it
> sounds like they're held open for long periods and that the indexes are
> permanently cached.  Is it roughly correct to say that after opening an
> HFile and loading its checksum/metadata/index/etc then each random data
> block access only requires a single pread, where the pread has some
> threading and connection overhead, but theoretically only requires one disk
> seek.  I'm curious because I'm trying to do a lot of random reads, and given
> enough application parallelism, the disk seeks should become the bottleneck
> much sooner than the network and threading overhead.
>

You have it basically right. On region deploy, all files that comprise a
region are opened and thereafter held open. Part of opening a file is reading
in its index and file metadata, so open files occupy some memory. An
optimization would be to let go of unused files, reopening them on access.

St.Ack
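
For illustration only, here is a minimal sketch of the access pattern discussed above: open the file once, load the block index into memory, then serve each random lookup with a single positional read. This is not HBase code and not the HFile format. The layout (fixed-size blocks, a JSON index, an 8-byte trailer) and the names BLOCK_SIZE, write_toy_file and ToyReader are all made up for the example, and os.pread is only available on Unix-like systems.

```python
import bisect
import json
import os
import struct
import tempfile

BLOCK_SIZE = 64  # hypothetical fixed block size; records must be shorter than this


def write_toy_file(path, rows):
    """Write sorted key=value records into fixed-size blocks, followed by
    a JSON block index and an 8-byte trailer holding the index offset."""
    blocks = []  # each entry: [first_key, payload_bytes]
    for key, value in sorted(rows.items()):
        record = f"{key}={value}\n".encode()
        # Start a new block when the current one cannot hold this record.
        if not blocks or len(blocks[-1][1]) + len(record) > BLOCK_SIZE:
            blocks.append([key, b""])
        blocks[-1][1] += record
    with open(path, "wb") as f:
        index = []
        for first_key, payload in blocks:
            index.append([first_key, f.tell()])
            f.write(payload.ljust(BLOCK_SIZE, b"\0"))
        index_offset = f.tell()
        f.write(json.dumps(index).encode())
        f.write(struct.pack(">Q", index_offset))


class ToyReader:
    """Holds the file descriptor open and keeps the block index in memory."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_RDONLY)
        size = os.fstat(self.fd).st_size
        # Opening cost: read the trailer, then the index it points to.
        (index_offset,) = struct.unpack(">Q", os.pread(self.fd, 8, size - 8))
        index = json.loads(os.pread(self.fd, size - 8 - index_offset, index_offset))
        self.first_keys = [k for k, _ in index]
        self.offsets = [o for _, o in index]
        self.preads = 0  # counts data-block reads only

    def get(self, key):
        # Find the last block whose first key is <= key, using the cached index.
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None
        # One positional read per lookup; no seek on shared file state.
        block = os.pread(self.fd, BLOCK_SIZE, self.offsets[i])
        self.preads += 1
        for line in block.rstrip(b"\0").decode().splitlines():
            k, _, v = line.partition("=")
            if k == key:
                return v
        return None

    def close(self):
        os.close(self.fd)


path = os.path.join(tempfile.mkdtemp(), "toy.dat")
write_toy_file(path, {f"row{i:03d}": f"value{i}" for i in range(50)})
reader = ToyReader(path)
print(reader.get("row042"), reader.preads)
reader.close()
```

The memory cost mentioned above shows up here as first_keys and offsets, which stay resident for as long as the reader is open. Closing idle readers and reopening them on access would trade that memory for the cost of re-reading the trailer and index.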