From: Apache Wiki
To: core-commits@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Fri, 03 Oct 2008 19:58:22 -0000
Message-ID: <20081003195822.7804.42882@eos.apache.org>
Subject: [Hadoop Wiki] Trivial Update of "Hbase/NewFileFormat" by stack

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/NewFileFormat

------------------------------------------------------------------------------
- This page is for discussion related to [https://issues.apache.org/jira/browse/HBASE-61 HBASE-61, Create an HBase-specific MapFile implementation]. That issue, and its linked issues, has a bunch of suggestions for how we might do a better persistence. Most have been replicated in the ''New Format'' section below. Other related issues include, [https://issues.apache.org/jira/browse/HADOOP-3315 TFile], and [https://issues.apache.org/jira/browse/HBASE-647 HBASE-647, Remove the HStoreFile 'info' file (and index and bloomfilter if possible)].
+ This page is for discussion related to [https://issues.apache.org/jira/browse/HBASE-61 HBASE-61, Create an HBase-specific MapFile implementation]. That issue, and its linked issues, has a bunch of suggestions for how we might do a better persistence. Most have been replicated in the ''New Format'' section below. Other related issues include, [https://issues.apache.org/jira/browse/HADOOP-3315 TFile], and [https://issues.apache.org/jira/browse/HBASE-647 HBASE-647, Remove the HStoreFile 'info' file (and index and bloomfilter if possible)] as well as ''SSTable'' from the bigtable paper.

  == Current Implementation ==

@@ -41, +41 @@

   * Always-on General bloomfilter. We know how many entries a file will have when we go to flush it so we can optimally size a bloomfilter. The small amount of memory a bloomfilter occupies will pay for itself many-fold in the seeks saved trying to figure out whether a file contains an asked-for key.
   * Optimal random-access
   * Iterate over keys only, rather than MapFile's current always key+value iteration. This'd be useful when trying to find the closest key. TFile and SequenceFile can do this (it's not exposed in MapFile).
- 
+ 
+ === Index ===
+ TODO, but the TFile block-based index rather than the MapFile interval-based one would seem better for us; indices then are of predictable size; a seek to the index position will load at an amenable spot when blocks are compressed.

  === Nice-to-haves ===
   * Don't write out the family portion of column when writing keys.

  == Other File Formats ==
+ Cassandra uses a Sequence File. It adds key/values in blocks of 128 by default. On the 128th entry, an index for the block keys is inlined and then a new block begins. Block offsets are kept out in an index file as in MapFile. Bloomfilters are on by default.
+ From the bigtable paper, an SSTable "... contains a sequence of blocks (typically each block is 64KB in size, but this is configurable). A block index (stored at the end of the SSTable) is used to locate blocks; the index is loaded into memory when the SSTable is opened. A lookup can be performed with a single disk seek: we first find the appropriate block by performing a binary search in the in-memory index, and then reading the appropriate block from disk. Optionally, an SSTable can be completely mapped into memory, which allows us to perform lookups and scans without touching the disk."
+ 
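
The always-on bloomfilter item above relies on knowing the entry count at flush time. A rough sketch of what "optimally size" could mean, using the standard Bloom filter formulas m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2. The function name and the target false-positive rate are assumptions for illustration; the page does not say which rate or implementation HBase would use.

```python
import math


def bloom_parameters(num_entries, false_positive_rate):
    """Return (bits, hash_count) for a Bloom filter sized for num_entries.

    Hypothetical helper: applies the textbook sizing formulas, not any
    particular HBase or Hadoop implementation.
    """
    if num_entries <= 0:
        raise ValueError("num_entries must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    bits = math.ceil(
        -num_entries * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    # k = (m / n) * ln 2, with at least one hash function.
    hash_count = max(1, round(bits / num_entries * math.log(2)))
    return bits, hash_count


if __name__ == "__main__":
    bits, hashes = bloom_parameters(1_000_000, 0.01)
    print(f"{bits / 8 / 1024:.0f} KiB, {hashes} hash functions")
```

Under these assumptions a file of a million entries at a 1% false-positive rate needs a little over a megabyte of filter, which is the "small amount of memory" trade-off the item describes.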
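
The SSTable lookup quoted above (binary search of an in-memory block index, then one block read) can be sketched as follows. This is a toy in-memory model, not the TFile, Cassandra, or Bigtable code: the class name, the `blocks_read` counter, and the use of a fixed number of entries per block (rather than a byte size such as 64KB) are all simplifying assumptions.

```python
import bisect


class BlockIndexedFile:
    """Toy stand-in for a block-indexed store file (hypothetical names)."""

    def __init__(self, sorted_items, block_size=128):
        # sorted_items: list of (key, value) pairs already sorted by key.
        self.blocks = [
            sorted_items[i:i + block_size]
            for i in range(0, len(sorted_items), block_size)
        ]
        # The index holds the first key of each block and stays in memory.
        self.index = [block[0][0] for block in self.blocks]
        self.blocks_read = 0  # stands in for disk seeks

    def get(self, key):
        # Find the last block whose first key is <= the requested key.
        pos = bisect.bisect_right(self.index, key) - 1
        if pos < 0:
            return None  # key sorts before everything in the file
        self.blocks_read += 1  # a single block read per lookup
        for k, v in self.blocks[pos]:
            if k == key:
                return v
        return None


if __name__ == "__main__":
    items = [(f"row{i:03d}", i) for i in range(1000)]
    store = BlockIndexedFile(items)
    print(store.get("row500"), store.blocks_read)
```

Because the index has one entry per block, its size depends only on the number of blocks, which is the predictable-size property noted in the Index section.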