Date: Fri, 15 Mar 2013 12:30:14 +0000 (UTC)
From: "Jean-Marc Spaggiari (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8109) HBase can manage blocks instead of (or inside) files in HDFS

    [ https://issues.apache.org/jira/browse/HBASE-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603329#comment-13603329 ]

Jean-Marc Spaggiari commented on HBASE-8109:
--------------------------------------------

Today HBCK relies on many file and directory name checks to find lost regions and similar problems. Will we be able to do almost the same thing with blocks? I mean, can we list all the blocks that are HBase-related but not referenced in the META?

> HBase can manage blocks instead of (or inside) files in HDFS
> ------------------------------------------------------------
>
>                 Key: HBASE-8109
>                 URL: https://issues.apache.org/jira/browse/HBASE-8109
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Sergey Shelukhin
>
> Prompted by previous non-Hadoop experience and some dev list discussions, and after talking to some HDFS people about blocks.
> HBase could improve a lot by managing HDFS blocks instead of files, and reusing the blocks among other things. Some areas that could improve are splits, compactions, management of large blobs, and locality enforcement.
> I was told that block APIs in Hadoop 2 are well-isolated, but not exposed yet. They can easily be exposed, and as one of the first potential users we could get to help shape them. Two areas that, from my limited understanding, are currently fuzzy are namespaces for blocks and ref-counting.
> We should come up with a list of initial scenarios to figure out what we need from the block API (locality, detecting/enforcing block boundaries/variable-size blocks, reusing one block, ...).