hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8109) HBase can manage blocks instead of (or inside) files in HDFS
Date Thu, 14 Mar 2013 20:58:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13602720#comment-13602720 ]

Enis Soztutar commented on HBASE-8109:

We do not want to build a namespace on top of blocks, and my suggestion of embedding the NN
inside the master and doing federation did not get good marks from the DFS folks here.

Thinking about it, our data files do not actually have to be real files that can be read
from outside. I can think of very few places where we rely on namespace semantics
(correct me if I'm wrong).

I can definitely see that being block-aware has advantages in terms of compactions, splits,
hard links (for backups, snapshots, splits, copy-on-write tables, etc.), and caching (HBase
block cache and/or FS block cache).
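
To make the hard-link point concrete, here is a minimal in-memory sketch of ref-counted
blocks. It is only an illustration under stated assumptions: BlockStore, BlockFile and all
of their methods are hypothetical names made up for this example, not an HDFS or HBase API,
and the sketch ignores replication, locality and persistence. A "file" is just an ordered
list of block ids, so a link (snapshot) or a split copies ids and bumps ref counts instead
of copying data.

```python
from collections import Counter


class BlockStore:
    """Toy block store with reference counting (hypothetical, not an HDFS API)."""

    def __init__(self):
        self._data = {}         # block id -> payload bytes
        self._refs = Counter()  # block id -> number of files referencing it
        self._next_id = 0

    def put(self, payload: bytes) -> int:
        """Store a new, not yet referenced block and return its id."""
        block_id = self._next_id
        self._next_id += 1
        self._data[block_id] = payload
        return block_id

    def get(self, block_id: int) -> bytes:
        return self._data[block_id]

    def retain(self, block_id: int) -> None:
        self._refs[block_id] += 1

    def release(self, block_id: int) -> None:
        """Drop one reference; free the block when nobody references it."""
        self._refs[block_id] -= 1
        if self._refs[block_id] == 0:
            del self._refs[block_id]
            del self._data[block_id]

    def live_blocks(self) -> int:
        return len(self._data)


class BlockFile:
    """A 'file' is an ordered list of block ids plus a reference on each block."""

    def __init__(self, store: BlockStore, block_ids):
        self.store = store
        self.block_ids = list(block_ids)
        for block_id in self.block_ids:
            store.retain(block_id)

    def link(self) -> "BlockFile":
        """Hard link / snapshot: share every block, copy no data."""
        return BlockFile(self.store, self.block_ids)

    def split(self, at: int):
        """Split at a block boundary into two files that reuse the same blocks."""
        return (BlockFile(self.store, self.block_ids[:at]),
                BlockFile(self.store, self.block_ids[at:]))

    def read(self) -> bytes:
        return b"".join(self.store.get(b) for b in self.block_ids)

    def delete(self) -> None:
        for block_id in self.block_ids:
            self.store.release(block_id)
        self.block_ids = []
```

In this toy model, deleting the original file after link() frees nothing, and a block only
goes away once the last file referencing it is deleted. The real questions (who owns the
ref counts, and how blocks are named without a namespace) are exactly the open ones.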

> HBase can manage blocks instead of (or inside) files in HDFS
> ------------------------------------------------------------
>                 Key: HBASE-8109
>                 URL: https://issues.apache.org/jira/browse/HBASE-8109
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Sergey Shelukhin
> Prompted by previous non-Hadoop experience and some dev list discussions, and after talking
to some HDFS people about blocks.
> HBase could improve a lot by managing HDFS blocks instead of files, and reusing the blocks
among other things. Some areas that could improve are splits, compactions, management of large
blobs, locality enforcement.
> I was told that block APIs in Hadoop 2 are well-isolated, but not exposed yet. They can
easily be exposed, and as one of the first potential users we could get to help shape them.
Two areas that, from my limited understanding, are currently fuzzy are namespaces for blocks
and ref-counting.
> We should come up with a list of initial scenarios to figure out what we need from the block
API (locality, detecting/enforcing block boundary/variable size blocks, reusing one block,
