hbase-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8109) HBase can manage blocks instead of (or inside) files in HDFS
Date Sun, 17 Mar 2013 18:49:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604712#comment-13604712 ]

Konstantin Shvachko commented on HBASE-8109:
--------------------------------------------

Exposing a block API from HDFS is an interesting idea, if done carefully. You probably still
want HDFS to do block management and maintain the block namespace; otherwise you end up rewriting
the NameNode (and talking heresy).
As Ted mentioned, in [Giraffa (link to the project)|http://code.google.com/a/apache-extras.org/p/giraffa/wiki/Introduction]
we did indeed encounter a similar problem. Giraffa stores files' metadata in a single HBase table,
while the blocks reside in HDFS DataNodes. We use the NameNode as a block manager in the current implementation.
Since blocks are not exposed by HDFS, we used a workaround: we create a file, then
allocate a single block in it, then rename that file to the block id. The file is then used
as the Giraffa block. Adding an HDFS operation that does this atomically would help Giraffa
a lot.
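To make the three-step workaround concrete, here is a minimal sketch against a toy in-memory namespace. It is not the Giraffa code and does not use the HDFS client API; `ToyNamespace`, `allocate_block_workaround`, `allocate_block_file`, and the `blk_<id>` naming are all hypothetical names chosen for illustration. The last method shows what a single atomic operation might look like, under the assumption that the namespace can do all three steps under one lock.

```python
import itertools
import threading


class ToyNamespace:
    """In-memory stand-in for a namespace: path -> list of block ids (hypothetical)."""

    def __init__(self):
        self._files = {}
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def create(self, path):
        with self._lock:
            if path in self._files:
                raise FileExistsError(path)
            self._files[path] = []

    def add_block(self, path):
        with self._lock:
            block_id = next(self._ids)
            self._files[path].append(block_id)
            return block_id

    def rename(self, src, dst):
        with self._lock:
            if dst in self._files:
                raise FileExistsError(dst)
            self._files[dst] = self._files.pop(src)

    def exists(self, path):
        with self._lock:
            return path in self._files

    def blocks(self, path):
        with self._lock:
            return list(self._files[path])

    def allocate_block_file(self, directory):
        """Hypothetical atomic variant: create, allocate, and name under one lock."""
        with self._lock:
            block_id = next(self._ids)
            path = f"{directory}/blk_{block_id}"
            self._files[path] = [block_id]
            return path


def allocate_block_workaround(ns, directory, tmp_name):
    """The workaround as three separate calls; a failure between them leaves a stray file."""
    tmp = f"{directory}/{tmp_name}"
    ns.create(tmp)                         # step 1: create a file
    block_id = ns.add_block(tmp)           # step 2: allocate a single block in it
    final = f"{directory}/blk_{block_id}"
    ns.rename(tmp, final)                  # step 3: rename the file to the block id
    return final
```

In the three-call version each step takes the lock separately, so a crash after step 1 or 2 leaves a temporary file behind that someone has to clean up. That is the gap an atomic operation would close.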
Sergey, I was wondering if you had similar scenarios for using block APIs in HBase? What are
they?
You mention counting block references. I assume this means that the same block can belong
to multiple files. While that seems natural and simple in a single-NameNode design, in the world
of distributed namespaces it turns into a distributed operation and a potential obstacle
to scalability.
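
As a rough illustration of why reference counting is simple when one process owns the whole namespace, here is a toy sketch. `RefCountedBlocks` and its methods are hypothetical and not part of HDFS or HBase. Both tables live in one place, so linking and deleting are purely local updates; with a partitioned namespace, the files sharing a block may be owned by different nodes, and the same updates would need cross-node coordination.

```python
from collections import Counter


class RefCountedBlocks:
    """Single-namespace toy: files share blocks, and blocks carry a reference count."""

    def __init__(self):
        self.files = {}        # path -> list of block ids
        self.refs = Counter()  # block id -> number of file references

    def link(self, path, block_id):
        """Attach an existing or new block to a file and bump its count."""
        self.files.setdefault(path, []).append(block_id)
        self.refs[block_id] += 1

    def delete(self, path):
        """Remove a file and return the block ids that became unreferenced."""
        freed = []
        for block_id in self.files.pop(path):
            self.refs[block_id] -= 1
            if self.refs[block_id] == 0:
                del self.refs[block_id]
                freed.append(block_id)
        return freed
```

Only blocks whose count drops to zero are returned as reclaimable; a block still referenced by another file survives the delete.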
                
> HBase can manage blocks instead of (or inside) files in HDFS
> ------------------------------------------------------------
>
>                 Key: HBASE-8109
>                 URL: https://issues.apache.org/jira/browse/HBASE-8109
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Sergey Shelukhin
>
> Prompted by previous non-Hadoop experience and some dev list discussions, and after talking
> to some HDFS people about blocks.
> HBase could improve a lot by managing HDFS blocks instead of files, and reusing the blocks
> among other things. Some areas that could improve are splits, compactions, management of large
> blobs, locality enforcement.
> I was told that block APIs in Hadoop 2 are well-isolated, but not exposed yet. They can
> easily be exposed, and as one of the first potential users we could get to help shape them.
> Two areas that, from my limited understanding, are currently fuzzy are namespaces for blocks
> and ref-counting.
> We should come up with a list of initial scenarios to figure out what we need from a block
> API (locality, detecting/enforcing block boundary/variable size blocks, reusing one block,
> ...).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
