hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8109) HBase can manage blocks instead of (or inside) files in HDFS
Date Wed, 20 Mar 2013 00:59:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607130#comment-13607130 ]

Sergey Shelukhin commented on HBASE-8109:

Our scenarios are as follows (off the top of my head):
1) Recombine files instead of rewriting them where necessary. For example, during compactions
we can add unchanged blocks to the destination file directly; during a region split we can
split a file into two sets of blocks and make new files without rewriting; for large blob
storage that keeps blobs off the main table, we can clean up removed entries without
rewriting the files; etc.
2) Hardlinks and copy-on-write (for example, snapshot or backup scenarios).
3) Locality: determine where blocks are in order to put regions close to their blocks.

There may be others.
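
To make scenario 3 concrete, here is a minimal sketch of picking a host for a region from the
locations of its blocks. It assumes the block-to-hosts map has already been obtained from some
block API; the class and method names (LocalitySketch, bestHost) and the host names are
hypothetical and are not part of HBase or HDFS.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalitySketch {
    /**
     * Returns the host that holds replicas of the most blocks, or null if
     * there are no blocks. Ties go to the lexicographically first host.
     */
    static String bestHost(Map<String, List<String>> blockToHosts) {
        // TreeMap keeps hosts sorted so tie-breaking is deterministic.
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> hosts : blockToHosts.values()) {
            // Count each host at most once per block.
            for (String host : new HashSet<>(hosts)) {
                counts.merge(host, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical block IDs and replica locations.
        Map<String, List<String>> blocks = new HashMap<>();
        blocks.put("b1", Arrays.asList("hostA", "hostB"));
        blocks.put("b2", Arrays.asList("hostB", "hostC"));
        blocks.put("b3", Arrays.asList("hostB", "hostA"));
        System.out.println(bestHost(blocks)); // hostB holds all three blocks
    }
}
```

A real placement decision would also weigh block sizes and server load; the sketch only counts
blocks per host.
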
Refcounting blocks can indeed complicate distributed namespaces, but I think it's an essential
feature for a block FS. If it is not supported natively, people will reimplement it because so
much can be done with reused blocks. I think it should be doable at large scale, either by
coordination, by some restrictions on block reuse with regard to namespace, or by separating
block management, and optionally its namespace, from the file namespace.
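
As a rough illustration of why ref-counting matters for block reuse, the sketch below models
files as lists of block IDs with a per-block reference count. It is a single-process toy, not a
proposal for the actual HDFS block API: the class BlockRefCountSketch and all of its methods
are hypothetical, and it ignores the distributed coordination discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockRefCountSketch {
    private final Map<String, List<Long>> files = new HashMap<>();
    private final Map<Long, Integer> refCounts = new HashMap<>();

    private List<Long> blocksOf(String name) {
        List<Long> blocks = files.get(name);
        if (blocks == null) throw new IllegalArgumentException("no such file: " + name);
        return blocks;
    }

    /** Creates a file that references the given blocks; no block data is copied. */
    void createFile(String name, List<Long> blockIds) {
        if (files.containsKey(name)) throw new IllegalArgumentException("file exists: " + name);
        files.put(name, new ArrayList<>(blockIds));
        for (Long id : blockIds) refCounts.merge(id, 1, Integer::sum);
    }

    /** Hardlink-style copy: the new file shares every block of the source. */
    void link(String src, String dst) {
        createFile(dst, blocksOf(src));
    }

    /** Split: two new files reference the two halves of the source's block list. */
    void split(String src, int splitIndex, String left, String right) {
        List<Long> blocks = blocksOf(src);
        createFile(left, blocks.subList(0, splitIndex));
        createFile(right, blocks.subList(splitIndex, blocks.size()));
    }

    /** Deletes a file and returns the blocks that no file references any more. */
    List<Long> deleteFile(String name) {
        List<Long> blocks = blocksOf(name);
        files.remove(name);
        List<Long> freed = new ArrayList<>();
        for (Long id : blocks) {
            int remaining = refCounts.merge(id, -1, Integer::sum);
            if (remaining == 0) {
                refCounts.remove(id);
                freed.add(id);
            }
        }
        return freed;
    }

    int refCount(long id) {
        return refCounts.getOrDefault(id, 0);
    }

    public static void main(String[] args) {
        BlockRefCountSketch fs = new BlockRefCountSketch();
        fs.createFile("parent", Arrays.asList(1L, 2L, 3L, 4L));
        fs.split("parent", 2, "daughterA", "daughterB");
        // The parent can go away without freeing anything: daughters still hold the blocks.
        System.out.println(fs.deleteFile("parent"));    // []
        System.out.println(fs.deleteFile("daughterA")); // [1, 2]
    }
}
```

The point of the sketch is that a split or snapshot only touches metadata, and a block becomes
reclaimable only when the last file referencing it is deleted.
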
> HBase can manage blocks instead of (or inside) files in HDFS
> ------------------------------------------------------------
>                 Key: HBASE-8109
>                 URL: https://issues.apache.org/jira/browse/HBASE-8109
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Sergey Shelukhin
> Prompted by previous non-Hadoop experience and some dev list discussions, and after talking
> to some HDFS people about blocks.
> HBase could improve a lot by managing HDFS blocks instead of files, and reusing the blocks
> among other things. Some areas that could improve are splits, compactions, management of
> large blobs, and locality enforcement.
> I was told that block APIs in Hadoop 2 are well-isolated, but not exposed yet. They can
> easily be exposed, and as one of the first potential users we could get to help shape them.
> Two areas that, from my limited understanding, are currently fuzzy are namespaces for
> blocks and ref-counting.
> We should come up with a list of initial scenarios to figure out what we need from the block
> API (locality, detecting/enforcing block boundary/variable size blocks, reusing one block,

