hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8109) HBase can manage blocks instead of (or inside) files in HDFS
Date Fri, 15 Mar 2013 20:32:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603814#comment-13603814 ]

Andrew Purtell commented on HBASE-8109:
---------------------------------------

bq. Today HBCK is using a lot of files/dirs names checks to find lost regions and things like that. Will we be able to do almost the same thing with blocks?

I think the short answer is that HBCK might come to look a lot like a fsck for a filesystem.

However, going below the filesystem directly to the block pool presents difficulties. I'm not sure how HBase would get block reports from the DataNodes to know where blocks are located; I would guess this amounts to "embedding the NN inside the master," as Enis mentioned.

If we had a couple of additional APIs to manipulate the block lists of files, then we could do block-based optimizations and still retain the namespace view where it is useful, such as for HBCK.
                
> HBase can manage blocks instead of (or inside) files in HDFS
> ------------------------------------------------------------
>
>                 Key: HBASE-8109
>                 URL: https://issues.apache.org/jira/browse/HBASE-8109
>             Project: HBase
>          Issue Type: Brainstorming
>            Reporter: Sergey Shelukhin
>
> Prompted by previous non-Hadoop experience and some dev list discussions, and after talking to some HDFS people about blocks.
> HBase could improve a lot by managing HDFS blocks instead of files and, among other things, reusing blocks. Areas that could improve include splits, compactions, management of large blobs, and locality enforcement.
> I was told that the block APIs in Hadoop 2 are well isolated but not yet exposed. They can easily be exposed, and as one of the first potential users we could help shape them. Two areas that, from my limited understanding, are currently fuzzy are namespaces for blocks and ref-counting.
> We should come up with a list of initial scenarios to figure out what we need from the block API (locality, detecting/enforcing block boundaries/variable-size blocks, reusing one block, ...).

--
This message is automatically generated by JIRA.
