hbase-issues mailing list archives

From "David Witten (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-10216) Change HBase to support local compactions
Date Fri, 20 Dec 2013 15:48:16 GMT
David Witten created HBASE-10216:

             Summary: Change HBase to support local compactions
                 Key: HBASE-10216
                 URL: https://issues.apache.org/jira/browse/HBASE-10216
             Project: HBase
          Issue Type: New Feature
          Components: Compaction
         Environment: All
            Reporter: David Witten

As I understand it, compactions read data from DFS and write back to DFS.  This means that
even when the reads are local (because the region server has a local replica), all the
writes must cross the network to the other replicas.  This proposal suggests that HBase
would perform much better if all compaction reads and writes happened locally and did not
traverse the network.

I propose that the DFS interface be extended with a merge method, so that merging and
deleting can be performed on the local data nodes with no file contents moving over the
network.  The method would take a list of paths to be merged and deleted, the merged output
path, and an indication of a file-format-aware class that would be run on each data node to
perform the merge.  The merge method provided by this merging class would be passed streams
open for reading on each of the files to be merged and one stream open for writing.  It
would read all the input files and append to the output file using some standard API that
works across all DFS implementations.  The DFS would ensure that the merge had completed
properly on all replicas before returning to the caller.  Greater resiliency might be
achieved by implementing the deletion as a separate phase that is only performed after
enough of the replicas have completed the merge.
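The proposed extension might look roughly like the following sketch.  All of the names here (`MergingFileSystem`, `FileMerger`, `mergeFiles`) are hypothetical and do not exist in HDFS today; the demo merger simply concatenates bytes to show the calling convention:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

public class MergeSketch {
    // Hypothetical merging class: runs on each data node, understands the
    // file format, and writes the merged result to a single output stream.
    interface FileMerger {
        void merge(List<InputStream> inputs, OutputStream output) throws IOException;
    }

    // Hypothetical DFS extension: merges inputPaths into outputPath on every
    // data node holding a replica, then deletes the inputs, returning only
    // once all replicas have completed the merge.
    interface MergingFileSystem {
        void mergeFiles(List<String> inputPaths, String outputPath,
                        Class<? extends FileMerger> mergerClass) throws IOException;
    }

    // Trivial format-unaware merger used here only to show the contract.
    static class ConcatMerger implements FileMerger {
        public void merge(List<InputStream> inputs, OutputStream output) throws IOException {
            byte[] buf = new byte[4096];
            for (InputStream in : inputs) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    output.write(buf, 0, n);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        new ConcatMerger().merge(
            Arrays.asList(
                new ByteArrayInputStream("abc".getBytes()),
                new ByteArrayInputStream("def".getBytes())),
            out);
        System.out.println(out.toString()); // abcdef
    }
}
```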

HBase would be changed to use the new merge method for compactions, and would provide an implementation
of the merging class that works with HFiles.
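The essence of such a merging class is a k-way merge of sorted key-value files.  Real HFiles are binary block files, so the actual HBase implementation would use HFile readers and writers; the stand-in below instead treats each input as sorted lines of text, purely to illustrate the shape of the merge:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.util.List;
import java.util.PriorityQueue;

public class SortedMergeDemo {
    // Stand-in for an HFile-aware merger: each input holds sorted lines,
    // and a k-way merge over a min-heap emits a single sorted output.
    static void merge(List<InputStream> inputs, OutputStream output) throws IOException {
        record Head(String line, BufferedReader reader) {}
        PriorityQueue<Head> heap =
            new PriorityQueue<>((x, y) -> x.line().compareTo(y.line()));
        for (InputStream in : inputs) {
            BufferedReader r = new BufferedReader(new InputStreamReader(in));
            String first = r.readLine();
            if (first != null) heap.add(new Head(first, r));
        }
        PrintWriter w = new PrintWriter(output);
        while (!heap.isEmpty()) {
            Head h = heap.poll();
            w.print(h.line() + "\n");
            String next = h.reader().readLine();
            if (next != null) heap.add(new Head(next, h.reader()));
        }
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        var out = new java.io.ByteArrayOutputStream();
        merge(List.of(
            new java.io.ByteArrayInputStream("a\nc\n".getBytes()),
            new java.io.ByteArrayInputStream("b\nd\n".getBytes())), out);
        System.out.print(out); // a b c d, one per line
    }
}
```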

This proposal requires custom, format-aware code to be runnable on the data nodes to manage
the merge.  So there would need to be a facility to load classes into the DFS, if such a
facility does not already exist.  Or, less generally, HDFS could build in support for HFile
merging.

The merge method might be optional: if a DFS implementation did not provide it, a generic
version that performed the merge on top of the regular DFS interfaces would be used.
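A generic fallback along these lines could be layered on top of the ordinary read/write interfaces.  The sketch below runs against the local filesystem via `java.nio.file`; in a real DFS the same reads and writes would stream over the network, which is exactly the cost the native merge method avoids:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class GenericMergeFallback {
    // Fallback merge for DFS implementations without a native merge method:
    // read every input through the normal interface, append it to the output,
    // then delete the inputs as a separate phase.
    static void mergeFiles(List<Path> inputs, Path output) throws IOException {
        try (var out = Files.newOutputStream(output,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path in : inputs) {
                Files.copy(in, out);
            }
        }
        // Deletion happens only after the merged file is fully written,
        // mirroring the resiliency note above.
        for (Path in : inputs) {
            Files.delete(in);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("merge-demo");
        Path a = Files.writeString(dir.resolve("a"), "row1\n");
        Path b = Files.writeString(dir.resolve("b"), "row2\n");
        Path merged = dir.resolve("merged");
        mergeFiles(List.of(a, b), merged);
        System.out.print(Files.readString(merged)); // row1 then row2
    }
}
```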

This method may need to be tweaked or bypassed when the region server does not have a local
copy of the data, so that, as happens currently, one copy of the data moves to the region
server.