hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Witten (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10216) Change HBase to support local compactions
Date Sat, 14 Feb 2015 18:54:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321639#comment-14321639

David Witten commented on HBASE-10216:

Replying to Andrew.  Thanks for your comments.  I've included much of your message and will
reply in-line.

We could propose a new HDFS API that "would merge files so that the merging and deleting can
be performed on local data nodes with no file contents moving over the network", but does
this not only push something implemented today in the HBase regionserver down into the HDFS
[Witten] It is indeed a re-implementation of an existing facility.  The proposal involves
a callback called on each replica to do the actual merge.  The implementation of the callback
would be implemented by the HBase team.  So the policies about how merging is done and what
the file format is will still be made by the HBase team.  Which files are to be merged is
specified by the HDFS client and would also be decided within HBase.

Could a merge as described be safely executed in parallel on multiple datanodes without coordination?
No, because the result is not a 1:1 map of input block to output block. 
[Witten] As I think about it each replica can merge the input files to the output files without
much coordination.  When the merge is finished it needs to notify the merge coordinator about
the new blocks and files.  The name node needs to be notified about the new files and their
blocks from the merge coordinator.  I agree that there isn't a design for this yet.  But the
data involved in the coordination will involve block ids and file ids and not block content.
 So the amount of data moving across the net is substantially reduced.  Regarding the 1:1
comment.  It would be required that the merging callback produce identical output files for
a given set of input files.  This means that all replicas will produce exactly the same output.

Therefore in a realistic implementation (IMHO) a single datanode would handle the merge procedure.

[Witten] Yes,
>From a block device and network perspective nothing would change.
[Witten] The amount of data moving over the net would be massively reduced.

Set the above aside. We can't push something as critical to HBase as compaction down into
[Witten] HDFS would be providing a generic facility to merge files.  HDFS would not provide
any policy file formats or merger behavior.  As such the important code will be retained by
HBase.  If there is a significant performance improvement, without loss of reliability, HBase
can't not seriously consider it.

First, the HDFS project is unlikely to accept the idea or implement it in the first place.
Even in the unlikely event that happens,
[Witten] If there is a significant improvement to HBase, and other HDFS clients which do merging
(Does Parquet or other higher level storage clients?).  I would think they'd be eager.

we would need reimplement compaction using the new HDFS facility to take advantage of it,
yet we will need to support older versions of HDFS without the new API for a while,
[Witten]  Actually, we'd need to keep it around forever.  There would be a DFS interface that
HDFS would implement, but other implementations of DFS may not.  So HBase would need to keep
its current implementation if the new APIs were not implemented by some DFS.  It may also
be that we'd want to keep the existing implementation when the current environment cares less
about network IO.

and if the new HDFS API ever doesn't perfectly address the minutiae of HBase compaction then
or going forward we would be back where we started.
[Witten] I think the objection is handled by the callback nature of the new API.

Only then are we really saving on IO.
[Witten] The original proposal saves network IO not only on the initial compactions but also
on all the other compactions.

[Witten] The performance value of the original proposal is not certain (as can be seen by
repeated use of IF in my comments here).  I think it hinges on the question I asked in my
last comment post: Is it better to have local reads and writes and reduce network overhead,
or is it better to limit disk reads by having them only occur on one replica (as the current
implementation does) and thereby reduce disk read overhead.  There may not be a simple answer
to this question.  It may be that the original proposal is worse on a LAN but when replicas
are geographically far away reducing network IO is worth the cost of extra local disk reads.
 I'm still interested in comments about this question.

> Change HBase to support local compactions
> -----------------------------------------
>                 Key: HBASE-10216
>                 URL: https://issues.apache.org/jira/browse/HBASE-10216
>             Project: HBase
>          Issue Type: New Feature
>          Components: Compaction
>         Environment: All
>            Reporter: David Witten
> As I understand it compactions will read data from DFS and write to DFS.  This means
that even when the reading occurs on the local host (because region server has a local copy)
all the writing must go over the network to the other replicas.  This proposal suggests that
HBase would perform much better if all the reading and writing occurred locally and did not
go over the network. 
> I propose that the DFS interface be extended to provide method that would merge files
so that the merging and deleting can be performed on local data nodes with no file contents
moving over the network.  The method would take a list of paths to be merged and deleted and
the merged file path and an indication of a file-format-aware class that would be run on each
data node to perform the merge.  The merge method provided by this merging class would be
passed files open for reading for all the files to be merged and one file open for writing.
 The custom class provided merge method would read all the input files and append to the output
file using some standard API that would work across all DFS implementations.  The DFS would
ensure that the merge had happened properly on all replicas before returning to the caller.
 It could be that greater resiliency could be achieved by implementing the deletion as a separate
phase that is only done after enough of the replicas had completed the merge. 
> HBase would be changed to use the new merge method for compactions, and would provide
an implementation of the merging class that works with HFiles.
> This proposal would require a custom code that understands the file format to be runnable
by the data nodes to manage the merge.  So there would need to be a facility to load classes
into DFS if there isn't such a facility already.  Or, less generally, HDFS could build in
support for HFile merging.
> The merge method might be optional.  If the DFS implementation did not provide it a generic
version that performed the merge on top of the regular DFS interfaces would be used.
> It may be that this method needs to be tweaked or ignored when the region server does
not have a local copy data so that, as happens currently, one copy of the data moves to the
region server.

This message was sent by Atlassian JIRA

View raw message