hbase-issues mailing list archives

From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6339) Bulkload call to RS should begin holding write lock only after the file has been transferred
Date Sat, 07 Jul 2012 16:09:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408699#comment-13408699 ]

Harsh J commented on HBASE-6339:

Thanks for the comments, Ted.

Whether region splitting is disabled isn't exposed as a simple toggle value, so it's tricky to determine
if it actually is disabled. Besides that, a manual split could still happen.

Granted, we could duplicate the checks: once before the file pull (take the lock for the check, then release it),
and once again right after (take the lock there and hold it until the end, as normal). But I think that adds
unnecessary complexity. For the moment, if Ops had HBASE-6350, I think that should be satisfactory
enough. It isn't often that I see bulk loads between separate FS clusters.

Thoughts? Is it worth the extra check and complexity addition?
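The two-check alternative weighed above could look roughly like the following sketch. It is not HBase code: `DoubleCheckBulkLoadSketch`, `fits`, `split` and `bulkLoad` are hypothetical names, a `ReentrantReadWriteLock` stands in for the HRegion lock, a String key range stands in for the region boundaries, and a `Runnable` stands in for the cross-FS file pull. It only illustrates the lock shape: check under the lock, release it for the slow copy, then re-lock, re-check and hold the lock until the load completes.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of a region; not HBase's HRegion.
public class DoubleCheckBulkLoadSketch {
    // Stand-in for the HRegion lock.
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    // Stand-in for region boundaries: [startKey, endKey), where "" means unbounded.
    private final String startKey;
    private String endKey;
    private int loadedFiles = 0;

    public DoubleCheckBulkLoadSketch(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }

    // Must be called with regionLock held so the boundaries are read consistently.
    private boolean fits(String firstKey, String lastKey) {
        return firstKey.compareTo(startKey) >= 0
                && (endKey.isEmpty() || lastKey.compareTo(endKey) < 0);
    }

    // Simulates a (manual) split that leaves this region with [startKey, splitKey).
    public void split(String splitKey) {
        regionLock.writeLock().lock();
        try {
            endKey = splitKey;
        } finally {
            regionLock.writeLock().unlock();
        }
    }

    // Returns true if loaded; false means the region changed and the caller must retry.
    public boolean bulkLoad(String firstKey, String lastKey, Runnable slowCopy) {
        // Check 1: hold the lock only long enough to validate, then release it.
        regionLock.writeLock().lock();
        try {
            if (!fits(firstKey, lastKey)) {
                return false;
            }
        } finally {
            regionLock.writeLock().unlock();
        }

        // The slow cross-FS pull runs with no region lock held, so writes (and splits) proceed.
        slowCopy.run();

        // Check 2: re-lock, re-validate, and hold the lock until the load is done.
        regionLock.writeLock().lock();
        try {
            if (!fits(firstKey, lastKey)) {
                return false; // a split happened while the file was being copied
            }
            loadedFiles++; // placeholder for moving the copied file into the store
            return true;
        } finally {
            regionLock.writeLock().unlock();
        }
    }

    public int getLoadedFiles() {
        return loadedFiles;
    }
}
```

The second check is what makes releasing the lock safe: a split landing during the copy is detected instead of silently loading a file that no longer fits the region. That extra path is the added complexity being weighed here.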
> Bulkload call to RS should begin holding write lock only after the file has been transferred
> --------------------------------------------------------------------------------------------
>                 Key: HBASE-6339
>                 URL: https://issues.apache.org/jira/browse/HBASE-6339
>             Project: HBase
>          Issue Type: Improvement
>          Components: client, regionserver
>    Affects Versions: 0.90.0
>            Reporter: Harsh J
>            Assignee: Harsh J
> I noticed that right now, under a bulkLoadHFiles call to an RS, we grab the HRegion write
lock as soon as we determine that it is a multi-family bulk load we'll be attempting. The
file copy from the caller's source FS is then done while the lock is held.
> This doesn't seem right. For instance, we had a recent use case where the cluster running
the bulk load is a separate HDFS instance/cluster from the one that runs HBase, and transfers
between these FSes can be slower than an intra-cluster transfer. Hence I think we should
begin holding the write lock only after we've got a successful copy of the requested file on the
destination FS, and thereby let more write throughput pass in the meantime.
> Does this sound reasonable to do?
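The ordering proposed in the issue can be sketched as follows. This is a hypothetical, local-filesystem illustration, not HBase's bulkLoadHFiles: `java.nio.file` stands in for the source and destination FileSystems, a `ReentrantReadWriteLock` stands in for the HRegion lock, and `CopyThenLockSketch` and `bulkLoad` are made-up names. Multi-family loads take the write lock, as described above; taking the read lock for single-family loads is an assumption of the sketch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch; not HBase's HRegion.bulkLoadHFiles.
public class CopyThenLockSketch {
    // Stand-in for the HRegion lock.
    private final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();

    public Path bulkLoad(Path source, Path stagingDir, Path storeDir, boolean multiFamily)
            throws IOException {
        // 1. Slow, possibly cross-cluster transfer: done before any region lock is taken.
        Path staged = Files.copy(source, stagingDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);

        // 2. Only now take the lock: write lock for multi-family loads,
        //    read lock otherwise (an assumption of this sketch).
        Lock lock = multiFamily ? regionLock.writeLock() : regionLock.readLock();
        lock.lock();
        try {
            // Cheap step under the lock: move the already-staged copy into the store directory.
            return Files.move(staged, storeDir.resolve(staged.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        } finally {
            lock.unlock();
        }
    }

    public boolean isWriteLocked() {
        return regionLock.isWriteLocked();
    }
}
```

Between steps 1 and 2 nothing prevents a split, so this ordering only shortens the time the lock is held; whether the region still accepts the file has to be checked once the lock is taken.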


