hbase-issues mailing list archives

From "Nick Dimiduk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13034) Importing rows with bulkupload can overload single regionservers
Date Sat, 14 Feb 2015 01:09:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321091#comment-14321091 ]

Nick Dimiduk commented on HBASE-13034:

I take it you are using the Import tool. Are you using it in normal "online" mode, or are
you using it to generate HFiles? Your title suggests the latter, but your description reads
like the former. It sounds like the destination table should be pre-split before the large
import job begins. Can you give any more details?
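
To make the pre-splitting suggestion concrete, here is a minimal sketch of how evenly spaced split points could be computed. It assumes the row keys are fixed-width, lowercase hex-encoded hashes, which the issue does not confirm; the function name and the default key width are illustrative only.

```python
def hex_split_points(num_regions, width=8):
    """Return num_regions - 1 evenly spaced split keys of `width` hex characters.

    Assumption: row keys are lowercase hex strings, so they sort
    lexicographically in the same order as their numeric values.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    key_space = 16 ** width
    return [
        format(i * key_space // num_regions, "0{}x".format(width))
        for i in range(1, num_regions)
    ]


# Four regions over a two-character key space: boundaries at 1/4, 1/2, 3/4.
print(hex_split_points(4, width=2))
```

The resulting list could then be passed to the HBase shell when creating the destination table, for example `create 'mytable', 'cf', SPLITS => ['40', '80', 'c0']`, where the table and column family names are placeholders.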

> Importing rows with bulkupload can overload single regionservers
> ----------------------------------------------------------------
>                 Key: HBASE-13034
>                 URL: https://issues.apache.org/jira/browse/HBASE-13034
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.98.0
>            Reporter: Bryant Khau
>            Priority: Minor
> Exporting a table with a common schema, such as hashes as the row key, results in a
> sorted export file. When that file is imported with org.apache.hadoop.hbase.mapreduce.Import,
> the MapReduce job's requests can overload region servers one at a time, because the rows
> are imported in sequential order and regions span key ranges in sequential order. This
> is more likely to happen with a lot of data and few regions.
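
The effect described above can be illustrated with a small simulation. This is only a sketch under assumed conditions: the split points, the number of rows, and the window size are made up, and the real Import job's write pattern depends on how the export file is divided among mappers.

```python
import bisect
import hashlib
import random


def region_for(key, split_points):
    """Index of the region holding `key`; region start keys are inclusive."""
    return bisect.bisect_right(split_points, key)


def regions_touched_per_window(keys, split_points, window):
    """For each consecutive batch of `window` writes, count distinct regions hit."""
    counts = []
    for start in range(0, len(keys), window):
        batch = keys[start:start + window]
        counts.append(len({region_for(k, split_points) for k in batch}))
    return counts


# Hypothetical data: 10,000 hash-like row keys and a table with four regions.
split_points = ["40000000", "80000000", "c0000000"]
keys = sorted(hashlib.sha256(str(i).encode()).hexdigest()[:8] for i in range(10000))

# Sorted input, as found in an export file: each window lands on one or two regions.
sorted_counts = regions_touched_per_window(keys, split_points, 500)

# The same rows in random order spread every window across all regions.
shuffled = keys[:]
random.Random(0).shuffle(shuffled)
shuffled_counts = regions_touched_per_window(shuffled, split_points, 500)

print("sorted:  ", sorted_counts)
print("shuffled:", shuffled_counts)
```

With sorted input, the write load moves from one region to the next instead of being shared, which matches the one-by-one overload the reporter describes.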

This message was sent by Atlassian JIRA
