hbase-dev mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-15808) Reduce potential bulk load intermediate space usage and waste
Date Mon, 09 May 2016 23:00:17 GMT
Jerry He created HBASE-15808:
--------------------------------

             Summary: Reduce potential bulk load intermediate space usage and waste
                 Key: HBASE-15808
                 URL: https://issues.apache.org/jira/browse/HBASE-15808
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 1.2.0
            Reporter: Jerry He
            Assignee: Jerry He
            Priority: Minor


If the bulk load input files do not match the existing region boundaries, the files will be
split.
In the unfortunate cases where the files need to be split multiple times,
the process can consume unnecessary space and can even exhaust the available space.
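To make the splitting condition concrete, here is a minimal Python sketch that lists the region boundaries a file's key range crosses. It is a simplified model, not HBase's bulk load code. It assumes regions are given as a sorted list of start keys, with the first region starting at the empty key, and that a region's end key is exclusive (equal to the next region's start key). The name `boundaries_crossed` is hypothetical. Each returned key is a point at which the file would have to be cut.

```python
import bisect


def boundaries_crossed(first_key: bytes, last_key: bytes,
                       region_starts: list[bytes]) -> list[bytes]:
    """Return the region start keys in (first_key, last_key].

    region_starts must be sorted; the first region starts at b"".
    A file whose keys span one of these boundaries does not fit in a
    single region and has to be cut at that key.
    """
    lo = bisect.bisect_right(region_starts, first_key)
    hi = bisect.bisect_right(region_starts, last_key)
    return region_starts[lo:hi]
```

For example, with regions starting at `b""`, `b"g"` and `b"p"`, a file covering keys `b"a"` to `b"z"` crosses both `b"g"` and `b"p"`. In this model it ends up as three pieces, and every cut writes new intermediate data.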

Here is an over-simplified example.

Original size of input files:
  consumed space: size --> 300GB
After a round of splits:
  consumed space: size + tmpspace1 --> 300GB + 300GB
After another round of splits:
  consumed space: size + tmpspace1 + tmpspace2 --> 300GB + 300GB + 300GB

...
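The arithmetic in the example can be written out as a small Python sketch comparing the current behavior with the proposed cleanup. It keeps the example's over-simplification that every split round rewrites the full input into a new tmp space. It also assumes, as a labeled assumption, that a round's source copy is deleted only after the next round has finished writing its own copy. The function name `space_used` is hypothetical.

```python
def space_used(size_gb: int, rounds: int, cleanup: bool) -> tuple[int, int]:
    """Return (peak_gb, final_gb) for a bulk load whose input of size_gb
    is rewritten in full on every split round (assumed, as in the example).

    Without cleanup, every round's tmp copy is kept. With cleanup, the copy
    from round i-1 is deleted once round i has finished writing, so at most
    the input plus two tmp copies ever coexist.
    """
    if rounds == 0:
        return size_gb, size_gb
    if not cleanup:
        total = size_gb * (1 + rounds)  # input + one tmp copy per round
        return total, total
    peak = size_gb * (1 + min(rounds, 2))  # input + previous + current copy
    final = size_gb * 2  # input + the last tmp copy
    return peak, final
```

Under these assumptions, five rounds over 300GB hold 1800GB without cleanup, while with cleanup the peak stays at 900GB and only 600GB remains at the end.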

Currently we don't do any cleanup in the process. At the very least, all of the intermediate
tmpspace (everything except the last one) could be deleted as the process goes on.
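The proposed cleanup can be sketched in Python, using the local filesystem as a stand-in for HDFS. This is not how HBase's LoadIncrementalHFiles is written. The directory layout (`tmp0`, `tmp1`, ...), `run_split_rounds` and the `split_round` callback are hypothetical. `copy_round` is only a toy stand-in for a real split. The point shown is the ordering: the previous round's tmp directory is removed only after the new round has finished writing, the original input is never touched, and only the last tmp directory survives.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_split_rounds(input_dir: Path, staging: Path, rounds: int,
                     split_round: Callable[[Path, Path], None]) -> Path:
    """Run `rounds` split rounds, each writing into a fresh tmp directory
    under `staging` (hypothetical layout). Once a round has finished
    writing, the previous round's tmp directory is deleted. The original
    input is never deleted; only the last tmp directory survives."""
    src = input_dir
    for i in range(rounds):
        dst = staging / f"tmp{i}"  # hypothetical naming scheme
        dst.mkdir(parents=True)
        split_round(src, dst)
        if src != input_dir:
            shutil.rmtree(src)  # intermediate copy is no longer needed
        src = dst
    return src


def copy_round(src: Path, dst: Path) -> None:
    """Toy stand-in for a real split round: copies every file unchanged."""
    for f in src.iterdir():
        shutil.copy(f, dst / f.name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        input_dir = Path(d) / "input"
        input_dir.mkdir()
        (input_dir / "hfile").write_bytes(b"data")
        staging = Path(d) / "staging"
        run_split_rounds(input_dir, staging, 3, copy_round)
        print(sorted(p.name for p in staging.iterdir()))  # ['tmp2']
```

Deleting only after the next copy is complete keeps the source of the current round available if that round fails partway through.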



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
