hbase-issues mailing list archives

From "Victor Xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13985) Add configuration to skip validating HFile format when bulk loading millions of HFiles
Date Mon, 29 Jun 2015 06:30:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605196#comment-14605196 ]

Victor Xu commented on HBASE-13985:

Thanks, Ted. I'll add a new patch.
I bulk loaded nearly 2 million HFiles into one HTable last Saturday. After more than
30 minutes the job was still blocked on HFile format validation, so I added this configuration
to skip that logic. With validation skipped, the whole bulk load completed in 15 minutes.
A small test shows that HFile format validation runs at about 350 files/sec in a single thread, so
checking 3.5 million HFiles would take several hours. Even though multiple threads could speed up
this process, I prefer to add a configuration to skip the whole logic completely.
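
As a rough check of the numbers above, the estimate can be sketched as simple arithmetic. The 350 files/sec rate and the file counts come from this comment; the function name, the thread count, and the assumption of perfect linear scaling across threads are illustrative only and were not measured.

```python
def validation_seconds(num_hfiles, files_per_sec=350, threads=1):
    """Estimated wall-clock seconds to validate num_hfiles.

    Assumes (hypothetically) that throughput scales linearly with threads,
    which is an upper bound on what real parallel validation would achieve.
    """
    return num_hfiles / (files_per_sec * threads)


if __name__ == "__main__":
    # Single-threaded estimates for the two file counts mentioned above.
    for n in (2_000_000, 3_500_000):
        hours = validation_seconds(n) / 3600
        print(f"{n:,} HFiles, 1 thread: about {hours:.1f} h")

    # Illustrative thread count only; not a figure from the issue.
    minutes = validation_seconds(3_500_000, threads=10) / 60
    print(f"3,500,000 HFiles, 10 threads: about {minutes:.0f} min")
```

Even under the optimistic linear-scaling assumption, ten threads would still need well over ten minutes for 3.5 million files, which is comparable to the 15 minutes the entire bulk load took once validation was skipped.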

> Add configuration to skip validating HFile format when bulk loading millions of HFiles
> --------------------------------------------------------------------------------------
>                 Key: HBASE-13985
>                 URL: https://issues.apache.org/jira/browse/HBASE-13985
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.98.13
>            Reporter: Victor Xu
>            Assignee: Victor Xu
>            Priority: Minor
>              Labels: regionserver
>             Fix For: 0.98.14
>         Attachments: HBASE-13985.patch
> When bulk loading millions of HFiles into one HTable, checking the HFile format is the most
time-consuming phase. We could parallelize the check to speed it up, but with
millions of HFiles it may still take dozens of minutes. So I think it's necessary
to add an option that lets advanced users bulk load without checking the HFile format at all.
> Of course, the default value of this option should be true.
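
The proposed behavior can be sketched as a flag that gates the validation pass. This is a minimal illustration and not the attached patch: the configuration key, the function names, and the dictionary-based configuration are all hypothetical, and it assumes that a value of true means validation stays enabled.

```python
# Hypothetical configuration key, chosen for illustration only.
VALIDATE_KEY = "bulkload.validate.hfile.format"


def prepare_bulk_load(hfile_paths, conf, validate_format):
    """Return the files queued for loading, validating each one unless disabled.

    conf is a plain dict standing in for a real configuration object;
    validate_format is a caller-supplied check that raises on a bad file.
    """
    # Default of True keeps the existing behavior: every file is checked.
    if conf.get(VALIDATE_KEY, True):
        for path in hfile_paths:
            validate_format(path)
    return list(hfile_paths)


if __name__ == "__main__":
    checked = []
    files = ["f1", "f2", "f3"]

    prepare_bulk_load(files, {}, checked.append)
    print(len(checked))  # validation ran for every file by default

    prepare_bulk_load(files, {VALIDATE_KEY: False}, checked.append)
    print(len(checked))  # unchanged: validation was skipped
```

Keeping the default at true means only users who explicitly opt out lose the format check, so a malformed file is still caught in the common case.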

This message was sent by Atlassian JIRA
