hbase-issues mailing list archives

From "Matteo Bertozzi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
Date Wed, 26 Feb 2014 22:20:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913610#comment-13913610 ]

Matteo Bertozzi commented on HBASE-10615:

I'm not convinced about it... 

let's say you copy hbase.rootdir: then what are you going to bulk load?
if you bulk load only /hbase/table and skip the references or links, you lose data.

an example for links is clone_snapshot:
the whole table is based on links... if you try to bulk load it, you end up skipping every file...

an example for references is:
you upload the parent region data but not the daughter reference files;
the CatalogJanitor kicks in and the parent is removed, since there are no references to the parent files,
and your data is lost...

can you describe the use case in more detail, and give examples of how you tested it?

> Make LoadIncrementalHFiles skip reference files
> -----------------------------------------------
>                 Key: HBASE-10615
>                 URL: https://issues.apache.org/jira/browse/HBASE-10615
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.96.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>         Attachments: HBASE-10615-trunk.patch
> There is a use case where the source of hfiles for LoadIncrementalHFiles can be a FileSystem copy-out/backup of an HBase table or archive hfiles.  For example,
> 1. Copy-out of hbase.rootdir, table dir, region dir (after disable) or archive dir.
> 2. ExportSnapshot
> It is possible that there are reference files in the family dir in these cases.
> We have such use cases where, when trying to load back into HBase, we'll get
> {code}
> Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://HDFS-AMR/tmp/restoreTemp/117182adfe861c5d2b607da91d60aa8a/info/aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:570)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:594)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:472)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:393)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:391)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>         at java.lang.Thread.run(Thread.java:738)
> Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 16715777 (expected to be between 2 and 2)
>         at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:927)
>         at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:426)
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:568)
> {code}
> It is desirable and safe to skip these reference files since they don't contain any real data for bulk load purposes.
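For context, the file that fails in the stack trace above, {{aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd}}, follows HBase's {{<hfile>.<parent-region>}} reference-file naming convention, which is why the HFile trailer read blows up. A minimal sketch of a name-based filter in the spirit of the patch, assuming the usual naming conventions (plain hfiles are hex names, references are {{<hfile>.<parentRegion>}}, links are {{<table>=<region>-<hfile>}}); the regexes here are illustrative approximations, not HBase's actual checks:

```java
import java.util.regex.Pattern;

// Hypothetical name-based filter for HBase "side" files.
// Assumption: plain hfiles are hex names, reference files look like
// "<hfileName>.<parentRegionEncodedName>", and hfile links look like
// "<tableName>=<regionEncodedName>-<hfileName>". The real checks
// live in HBase's StoreFileInfo and HFileLink classes.
public class SideFileFilter {
    private static final Pattern REFERENCE =
            Pattern.compile("^[0-9a-f]+\\.[0-9a-f]+$");
    private static final Pattern LINK =
            Pattern.compile("^[^=]+=[0-9a-f]+-[0-9a-f]+$");

    static boolean isReference(String fileName) {
        return REFERENCE.matcher(fileName).matches();
    }

    static boolean isLink(String fileName) {
        return LINK.matcher(fileName).matches();
    }

    /** True if a bulk load that ignores side files should skip this file. */
    static boolean isSideFile(String fileName) {
        return isReference(fileName) || isLink(fileName);
    }

    public static void main(String[] args) {
        // The reference file from the stack trace above is skipped:
        System.out.println(isSideFile(
            "aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd")); // true
        // A plain hfile name is not:
        System.out.println(isSideFile("aed3d01648384b31b29e5bad4cd80bec")); // false
    }
}
```

A real patch would presumably call HBase's own checks (something like {{StoreFileInfo.isReference(path)}} / {{HFileLink.isHFileLink(path)}}) from {{LoadIncrementalHFiles}} rather than reimplement the patterns, which is also where Matteo's data-loss concern applies: silently skipping these files is only safe if the data they point to is loaded some other way.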

This message was sent by Atlassian JIRA
