hbase-issues mailing list archives

From "Jerry He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10615) Make LoadIncrementalHFiles skip reference files
Date Thu, 27 Feb 2014 01:50:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913904#comment-13913904 ]

Jerry He commented on HBASE-10615:
----------------------------------

Hi Matteo,

Thanks for the comments!

There are two questions here.
1.  Should the bulk load throw an error, or skip, when it sees a reference file?  My argument
is that we should not throw an error.
     The existence of a reference file is not an error condition.
2.  Is it safe, from the user's perspective, to skip the reference file for the purpose of
bulk loading?  Matteo raised the issue of possible data loss.
     My argument is that we are fine, for these reasons:
1)  The purpose of LoadIncrementalHFiles is to safely load the data contained in the hfiles
of a given region dir into HBase.
   As long as this is satisfied, we are fine for the data in this scope.
2)  If we take a broader view and consider the integrity of the entire table's data:
  The user of the bulk load tool controls the bulk loading.
  For example, the user will not copy out the links in a table cloned from a snapshot and
then expect to bulk load those links to get the data.
  In the reference example, the user will bulk load the parent region too.

{quote} 
you upload the parent region data but not the daughter reference files
the CatalogJanitor kicks in and the parent is removed, since there are no references to the
parent
and your data is lost...
{quote}
Why would the data be lost?  I thought the hfiles in the parent region would be added or split
into an existing live region. The bulk load tool does not care whether the input hfile's region
is a split parent or not, right?  Maybe I am missing or misunderstanding something?
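As background for the skip proposal above: a reference file left behind by a region split can be told apart from a real hfile by its name alone, which has the shape <hfile-name>.<parent-region-encoded-name> (as in the aed3d016...bec.d179ab34...fbd path in the stack trace). Below is a minimal sketch of such a name-based filter; the class and method names are illustrative, and the regex is only an approximation of HBase's real check (which lives in StoreFileInfo), not the exact production logic.

{code}
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ReferenceFileFilter {
    // Approximation of the reference-file naming convention:
    // "<hfile-name>.<parent-region-encoded-name>", both parts hex strings.
    // Illustrative only; HBase's own check is stricter.
    private static final Pattern REF_NAME = Pattern.compile("^[0-9a-f]+\\.[0-9a-f]+$");

    public static boolean looksLikeReference(String fileName) {
        return REF_NAME.matcher(fileName).matches();
    }

    // Keep only plain hfiles when collecting bulk-load candidates from a family dir.
    public static List<String> skipReferences(List<String> familyDirFiles) {
        return familyDirFiles.stream()
                .filter(name -> !looksLikeReference(name))
                .collect(Collectors.toList());
    }
}
{code}

With a filter like this in the grouping step, the loader would simply pass over reference files instead of failing while trying to parse them as HFiles.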

   

> Make LoadIncrementalHFiles skip reference files
> -----------------------------------------------
>
>                 Key: HBASE-10615
>                 URL: https://issues.apache.org/jira/browse/HBASE-10615
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.96.0
>            Reporter: Jerry He
>            Assignee: Jerry He
>            Priority: Minor
>         Attachments: HBASE-10615-trunk.patch
>
>
> There are use cases where the source of hfiles for LoadIncrementalHFiles can be a FileSystem
copy-out/backup of an HBase table or archive hfiles.  For example,
> 1. Copy-out of hbase.rootdir, table dir, region dir (after disable) or archive dir.
> 2. ExportSnapshot
> It is possible that there are reference files in the family dir in these cases.
> We have such use cases, where trying to load back into HBase, we'll get
> {code}
> Caused by: org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile
Trailer from file hdfs://HDFS-AMR/tmp/restoreTemp/117182adfe861c5d2b607da91d60aa8a/info/aed3d01648384b31b29e5bad4cd80bec.d179ab341fc68e7612fcd74eaf7cafbd
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:570)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:594)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:636)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.groupOrSplit(LoadIncrementalHFiles.java:472)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:393)
>         at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$2.call(LoadIncrementalHFiles.java:391)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
>         at java.lang.Thread.run(Thread.java:738)
> Caused by: java.lang.IllegalArgumentException: Invalid HFile version: 16715777 (expected
to be between 2 and 2)
>         at org.apache.hadoop.hbase.io.hfile.HFile.checkFormatVersion(HFile.java:927)
>         at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:426)
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:568)
> {code}
> It is desirable and safe to skip these reference files since they don't contain any real
data for bulk load purposes.
>   



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
