hbase-issues mailing list archives

From "Apekshit Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
Date Sat, 16 May 2015 01:11:59 GMT
Apekshit Sharma created HBASE-13702:

             Summary: ImportTsv: Add dry-run functionality and log bad rows
                 Key: HBASE-13702
                 URL: https://issues.apache.org/jira/browse/HBASE-13702
             Project: HBase
          Issue Type: New Feature
            Reporter: Apekshit Sharma

The ImportTsv job skips bad records by default (though it keeps a count). -Dimporttsv.skip.bad.lines=false
can be used to fail if a bad row is encountered.
Being able to easily determine which rows are corrupted in an input, rather than failing on
one row at a time, seems like a good feature to have.
Moreover, there should be 'dry-run' functionality in such kinds of tools, which essentially
does a quick run of the tool without making any changes but reporting any errors/warnings and

To identify corrupted rows, simply logging them should be enough. In the worst case, all rows
will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user
might have to do some work figuring out where the logs are. Is there some link we can show the
user at the start that can help them with that?
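
A minimal sketch of the "log every bad row" idea, assuming a row is bad when it does not have the expected number of tab-separated fields. The class and method names (BadRowLogger, findBadRows) are hypothetical and this is not the actual ImportTsv mapper code; it only illustrates reporting all bad rows in one pass instead of failing on the first one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of HBase.
class BadRowLogger {
    // Scans every line, logs each bad one with its 1-based line number,
    // and returns the bad lines so the caller can also report a count.
    static List<String> findBadRows(List<String> lines, int expectedColumns) {
        List<String> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // A limit of -1 keeps trailing empty fields instead of dropping them.
            String[] fields = line.split("\t", -1);
            if (fields.length != expectedColumns) {
                System.err.println("Bad line " + (i + 1) + ": " + line);
                bad.add(line);
            }
        }
        return bad;
    }
}
```

Calling findBadRows on an input with three expected columns returns every line that has more or fewer fields, so the whole input is checked in a single run.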

For the dry run, we can simply use if-else to skip over creating the table, writing out KVs, etc.
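
The if-else approach could look roughly like the sketch below. Everything here is an assumption made for illustration: DryRunImport, the TableSink interface, and the two-field row format are hypothetical stand-ins for the real table creation and KV writing, not HBase APIs.

```java
import java.util.List;

// Hypothetical illustration only; TableSink stands in for real table creation and writes.
class DryRunImport {
    interface TableSink {
        void createTable(String name);
        void write(String rowKey, String value);
    }

    // Parses every line and reports bad ones. Side effects (table creation,
    // writes) happen only when dryRun is false. Returns the number of good rows.
    static int run(String table, List<String> lines, boolean dryRun, TableSink sink) {
        if (!dryRun) {
            sink.createTable(table);
        }
        int good = 0;
        for (String line : lines) {
            String[] fields = line.split("\t", -1);
            if (fields.length != 2) {
                // Bad rows are reported in both modes.
                System.err.println("Bad line: " + line);
                continue;
            }
            if (!dryRun) {
                sink.write(fields[0], fields[1]);
            }
            good++;
        }
        return good;
    }
}
```

With dryRun set to true the sink is never touched, yet parsing and error reporting run exactly as in a real import, which is the behavior the dry run is meant to provide.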

This message was sent by Atlassian JIRA
