hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows
Date Thu, 25 Jun 2015 20:28:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601878#comment-14601878
] 

Hadoop QA commented on HBASE-13702:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12741902/HBASE-13702-v5.patch
  against master branch at commit edef3d64bce41fffbc5649ffa19b2cf80ce28d7a.
  ATTACHMENT ID: 12741902

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 7 new or modified
tests.

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions
(2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of
protoc compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number
of checkstyle errors

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.util.TestHBaseFsck
                  org.apache.hadoop.hbase.TestRegionRebalancing

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//artifact/patchprocess/checkstyle-aggregate.html

  Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14574//console

This message is automatically generated.

> ImportTsv: Add dry-run functionality and log bad rows
> -----------------------------------------------------
>
>                 Key: HBASE-13702
>                 URL: https://issues.apache.org/jira/browse/HBASE-13702
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Apekshit Sharma
>            Assignee: Apekshit Sharma
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: HBASE-13702-v2.patch, HBASE-13702-v3.patch, HBASE-13702-v4.patch,
HBASE-13702-v5.patch, HBASE-13702.patch
>
>
> The ImportTsv job skips bad records by default (though it keeps a count). -Dimporttsv.skip.bad.lines=false
can be used to fail if a bad row is encountered.
> Being able to easily determine which rows in an input are corrupted, rather than failing
on one row at a time, seems like a good feature to have.
> Moreover, tools of this kind should have 'dry-run' functionality, which essentially
does a quick run of the tool without making any changes while reporting any errors/warnings and
success/failure.
> To identify corrupted rows, simply logging them should be enough. In the worst case, all
rows will be logged and the size of the logs will be the same as the input size, which seems fine. However,
the user might have to do some work figuring out where the logs are. Is there some link we can show
to the user when the tool starts which can help them with that?
> For the dry run, we can simply use if-else to skip over writing out KVs, and any other
mutations, if present.
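
The if-else approach described above could look roughly like the sketch below. It is a standalone illustration, not the code in the attached patches: the class DryRunSketch, the process method, the Result holder, and the "wrong column count or empty row key" definition of a bad row are all assumptions made for this example, and it uses no HBase classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these names come from ImportTsv or the patch.
final class DryRunSketch {

    /** Outcome of one pass over the input. */
    static final class Result {
        final List<String> badLines = new ArrayList<>();  // rows that failed parsing
        final List<String[]> written = new ArrayList<>(); // stand-in for emitted KVs
        int goodCount = 0;
    }

    /**
     * Parses tab-separated lines. Bad rows are logged and collected instead of
     * aborting the run; when dryRun is true, the write step is skipped.
     */
    static Result process(List<String> lines, int expectedColumns, boolean dryRun) {
        Result result = new Result();
        for (String line : lines) {
            // limit -1 keeps trailing empty fields so the column count is exact
            String[] fields = line.split("\t", -1);
            // Assumed definition of "bad": wrong column count or empty row key
            if (fields.length != expectedColumns || fields[0].isEmpty()) {
                result.badLines.add(line);
                System.err.println("Bad line: " + line);
                continue;
            }
            result.goodCount++;
            if (!dryRun) {
                // A real job would emit KVs/mutations here
                result.written.add(fields);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("row1\ta\tb", "row2\tonly-one-value", "\tx\ty");
        Result result = process(input, 3, true);
        System.out.println("good=" + result.goodCount
            + " bad=" + result.badLines.size()
            + " written=" + result.written.size());
    }
}
```

In this sketch, a dry run and a real run report the same bad rows. Only the contents of written differ.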



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
