Date: Thu, 4 Jun 2015 13:53:40 +0000 (UTC)
From: "Bhupendra Kumar Jain (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-13702) ImportTsv: Add dry-run functionality and log bad rows

    [ https://issues.apache.org/jira/browse/HBASE-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572786#comment-14572786 ]

Bhupendra Kumar Jain commented on HBASE-13702:
----------------------------------------------

Yes, you are right. For bulk mode it's going to run all.

> ImportTsv: Add dry-run functionality and log bad rows
> -----------------------------------------------------
>
>                 Key: HBASE-13702
>                 URL: https://issues.apache.org/jira/browse/HBASE-13702
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Apekshit Sharma
>            Assignee: Apekshit Sharma
>         Attachments: HBASE-13702.patch
>
>
> The ImportTsv job skips bad records by default (though it keeps a count).
> -Dimporttsv.skip.bad.lines=false can be used to fail if a bad row is encountered.
> Being able to easily determine which rows in an input are corrupted, rather than failing on one row at a time, seems like a good feature to have.
> Moreover, tools of this kind should have 'dry-run' functionality, which essentially does a quick run of the tool without making any changes while still reporting any errors/warnings and success/failure.
> To identify corrupted rows, simply logging them should be enough. In the worst case, all rows will be logged and the size of the logs will be the same as the input size, which seems fine. However, the user might have to do some work figuring out where the logs are. Is there some link we can show to the user when the tool starts which can help them with that?
> For the dry run, we can simply use if-else to skip over writing out KVs, and any other mutations, if present.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
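
For illustration only, the if-else idea from the issue description (skip the writes in a dry run, but still record every bad row) might look roughly like the sketch below. This is not the HBASE-13702 patch and uses no HBase or MapReduce APIs. The class name DryRunImportSketch, the run method, the Report type, and the rule for what counts as a bad row (wrong column count or empty row key) are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an import job; not part of HBase.
public class DryRunImportSketch {

    /** Outcome of one pass over the input. */
    public static final class Report {
        public final List<String> badLines = new ArrayList<>();
        public final List<String[]> written = new ArrayList<>();
    }

    /**
     * Parses tab-separated lines. Bad rows are always recorded so that all of
     * them can be reported in a single run; valid rows are "written" (here,
     * just collected in memory) only when dryRun is false.
     */
    public static Report run(List<String> lines, int expectedColumns, boolean dryRun) {
        Report report = new Report();
        for (String line : lines) {
            // A limit of -1 keeps trailing empty fields.
            String[] fields = line.split("\t", -1);
            // Assumed validity rule: exact column count and a non-empty row key.
            if (fields.length != expectedColumns || fields[0].isEmpty()) {
                report.badLines.add(line);
                continue;
            }
            if (!dryRun) {
                // A real job would emit KVs or other mutations at this point.
                report.written.add(fields);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("row1\ta\tb", "not-enough-columns", "row2\tc\td");
        Report report = run(input, 3, true);
        for (String bad : report.badLines) {
            System.err.println("Bad line: " + bad);
        }
        System.out.println("bad rows: " + report.badLines.size()
                + ", rows written: " + report.written.size());
    }
}
```

Collecting the bad rows in the same pass is what lets one run report all of them, instead of failing on one row at a time.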