Date: Wed, 21 Dec 2011 19:57:31 +0000 (UTC)
From: "Dmytro Molkov (Created) (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Message-ID: <355303045.36748.1324497451874.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HIVE-2671) GenericUDTFJSONTuple ignores IOExceptions

GenericUDTFJSONTuple ignores IOExceptions
-----------------------------------------

                 Key: HIVE-2671
                 URL: https://issues.apache.org/jira/browse/HIVE-2671
             Project: Hive
          Issue Type: Bug
          Components: UDF
            Reporter: Dmytro Molkov


When running a query that uses GenericUDTFJSONTuple, there is a chance of hitting a very nasty bug. If the write pipeline fails, the task does not detect the failure and simply starts skipping all the rows in the input.

The UDTF has a catch (Throwable) that catches the IOException and forwards null rows instead. My guess is that these rows are filtered out by the filter operator down the line, so the map task never tries to write them out.

This happens for every row in the input. As a result the query appears to run forever, since it produces a log message for every row (we've seen tasks run for 20 hours instead of 20 minutes).

This is a stack trace of one of the tasks, just in case:

        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
        - locked <0x000000009c174f78> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
        at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
        at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        - locked <0x000000009c18d4f8> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        - locked <0x000000009c18d4d8> (a java.io.DataOutputStream)
        at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:894)
        at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:875)
        at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
        at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.process(GenericUDTFJSONTuple.java:167)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:368)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
        at org.apache.hadoop.mapred.Child.main(Child.java:162)
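
To make the failure mode easier to see, here is a minimal standalone sketch of the pattern described in this report. It is not the Hive source: the Collector interface, the BrokenSink class, and both process methods are hypothetical stand-ins, and the rethrowing variant is only one possible way to surface the error, not necessarily the fix that should go into Hive.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class SwallowedIOExceptionSketch {

    /** Hypothetical stand-in for the downstream operator pipeline (not a Hive interface). */
    interface Collector {
        void collect(Object[] row) throws IOException;
    }

    /** Hypothetical sink whose writer has already failed. */
    static class BrokenSink implements Collector {
        int writeAttempts = 0;
        int nullRowsDropped = 0;

        @Override
        public void collect(Object[] row) throws IOException {
            if (allNull(row)) {
                nullRowsDropped++;   // stands in for a filter dropping null rows before the write
                return;
            }
            writeAttempts++;
            throw new IOException("write pipeline failed");
        }
    }

    static boolean allNull(Object[] row) {
        for (Object column : row) {
            if (column != null) {
                return false;
            }
        }
        return true;
    }

    /** Pattern from the report: catch (Throwable) hides the IOException and forwards a null row. */
    static int processSwallowing(List<String> input, Collector out) throws IOException {
        int loggedErrors = 0;
        for (String value : input) {
            try {
                out.collect(new Object[] { value });
            } catch (Throwable t) {
                loggedErrors++;                      // one log message per input row
                out.collect(new Object[] { null });  // the null row is silently dropped downstream
            }
        }
        return loggedErrors;
    }

    /** One possible alternative: let I/O failures propagate so the task can fail fast. */
    static void processRethrowing(List<String> input, Collector out) throws IOException {
        for (String value : input) {
            try {
                out.collect(new Object[] { value });
            } catch (IOException e) {
                throw e;                             // do not mask a broken write pipeline
            } catch (Throwable t) {
                out.collect(new Object[] { null });  // other per-row errors still yield a null row
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> input = Arrays.asList("row1", "row2", "row3");

        BrokenSink swallowed = new BrokenSink();
        int logged = processSwallowing(input, swallowed);
        System.out.println("swallowing: logged=" + logged
                + " writeAttempts=" + swallowed.writeAttempts
                + " nullRowsDropped=" + swallowed.nullRowsDropped);

        BrokenSink rethrown = new BrokenSink();
        try {
            processRethrowing(input, rethrown);
        } catch (IOException e) {
            System.out.println("rethrowing: failed after " + rethrown.writeAttempts
                    + " write attempt(s): " + e.getMessage());
        }
    }
}
```

In the swallowing variant every input row costs a failed write attempt plus a log message and the loop still runs to the end of the input, which matches the 20-hour tasks we observed. In the rethrowing variant the first failed write ends the loop.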