hive-dev mailing list archives

From "Dmytro Molkov (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-2671) GenericUDTFJSONTuple ignores IOExceptions
Date Wed, 21 Dec 2011 19:57:31 GMT
GenericUDTFJSONTuple ignores IOExceptions
-----------------------------------------

                 Key: HIVE-2671
                 URL: https://issues.apache.org/jira/browse/HIVE-2671
             Project: Hive
          Issue Type: Bug
          Components: UDF
            Reporter: Dmytro Molkov


When running a query that uses GenericUDTFJSONTuple there is a chance of hitting a very nasty bug:
if the write pipeline fails, the task does not detect the failure and simply starts skipping all
the remaining rows in the input.

The UDTF has a catch (Throwable) that swallows the IOException and forwards null rows; my guess
is that these are filtered out by the filter operator further down the pipeline, so the map task
never tries to write them out.

This happens for every row in the input. As a result the query effectively runs forever, since it
produces a log message for every row (we have seen tasks run for 20 hours instead of 20 minutes).
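
Since the failure mode is easier to see in code than in prose, here is a minimal, self-contained
sketch of the pattern (plain Java, not the actual GenericUDTFJSONTuple source; the class name,
helper method, and row count are made up for illustration):

import java.io.IOException;
import java.util.Arrays;

// Standalone illustration of the failure mode described above (not Hive code).
// The "write pipeline" starts throwing IOException, but the catch (Throwable)
// in the per-row loop swallows it, so the task grinds through every remaining
// row, logging once per row, instead of failing fast.
public class SwallowedIOExceptionDemo {

    // Stand-in for the downstream write path (FileSinkOperator -> RCFile -> compressor).
    static void writeRow(String row) throws IOException {
        throw new IOException("write pipeline is broken");
    }

    public static void main(String[] args) {
        String[] input = new String[10];            // pretend this is a huge input split
        Arrays.fill(input, "{\"k\":\"v\"}");

        for (String row : input) {
            try {
                writeRow(row);                      // fails for every single row
            } catch (Throwable t) {
                // Mirrors the UDTF's catch (Throwable): log and keep going.
                // The task never surfaces the IOException, so the job appears
                // to make progress while doing nothing useful.
                System.err.println("error processing row: " + t.getMessage());
            }
        }
        System.out.println("task 'finished' despite a dead write pipeline");
    }
}

One likely direction for a fix is to narrow the catch to the JSON parsing errors it was
originally meant to handle and let execution failures propagate as a HiveException, so the task
fails on the first broken write instead of looping over the entire split.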

Here is a stack trace from one of the stuck tasks, just in case:
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.deflateBytesDirect(Native Method)
at org.apache.hadoop.io.compress.zlib.ZlibCompressor.compress(ZlibCompressor.java:315)
- locked <0x000000009c174f78> (a org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor)
at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:76)
at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:71)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
- locked <0x000000009c18d4f8> (a java.io.BufferedOutputStream)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
- locked <0x000000009c18d4d8> (a java.io.DataOutputStream)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.flushRecords(RCFile.java:894)
at org.apache.hadoop.hive.ql.io.RCFile$Writer.append(RCFile.java:875)
at org.apache.hadoop.hive.ql.io.RCFileOutputFormat$2.write(RCFileOutputFormat.java:140)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:592)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
at org.apache.hadoop.hive.ql.udf.generic.GenericUDTFJSONTuple.process(GenericUDTFJSONTuple.java:167)
at org.apache.hadoop.hive.ql.exec.UDTFOperator.processOp(UDTFOperator.java:98)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.LateralViewForwardOperator.processOp(LateralViewForwardOperator.java:37)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:143)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:368)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
at org.apache.hadoop.mapred.Child.main(Child.java:162)
