hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15491) Failures are masked/swallowed in GenericUDTFJSONTuple::process().
Date Thu, 22 Dec 2016 05:01:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769070#comment-15769070 ]

Hive QA commented on HIVE-15491:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12844316/HIVE-15491.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10777 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=234)
TestMiniLlapLocalCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=144)
	[vectorized_rcfile_columnar.q,vector_elt.q,explainuser_1.q,multi_insert.q,tez_dml.q,vector_bround.q,schema_evol_orc_acid_table.q,vector_when_case_null.q,orc_ppd_schema_evol_1b.q,vector_join30.q,vectorization_11.q,cte_3.q,update_tmp_table.q,vector_decimal_cast.q,groupby_grouping_id2.q,vector_decimal_round.q,tez_smb_empty.q,orc_merge6.q,vector_char_mapjoin1.q,vector_decimal_trailing.q,cte_5.q,tez_union.q,vector_decimal_2.q,columnStatsUpdateForStatsOptimizer_1.q,vector_outer_join3.q,schema_evol_text_vec_part_all_complex.q,tez_dynpart_hashjoin_2.q,auto_sortmerge_join_12.q,offset_limit.q,tez_union_multiinsert.q]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dbtxnmgr_showlocks] (batchId=71)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=93)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_4] (batchId=93)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery (batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/2687/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/2687/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-2687/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12844316 - PreCommit-HIVE-Build

> Failures are masked/swallowed in GenericUDTFJSONTuple::process().
> -----------------------------------------------------------------
>
>                 Key: HIVE-15491
>                 URL: https://issues.apache.org/jira/browse/HIVE-15491
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>            Reporter: Mithun Radhakrishnan
>            Assignee: Mithun Radhakrishnan
>         Attachments: HIVE-15491.patch
>
>
> I draw your attention to the following piece of code in {{GenericUDTFJSONTuple::process()}}:
> {code:java}
>   @Override
>   public void process(Object[] o) throws HiveException {
>   ...
>       for (int i = 0; i < numCols; ++i) {
>         if (retCols[i] == null) {
>           retCols[i] = cols[i]; // use the object pool rather than creating a new object
>         }
>         Object extractObject = ((Map<String, Object>)jsonObj).get(paths[i]);
>         if (extractObject instanceof Map || extractObject instanceof List) {
>           retCols[i].set(MAPPER.writeValueAsString(extractObject));
>         } else if (extractObject != null) {
>           retCols[i].set(extractObject.toString());
>         } else {
>           retCols[i] = null;
>         }
>       }
>       forward(retCols);
>       return;
>     } catch (Throwable e) {  <================= Yikes.
>       LOG.error("JSON parsing/evaluation exception" + e);
>       forward(nullCols);
>     }
>   }
> {code}
> The error handling here looks suspect. Judging from the error message, the intention seems to be to catch JSON-specific errors arising from {{MAPPER.readValue()}} and {{MAPPER.writeValueAsString()}}. But by catching {{Throwable}}, this code also masks any error that arises from the call to {{forward(retCols)}}.
> I just ran into this in production. A user with a nearly exhausted HDFS quota attempted to use {{json_tuple}} to extract fields from JSON strings in his data. The data turned out to have large record counts, and the query used over 25K mappers. Every task failed to create a {{RecordWriter}} because of the exhausted quota, but the thrown exception was swallowed in the code above. {{process()}} ignored the failure for each record and proceeded to the next one. Eventually, this resulted in DDoS-ing the name-node.
> I'll have a patch for this shortly.
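
For illustration only, here is a minimal, self-contained sketch of the pattern the description argues for: catch only the JSON-specific failure and call forward() outside the try block, so that a downstream error propagates to the caller. This is not the attached patch and does not use Hive or Jackson. The names NarrowCatchSketch, JsonFailure, Forwarder and extract() are hypothetical stand-ins, and the sketch uses an unchecked exception for the downstream failure, whereas the quoted process() declares HiveException.

```java
import java.util.ArrayList;
import java.util.List;

public class NarrowCatchSketch {

    /** Hypothetical stand-in for a JSON-specific failure (e.g. a parse error). */
    static class JsonFailure extends Exception {
        JsonFailure(String message) { super(message); }
    }

    /** Hypothetical stand-in for the downstream collector behind forward(). */
    interface Forwarder {
        void forward(String[] row);
    }

    /** Toy "parser": accepts only strings that look like a JSON object. */
    static String[] extract(String json) throws JsonFailure {
        if (json == null || !json.trim().startsWith("{")) {
            throw new JsonFailure("not a JSON object: " + json);
        }
        return new String[] { json.trim() };
    }

    /**
     * Only the JSON-specific failure is turned into a row of nulls.
     * forward() is called outside the try block, so anything it throws
     * reaches the caller instead of being logged and dropped.
     */
    static void process(String json, Forwarder out) {
        String[] row;
        try {
            row = extract(json);
        } catch (JsonFailure e) {
            System.err.println("JSON parsing/evaluation exception: " + e);
            row = new String[] { null };
        }
        out.forward(row);
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        process("{\"a\": 1}", rows::add);   // well-formed: forwarded as-is
        process("not json", rows::add);     // malformed: forwarded as a null column
        System.out.println(rows.size() + " rows forwarded");

        try {
            // Simulate a downstream failure such as an exhausted quota.
            process("{\"a\": 1}", row -> {
                throw new IllegalStateException("quota exceeded");
            });
        } catch (IllegalStateException e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```

With this shape, a malformed record still yields a null row, but a failure inside forward() fails the call on the first record instead of being retried silently for every remaining record.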



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
