hive-dev mailing list archives

From "Brock Noland (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7868) AvroSerDe error handling could be improved
Date Mon, 08 Sep 2014 18:33:32 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125879#comment-14125879 ]

Brock Noland commented on HIVE-7868:
------------------------------------

This looks good! Using the following tables:


{noformat}
create table test_avro (c1 string, c2 char(10), c3 varchar(10))
 ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.literal'='{"type":"record","name":"test_avro","namespace":"default","fields":[{"name":"c1","type":["null","string"],"default":null},{"name":"c2","type":["null","string"],"default":null},{"name":"c3","type":["null","string"],"default":null}]}');

create table test_avro (c1 string, c2 char(10), c3 varchar(10))
 ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='hdfs://localhost:9000/tmp/schema.avsc');
{noformat}

* Creating with bad avro.schema.literal:
{noformat}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered exception determining
schema. Returning signal schema to indicate problem: No type: {})
{noformat}

* Creating with bad avro.schema.url:
{noformat}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException:
MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered AvroSerdeException
determining schema. Returning signal schema to indicate problem: Unable to read schema from
given path: hdfs://localhost:8020/tmp/schema.avsc)
{noformat}

* Setting bad avro.schema.url:
{noformat}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.SerDeException
Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem:
Unable to read schema from given path: hdfs://localhost:9000/tmp/schema.avsc
{noformat}

* Setting bad avro.schema.literal:
{noformat}
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.hive.serde2.SerDeException
Encountered exception determining schema. Returning signal schema to indicate problem: java.io.EOFException:
No content to map to Object due to end of input
{noformat}

* Fixing bad URL schema works.
* Fixing bad literal schema works.

I think we should do one more thing. In the describe table code here:

https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java#L3063

we should check for config errors and print them instead of describing the table.
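
To make the suggestion concrete, here is a rough sketch of the check. It is not the DDLTask code: `DescribeSketch`, `SerDeState`, and `describe` are hypothetical names, and the sketch assumes the serde can expose whatever configuration error it hit during init as a plain string.

```java
public class DescribeSketch {
    /** Hypothetical snapshot of a serde after init; not Hive's actual API. */
    static final class SerDeState {
        final String configError; // null or empty when the schema resolved cleanly
        final String columns;     // what describe would normally print

        SerDeState(String configError, String columns) {
            this.configError = configError;
            this.columns = columns;
        }
    }

    /** Print the configuration error if there is one; otherwise describe the table. */
    static String describe(SerDeState state) {
        if (state.configError != null && !state.configError.isEmpty()) {
            return "FAILED: " + state.configError;
        }
        return state.columns;
    }

    public static void main(String[] args) {
        System.out.println(describe(new SerDeState(null, "c1 string, c2 char(10), c3 varchar(10)")));
        System.out.println(describe(new SerDeState(
            "Unable to read schema from given path: hdfs://localhost:9000/tmp/schema.avsc", null)));
    }
}
```

With something like this, a describe on a table with a bad schema would show the error instead of the columns of the signal schema.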

Thanks!!

> AvroSerDe error handling could be improved
> ------------------------------------------
>
>                 Key: HIVE-7868
>                 URL: https://issues.apache.org/jira/browse/HIVE-7868
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Brock Noland
>            Assignee: Ferdinand Xu
>         Attachments: HIVE-7868.patch
>
>
> When an Avro schema is invalid, AvroSerDe returns an error message instead of throwing
> an exception. This is described in {{AvroSerdeUtils.determineSchemaOrReturnErrorSchema}}:
> {noformat}
>   /**
>    * Attempt to determine the schema via the usual means, but do not throw
>    * an exception if we fail.  Instead, signal failure via a special
>    * schema.  This is used because Hive calls init on the serde during
>    * any call, including calls to update the serde properties, meaning
>    * if the serde is in a bad state, there is no way to update that state.
>    */
> {noformat}
> I believe we should find a way to provide a better experience to our users.
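
For readers unfamiliar with the pattern the Javadoc above describes, a minimal sketch follows. It is an illustration only and not the AvroSerdeUtils implementation: schemas are modelled as plain strings, `SIGNAL_BAD_SCHEMA` is a hypothetical sentinel value, and the validation is reduced to a non-empty check on `avro.schema.literal`.

```java
import java.util.Map;

public class SignalSchemaSketch {
    /** Hypothetical sentinel standing in for the special "signal" schema. */
    static final String SIGNAL_BAD_SCHEMA = "<signal: cannot determine schema>";

    /** Simplified stand-in for the normal path: fails loudly on a missing literal. */
    static String determineSchemaOrThrow(Map<String, String> props) {
        String literal = props.get("avro.schema.literal");
        if (literal == null || literal.trim().isEmpty()) {
            throw new IllegalArgumentException("No schema literal provided");
        }
        return literal;
    }

    /** Same lookup, but failure is reported through the sentinel so init can still complete. */
    static String determineSchemaOrReturnErrorSchema(Map<String, String> props) {
        try {
            return determineSchemaOrThrow(props);
        } catch (IllegalArgumentException e) {
            return SIGNAL_BAD_SCHEMA;
        }
    }
}
```

Because init never throws in this style, the table properties can still be updated afterwards, but every caller has to remember to compare against the sentinel, which is where the user-facing error reporting tends to get lost.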



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
