hive-dev mailing list archives

From "Sushanth Sowmyan (JIRA)" <>
Subject [jira] [Commented] (HIVE-4329) HCatalog should use getHiveRecordWriter rather than getRecordWriter
Date Fri, 15 Aug 2014 22:15:19 GMT


Sushanth Sowmyan commented on HIVE-4329:


I'm against the goal of this patch altogether: it effectively breaks one of the core reasons for HCatalog's existence, namely being a generic wrapper around underlying mapreduce IF/OFs for consumers that expect mapreduce IF/OFs. I apologize for not having spotted this jira earlier, since it seems a lot of work has gone into it. I understand that there is an impedance mismatch here between HiveOutputFormat and OutputFormat, and it is one we want to fix, but this patch moves in the opposite direction from the desired way of solving that mismatch.

One of our longer-term goals has been to evolve Hive's usage of StorageHandlers to the point where Hive stops using HiveRecordWriter/HiveOutputFormat altogether, so that there is no notion of "internal" and "external" OutputFormat definitions, and third-party mapreduce IF/OFs can be integrated into Hive directly instead of having to be changed to implement HiveOutputFormat.
The primary issue discussed in this problem, that of FileRecordWriterContainer writing out a NullWritable, is solvable, since FileRecordWriterContainer's key format is a WritableComparable, and if AvroContainerOutputFormat does not already care about the key anyway, we should be ignoring it. If it's simpler, I would also be in favour of a hack like FileRecordWriterContainer emitting a LongWritable when it detects that it's wrapping an AvroContainerOutputFormat, instead of rewiring HCatalog to be based on HiveOutputFormat.

> HCatalog should use getHiveRecordWriter rather than getRecordWriter
> -------------------------------------------------------------------
>                 Key: HIVE-4329
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog, Serializers/Deserializers
>    Affects Versions: 0.14.0
>         Environment: discovered in Pig, but it looks like the root cause impacts all non-Hive users
>            Reporter: Sean Busbey
>            Assignee: David Chen
>         Attachments: HIVE-4329.0.patch
> Attempting to write to an HCatalog-defined table backed by the AvroSerde fails with the following stacktrace:
> {code}
> java.lang.ClassCastException: cannot be cast to
> 	at$1.write(
> 	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(
> 	at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(
> 	at org.apache.hcatalog.pig.HCatBaseStorer.putNext(
> 	at org.apache.hcatalog.pig.HCatStorer.putNext(
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(
> 	at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(
> 	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(
> {code}
> The proximal cause of this failure is that the AvroContainerOutputFormat's signature mandates a LongWritable key and HCat's FileRecordWriterContainer forces a NullWritable. I'm not sure of a general fix, other than redefining HiveOutputFormat to mandate a WritableComparable.
> It looks like accepting WritableComparable is what's done in the other Hive OutputFormats, and there's no reason AvroContainerOutputFormat couldn't also be changed, since it's ignoring the key. That way fixing things so FileRecordWriterContainer can always use NullWritable could get spun into a different issue?
> The underlying cause for failure to write to AvroSerde tables is that AvroContainerOutputFormat doesn't meaningfully implement getRecordWriter, so fixing the above will just push the failure into the placeholder RecordWriter.
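The key-type mismatch described in the quoted report can be reproduced in miniature. All names below are hypothetical stand-ins, not Hive's or Hadoop's classes; they only mirror the shape of the failing call:

```java
// Simplified stand-ins (NOT the real org.apache.hadoop.io classes) that
// reproduce the key-type mismatch behind the ClassCastException above.
class NullWritableStandIn {
    static final NullWritableStandIn GET = new NullWritableStandIn();
}

class LongWritableStandIn {
    final long value;
    LongWritableStandIn(long value) { this.value = value; }
}

// Like AvroContainerOutputFormat's record writer, this one mandates a
// LongWritable-typed key in its signature, even though it ignores it.
class AvroLikeRecordWriter {
    void write(LongWritableStandIn key, Object value) { /* key ignored */ }
}

// The container always hands the wrapped writer a NullWritable key; the
// cast below compiles but fails at run time, mirroring the stack trace.
public class KeyMismatchDemo {
    static void containerWrite(Object key, Object value, AvroLikeRecordWriter w) {
        w.write((LongWritableStandIn) key, value); // ClassCastException here
    }

    public static void main(String[] args) {
        try {
            containerWrite(NullWritableStandIn.GET, "record", new AvroLikeRecordWriter());
        } catch (ClassCastException e) {
            System.out.println("cannot cast the NullWritable stand-in to the LongWritable stand-in");
        }
    }
}
```

The irony the report points out is that the mandated key is never used, which is why either ignoring it in the container or widening the writer's signature to WritableComparable would sidestep the cast entirely.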

This message was sent by Atlassian JIRA
