orc-dev mailing list archives

From: "Owen O'Malley" <omal...@apache.org>
Subject: Re: Complex types in hive-orc
Date: Fri, 22 Jul 2016 21:14:45 GMT

Hi Matt,

On Fri, Jul 22, 2016 at 1:21 PM, Matt Burgess <mattyb149@apache.org> wrote:

> All,
>
> Is this the right place to ask questions about hive-orc? I know it was
> split out into Apache ORC, and up until recently I have been using
> Apache ORC 1.1.2 to convert Avro files to ORC files, but I was told I
> need a version that works with only Hive 1.2.1.
>

This list works great, although most of the ORC developers read both lists.


> - Are complex types (list, map, struct, union, etc.) supported in
> hive-orc 1.2.1? I don't see the ListColumnVector and such types.



Before HIVE-12159, which went into Hive 2.1, the only way to read complex
types was to use the row-by-row API.
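
To make the difference concrete, here is a rough sketch in plain Java, with
no ORC or Hive dependency, of how a vectorized list column stores its data
compared with the per-row values a row-by-row reader hands back. The class
and method names are hypothetical; only the offsets/lengths/child layout
mirrors the idea behind ListColumnVector, and this is not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a vectorized list<bigint> column; not the ORC class.
class ListLayoutSketch {
    final long[] offsets;  // index in child where each row's elements start
    final long[] lengths;  // number of elements in each row
    final long[] child;    // elements of all rows, flattened into one array

    ListLayoutSketch(List<List<Long>> rows) {
        offsets = new long[rows.size()];
        lengths = new long[rows.size()];
        int total = 0;
        for (List<Long> row : rows) {
            total += row.size();
        }
        child = new long[total];
        int next = 0;
        for (int r = 0; r < rows.size(); r++) {
            offsets[r] = next;
            lengths[r] = rows.get(r).size();
            for (long v : rows.get(r)) {
                child[next++] = v;
            }
        }
    }

    // Rebuild a single row as a list, the shape a row-by-row API would return.
    List<Long> row(int r) {
        List<Long> out = new ArrayList<Long>();
        for (long i = offsets[r]; i < offsets[r] + lengths[r]; i++) {
            out.add(child[(int) i]);
        }
        return out;
    }
}
```

The flattened child array is what lets a vectorized reader or writer handle
a whole batch without allocating one object per row.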


> I
> can't bring in that storage-api-2.1.1-pre-orc JAR because of a
> conflict with BloomFilter, etc.
>

How bad is the breakage? Can we fix it with a patch to ORC?


>
> - I was using VectorizedRowBatch to write my values in ORC 1.1.2, is
> that the correct/recommended approach in 1.2.1? I see Apache Crunch
> uses lots of MapReduce types but I would really like to limit the MR
> dependencies if possible since my app will not always be on a Hadoop
> node.
>

Yes. The ORC MapReduce shim reads VectorizedRowBatches and converts them
into WritableComparables, so it will be fastest if you use
VectorizedRowBatch directly. As you have discovered, though, that won't
work if you are trying to use hive-orc 1.2.


> - Are there any examples of converting Avro to ORC outside of Hive
> (but using Avro and hive-orc)? I see a couple of examples of
> reading/writing ORC files but nothing with Avro. No worries if not, I
> am writing one as part of this effort :)
>

If you look at the benchmarking code in
https://github.com/apache/orc/pull/43 , you'll see that I took a first stab
at making an Avro writer that goes from ORC's TypeDescription and a
VectorizedRowBatch.
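
For the Avro-to-ORC direction, the first step is usually mapping the schema.
Below is a minimal sketch of one plausible mapping, written in plain Java
with no Avro or ORC dependency. AvroNode and AvroToOrcTypeSketch are
hypothetical names made up for illustration, and this is not the code in the
pull request; a real converter would walk org.apache.avro.Schema instead.
The sketch assumes a nullable union such as ["null", T] collapses to T,
since ORC columns are nullable already.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini model of an Avro schema node, for illustration only.
class AvroNode {
    final String kind;             // Avro type name: "int", "record", "array", ...
    final List<String> names;      // field names, used only by "record"
    final List<AvroNode> children; // element, value, branch, or field types

    AvroNode(String kind, List<String> names, List<AvroNode> children) {
        this.kind = kind;
        this.names = names;
        this.children = children;
    }

    static AvroNode prim(String kind) {
        return new AvroNode(kind, new ArrayList<String>(), new ArrayList<AvroNode>());
    }
    static AvroNode array(AvroNode element) {
        return new AvroNode("array", new ArrayList<String>(), Arrays.asList(element));
    }
    static AvroNode map(AvroNode value) {
        return new AvroNode("map", new ArrayList<String>(), Arrays.asList(value));
    }
    static AvroNode union(AvroNode... branches) {
        return new AvroNode("union", new ArrayList<String>(), Arrays.asList(branches));
    }
    static AvroNode record(List<String> names, List<AvroNode> fields) {
        return new AvroNode("record", names, fields);
    }
}

class AvroToOrcTypeSketch {
    // Returns an ORC type string such as "struct<id:bigint,tags:array<string>>".
    static String toOrc(AvroNode n) {
        switch (n.kind) {
            case "boolean": return "boolean";
            case "int":     return "int";
            case "long":    return "bigint";
            case "float":   return "float";
            case "double":  return "double";
            case "bytes":   return "binary";
            case "string":  return "string";
            case "array":   return "array<" + toOrc(n.children.get(0)) + ">";
            case "map":     // Avro map keys are always strings
                return "map<string," + toOrc(n.children.get(0)) + ">";
            case "record": {
                StringBuilder sb = new StringBuilder("struct<");
                for (int i = 0; i < n.children.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(n.names.get(i)).append(':').append(toOrc(n.children.get(i)));
                }
                return sb.append('>').toString();
            }
            case "union": {
                // Assumption: drop "null" branches because ORC columns are nullable.
                List<String> branches = new ArrayList<String>();
                for (AvroNode b : n.children) {
                    if (!b.kind.equals("null")) branches.add(toOrc(b));
                }
                if (branches.isEmpty()) {
                    throw new IllegalArgumentException("union with only null branches");
                }
                if (branches.size() == 1) return branches.get(0);
                return "uniontype<" + String.join(",", branches) + ">";
            }
            default:
                throw new IllegalArgumentException("unmapped Avro type: " + n.kind);
        }
    }
}
```

Types the sketch does not map, such as enum and fixed, are rejected rather
than guessed at.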

.. Owen


>
> Thank you in advance,
> Matt
>
