orc-dev mailing list archives

From Matt Burgess <mattyb...@gmail.com>
Subject Re: Complex types in hive-orc
Date Fri, 22 Jul 2016 21:19:24 GMT
Thanks for the great info! For the BloomFilter thing, the first (and only) thing I saw was that
addBytes() also needs start and length params in 1.2.1, but later versions take just the column
vector and something else as params. I'm not sure whether there are other issues with duplicate
classes and such.

Regards,
Matt


> On Jul 22, 2016, at 5:14 PM, Owen O'Malley <omalley@apache.org> wrote:
> 
> Hi Matt,
> 
>> On Fri, Jul 22, 2016 at 1:21 PM, Matt Burgess <mattyb149@apache.org> wrote:
>> 
>> All,
>> 
>> Is this the right place to ask questions about hive-orc? I know it was
>> split out into Apache ORC, and up until recently I have been using
>> Apache ORC 1.1.2 to convert Avro files to ORC files, but I was told I
>> need a version that works with only Hive 1.2.1.
> 
> This works great, although most of the ORC developers read both.
> 
> 
>> - Are complex types (list, map, struct, union, etc.) supported in
>> hive-orc 1.2.1? I don't see the ListColumnVector and such types.
> 
> 
> 
> Before HIVE-12159, which went into Hive 2.1, the only way to read complex
> types was to use the row by row API.
> 
> 
>> I
>> can't bring in that storage-api-2.1.1-pre-orc JAR because of a
>> conflict with BloomFilter, etc.
> 
> How bad is the breakage? Can we fix it with a patch to ORC?
> 
> 
>> 
>> - I was using VectorizedRowBatch to write my values in ORC 1.1.2, is
>> that the correct/recommended approach in 1.2.1? I see Apache Crunch
>> uses lots of MapReduce types but I would really like to limit the MR
>> dependencies if possible since my app will not always be on a Hadoop
>> node.
> 
> Yes, the ORC MapReduce shim uses the VectorizedRowBatch and converts them
> into WritableComparables so it will be fastest if you use
> VectorizedRowBatch directly. Although, as you have discovered, that won't
> work if you are trying to use hive-orc 1.2.
> 
> 
>> - Are there any examples of converting Avro to ORC outside of Hive
>> (but using Avro and hive-orc)? I see a couple of examples of
>> reading/writing ORC files but nothing with Avro. No worries if not, I
>> am writing one as part of this effort :)
> 
> If you look at the benchmarking code in
> https://github.com/apache/orc/pull/43 , you'll see that I took a first stab
> at making an Avro writer that goes from ORC's TypeDescription and a
> VectorizedRowBatch.
> 
> .. Owen
> 
> 
>> 
>> Thank you in advance,
>> Matt
>> 
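
For readers following the VectorizedRowBatch discussion above, here is a minimal, self-contained sketch of the fill-and-flush pattern that batch-oriented writing generally follows: fill one array per column, hand the batch to the writer when it is full, reset it, and flush any partial batch at the end. ToyBatch and ToyWriter are hypothetical stand-ins invented for this illustration; they are not ORC or Hive classes, and the field and method names only loosely echo the real ones. The column names and the batch size of 1024 are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {
    /** Hypothetical column-oriented batch; NOT ORC's VectorizedRowBatch. */
    static final class ToyBatch {
        final long[] ids;       // one array per column
        final String[] names;
        int size = 0;           // number of rows currently filled

        ToyBatch(int maxSize) {
            ids = new long[maxSize];
            names = new String[maxSize];
        }

        int getMaxSize() { return ids.length; }

        void reset() { size = 0; }
    }

    /** Hypothetical sink that only records the size of each flushed batch. */
    static final class ToyWriter {
        final List<Integer> flushedSizes = new ArrayList<>();
        long rowsWritten = 0;

        void addRowBatch(ToyBatch batch) {
            flushedSizes.add(batch.size);
            rowsWritten += batch.size;
        }
    }

    /** Fills batches column by column and flushes each one when it is full. */
    static ToyWriter writeRows(int rowCount, int maxBatchSize) {
        ToyBatch batch = new ToyBatch(maxBatchSize);
        ToyWriter writer = new ToyWriter();
        for (int r = 0; r < rowCount; r++) {
            int row = batch.size++;          // claim the next free row slot
            batch.ids[row] = r;
            batch.names[row] = "name-" + r;
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);   // batch is full: flush and reuse it
                batch.reset();
            }
        }
        if (batch.size != 0) {               // flush the final partial batch
            writer.addRowBatch(batch);
            batch.reset();
        }
        return writer;
    }

    public static void main(String[] args) {
        ToyWriter w = writeRows(2500, 1024);
        System.out.println(w.rowsWritten + " rows in " + w.flushedSizes.size() + " batches");
    }
}
```

Reusing a single batch object rather than allocating one per flush is the main point of the pattern. A row-by-row API, by contrast, hands the writer one record at a time.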
