orc-dev mailing list archives

From Matt Burgess <mattyb...@gmail.com>
Subject Re: Complex types in hive-orc
Date Mon, 25 Jul 2016 14:19:15 GMT
Ok looks like I'll need to go with the row-by-row API. Just to make
sure I understand correctly, is that the approach Apache Crunch is
using? With ObjectInspectors, Writables / POJOs, etc.?

https://github.com/apache/crunch/blob/master/crunch-hive/src/main/java/org/apache/crunch/types/orc/OrcUtils.java

If not, what is considered the row-by-row API (not using
VectorizedRowBatch or ColumnVectors)?

Thanks again,
Matt

On Fri, Jul 22, 2016 at 5:14 PM, Owen O'Malley <omalley@apache.org> wrote:
> Hi Matt,
>
> On Fri, Jul 22, 2016 at 1:21 PM, Matt Burgess <mattyb149@apache.org> wrote:
>
>> All,
>>
>> Is this the right place to ask questions about hive-orc? I know it was
>> split out into Apache ORC, and up until recently I have been using
>> Apache ORC 1.1.2 to convert Avro files to ORC files, but I was told I
>> need a version that works with only Hive 1.2.1.
>>
>
> This works great, although most of the ORC developers read both.
>
>
>> - Are complex types (list, map, struct, union, etc.) supported in
>> hive-orc 1.2.1? I don't see the ListColumnVector and such types.
>
>
>
> Before HIVE-12159, which went into Hive 2.1, the only way to read complex
> types was to use the row-by-row API.
>
>
>> I can't bring in that storage-api-2.1.1-pre-orc JAR because of a
>> conflict with BloomFilter, etc.
>>
>
> How bad is the breakage? Can we fix it with a patch to ORC?
>
>
>>
>> - I was using VectorizedRowBatch to write my values in ORC 1.1.2, is
>> that the correct/recommended approach in 1.2.1? I see Apache Crunch
>> uses lots of MapReduce types but I would really like to limit the MR
>> dependencies if possible since my app will not always be on a Hadoop
>> node.
>>
>
> Yes, the ORC MapReduce shim uses the VectorizedRowBatch and converts it
> into WritableComparables, so it will be fastest if you use
> VectorizedRowBatch directly. Although, as you have discovered, that won't
> work if you are trying to use hive-orc 1.2.
>
>
>> - Are there any examples of converting Avro to ORC outside of Hive
>> (but using Avro and hive-orc)? I see a couple of examples of
>> reading/writing ORC files but nothing with Avro. No worries if not, I
>> am writing one as part of this effort :)
>>
>
> If you look at the benchmarking code in
> https://github.com/apache/orc/pull/43 , you'll see that I took a first stab
> at making an Avro writer that goes from ORC's TypeDescription and a
> VectorizedRowBatch.
>
> .. Owen
>
>
>>
>> Thank you in advance,
>> Matt
>>
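
The thread contrasts a column-batch style (VectorizedRowBatch) with a row-by-row style, and mentions a shim that converts batches into per-row objects. The following is only a conceptual sketch of that conversion in plain Java. It does not use any ORC or Hive classes: ColumnBatch, Row, and toRows are hypothetical names invented for illustration, and the two-column layout (a long id and a string name) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchToRows {

    /** Hypothetical stand-in for a columnar batch: one array per column. */
    static final class ColumnBatch {
        final long[] ids;      // column 0
        final String[] names;  // column 1
        final int size;        // number of valid rows in the batch

        ColumnBatch(long[] ids, String[] names, int size) {
            if (size < 0 || size > ids.length || size > names.length) {
                throw new IllegalArgumentException("size exceeds column capacity");
            }
            this.ids = ids;
            this.names = names;
            this.size = size;
        }
    }

    /** Hypothetical stand-in for a single row object (a POJO). */
    static final class Row {
        final long id;
        final String name;

        Row(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public String toString() {
            return "Row(" + id + ", " + name + ")";
        }
    }

    /** Walks the batch one row index at a time, allocating one object per row. */
    static List<Row> toRows(ColumnBatch batch) {
        List<Row> rows = new ArrayList<>(batch.size);
        for (int r = 0; r < batch.size; r++) {
            rows.add(new Row(batch.ids[r], batch.names[r]));
        }
        return rows;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(
                new long[] {1L, 2L, 3L},
                new String[] {"a", "b", "c"},
                3);
        for (Row row : toRows(batch)) {
            System.out.println(row);
        }
    }
}
```

The per-row allocation in toRows is the kind of extra work that working on the batch directly avoids, which is in line with the remark above that using the batch directly will be fastest.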
