orc-user mailing list archives

From Thomas Abeler <tho...@sensenetworks.com>
Subject Re: OrcInputFormat vs. OrcNewInputFormat
Date Thu, 30 Jul 2015 15:23:54 GMT
Since Hive 0.13 (we're running Hive 0.12), you can use a vectorized
execution engine, which, unlike the normal execution engine, processes
rows in batches of 1024 instead of one row at a time.

It seems you need a separate input format (VectorizedOrcInputFormat) to
make use of it.
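To make the row-versus-batch distinction concrete, here is a minimal Java sketch of the general idea. It is not Hive's vectorized API: the class and method names (`BatchSumSketch`, `processOneRow`, `processBatch`) are hypothetical, and the only detail taken from the message is the batch size of 1024. Both paths compute the same sum; the batched path hands the "operator" a primitive array of up to 1024 values per call instead of one value per call.

```java
// Hypothetical sketch of row-at-a-time vs. batch-at-a-time processing.
// This is NOT Hive's VectorizedRowBatch API; only the batch size (1024)
// comes from the message above.
public class BatchSumSketch {
    static final int BATCH_SIZE = 1024;

    // Row mode: the "operator" is invoked once per row.
    static long processOneRow(long value) {
        return value;
    }

    static long sumRowAtATime(long[] column) {
        long sum = 0;
        for (long value : column) {
            sum += processOneRow(value);
        }
        return sum;
    }

    // Batch mode: the "operator" is invoked once per batch and runs a
    // tight loop over a primitive array of up to BATCH_SIZE values.
    static long processBatch(long[] vector, int size) {
        long sum = 0;
        for (int i = 0; i < size; i++) {
            sum += vector[i];
        }
        return sum;
    }

    static long sumVectorized(long[] column) {
        long[] vector = new long[BATCH_SIZE]; // reused for every batch
        long total = 0;
        for (int start = 0; start < column.length; start += BATCH_SIZE) {
            int size = Math.min(BATCH_SIZE, column.length - start);
            System.arraycopy(column, start, vector, 0, size);
            total += processBatch(vector, size);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] column = new long[5000];
        for (int i = 0; i < column.length; i++) {
            column[i] = i;
        }
        // Both approaches produce the same result; only the calling pattern differs.
        System.out.println(sumRowAtATime(column));
        System.out.println(sumVectorized(column));
    }
}
```

On 5,000 rows both calls print 12497500, but the batched path calls `processBatch` 5 times (four full batches of 1024 plus one of 904) where the row path calls `processOneRow` 5,000 times. The intent of batching is to pay per-call overhead once per batch rather than once per row.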

On Wed, Jul 29, 2015 at 8:06 PM, David Rosenstrauch <darose@darose.net>
wrote:

> On 07/23/2015 12:01 PM, David Rosenstrauch wrote:
>
>> Just wondering what's the difference between these 2 classes.  Is there
>> a guideline as to when we should use one vs. the other?
>>
>> Thanks,
>>
>> DR
>>
>
> Had a follow-up question along the same lines:
>
> What's VectorizedOrcInputFormat?
>
>
> Also, a couple of other things I'm mulling over as we get a bit deeper
> into our work with ORC:
>
> * In the docs it states "Seek to row number is implemented to support
> secondary indexes".  (See:
> http://hive.apache.org/javadocs/r0.13.1/api/ql/org/apache/hadoop/hive/ql/io/orc/package-summary.html)
> A colleague and I are working on this exact use case (secondary index).
> And we were under the impression that we had to create our own row
> numbering scheme to support the secondary index.  Does ORC already write a
> row number on each record?  If so, how is that accessed?
>
> * We're thinking over how to structure our secondary index.  And although
> we can envision an ORC-based structure that would provide the functionality
> we need, it'd be a bit clunky/complex/verbose to query using Hive.  I was
> thinking perhaps it might be an option for us to implement a layer in front
> of ORC that hides some of the complexity of how the secondary index is
> physically structured, and makes it possible to query it using simple HQL.
> I know that Hive allows developers to use a custom InputFormat to implement
> custom storage formats.  So theoretically we could write a wrapper around
> OrcNewInputFormat and/or OrcSerDe to provide the functionality we're
> looking for.  Any suggestions or pointers to someone looking to go this
> route?  (I.e., specific code we might look at?  Where we might want to
> insert our own code?  Etc.)
>
> Thanks!
>
> DR
>
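
As a rough illustration of the secondary-index questions quoted above, here is a minimal Java sketch. It assumes you maintain your own row numbering (each row's scan position), and it uses a hypothetical `RowSource` interface as a stand-in for a reader that can seek to a row number; it is not ORC's or Hive's reader API. `SecondaryIndex` and `IndexedTable` are likewise hypothetical names, with `IndexedTable` playing the role of the "layer in front of ORC" that hides how the index is physically stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a reader that can seek to a row number.
// This is NOT the ORC/Hive reader API.
interface RowSource {
    long rowCount();
    void seekToRow(long rowNumber);
    String[] next();
}

// Toy in-memory "file": row numbers are simply list positions.
class InMemoryRowSource implements RowSource {
    private final List<String[]> rows;
    private int position = 0;

    InMemoryRowSource(List<String[]> rows) { this.rows = rows; }

    public long rowCount() { return rows.size(); }

    public void seekToRow(long rowNumber) { position = (int) rowNumber; }

    public String[] next() { return rows.get(position++); }
}

// Secondary index: column value -> row numbers where that value occurs.
class SecondaryIndex {
    private final Map<String, List<Long>> entries = new TreeMap<>();

    static SecondaryIndex build(RowSource source, int column) {
        SecondaryIndex index = new SecondaryIndex();
        source.seekToRow(0);
        // Our own row numbering scheme: the scan position of each row.
        for (long row = 0; row < source.rowCount(); row++) {
            String key = source.next()[column];
            index.entries.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return index;
    }

    List<Long> rowsFor(String key) {
        return entries.getOrDefault(key, Collections.emptyList());
    }
}

// Facade that hides how the index is stored: callers just ask for a key.
class IndexedTable {
    private final RowSource source;
    private final SecondaryIndex index;

    IndexedTable(RowSource source, int indexedColumn) {
        this.source = source;
        this.index = SecondaryIndex.build(source, indexedColumn);
    }

    List<String[]> lookup(String key) {
        List<String[]> result = new ArrayList<>();
        for (long row : index.rowsFor(key)) {
            source.seekToRow(row); // jump straight to the matching row
            result.add(source.next());
        }
        return result;
    }
}

public class SecondaryIndexSketch {
    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "alice", "NY"},
            new String[] {"2", "bob", "SF"},
            new String[] {"3", "carol", "NY"});
        IndexedTable table = new IndexedTable(new InMemoryRowSource(rows), 2);
        for (String[] row : table.lookup("NY")) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Running `main` prints `1,alice,NY` and `3,carol,NY`. A real implementation would presumably persist the index (for example as its own ORC table) rather than rebuild it in memory on every open; the in-memory `TreeMap` here only keeps the example self-contained.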
