hive-user mailing list archives

From Alan Gates <ga...@apache.org>
Subject Re: External Table with unclosed orc files.
Date Wed, 15 Apr 2015 08:07:20 GMT


Grant Overby (groverby) wrote:
> Thanks for the link to the hive streaming bolt. We rolled our own bolt
> many moons ago to utilize hive streaming. We’ve tried it against 0.13 and
> 0.14. ACID tables have been a real pain for us. We don’t believe they are
> production ready. At least in our use cases, Tez crashes for assorted
> reasons or only assigns 1 mapper to the partition. Having delta files and
> no base files borks mapper assignments.  Files containing flush in their
> name are left scattered about, borking queries. Latency is higher with
> streaming than writing to an orc file in hdfs, forcing obscene quantities
> of buckets and orc files smaller than any reasonable orc stripe / hdfs
> block size. The compactor hangs seemingly at random for no reason we’ve
> been able to discern.
The issues with flush files borking queries have been resolved in Hive
1.0. I haven't seen any issues with the compactor hanging at random.
Could you expand on what parts hung? If you have a reproducible case it
would be great to file a JIRA so we can fix it.

Alan.
>
>
>
> An orc file without a footer is junk data (or, at least, the last stripe
> is junk data). I suppose my question should have been 'what will the hive
> query do when it encounters this? Skip the stripe / file? Error out the
> query? Something else?'
>
>
>
>
> Grant Overby
> Software Engineer
> Cisco.com <http://www.cisco.com/>
> groverby@cisco.com
> Mobile: 865 724 4910
>
> On 4/14/15, 4:23 PM, "Gopal Vijayaraghavan" <gopalv@apache.org> wrote:
>
>>> What will Hive do if querying an external table containing orc files
>>> that are still being written to?
>> Doing that directly won't work at all. Because ORC files are only readable
>> after the Footer is written out, which won't be for any open files.
>>
>>> I won't be able to test these scenarios till tomorrow and would like to
>>> have some idea of what to expect this afternoon.
>> If I remember correctly, your previous question was about writing ORC from
>> Storm.
>>
>> If you're on a recent version of Storm, I'd advise you to look at
>> storm-hive/ 
>>
>> https://github.com/apache/storm/tree/master/external/storm-hive
>>
>>
>> Or alternatively, there's a "hortonworks trucking demo" which does a
>> partition insert instead.
>>
>> Cheers,
>> Gopal
>>
>>
>
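
The point in the thread about ORC files being unreadable until the footer is written can be illustrated with a rough pre-check. This is only a sketch, not something from the thread or from Hive itself: it assumes the usual ORC layout in which the file starts with the 3-byte magic "ORC" and ends with a postscript whose last field is the same magic, followed by a single byte holding the postscript length. The function name looks_like_closed_orc is hypothetical, and very old files (Hive 0.11 era) may not carry the trailing magic, so treat a False result as "probably still open or not ORC", not as proof.

```python
import os

ORC_MAGIC = b"ORC"  # assumed magic bytes at the head and in the postscript


def looks_like_closed_orc(path):
    """Heuristic check: does the file start with the ORC magic and end
    with '<magic><1-byte postscript length>'?

    It does not parse the footer, so it cannot prove the file is valid;
    it only flags files whose tail does not look like a postscript.
    """
    size = os.path.getsize(path)
    # Need room for the header magic, the tail magic and the length byte.
    if size < 2 * len(ORC_MAGIC) + 1:
        return False
    with open(path, "rb") as f:
        header = f.read(len(ORC_MAGIC))
        # Jump to just before the final length byte and read the magic.
        f.seek(-(len(ORC_MAGIC) + 1), os.SEEK_END)
        tail = f.read(len(ORC_MAGIC))
    return header == ORC_MAGIC and tail == ORC_MAGIC
```

A writer could use a check like this before moving a finished file into the external table's directory, so that queries only ever see files that have been closed. Writing to a staging directory and renaming on close achieves the same thing without inspecting bytes at all.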
