hive-user mailing list archives

From "Grant Overby (groverby)" <grove...@cisco.com>
Subject Re: External Table with unclosed orc files.
Date Wed, 15 Apr 2015 15:05:40 GMT
It wasn’t reliably reproducible for us. If we killed the compaction job in yarn and manually
triggered compaction for the same partition, it would succeed. We would see this about 1 time
every 2 days / 200 partitions. There weren’t any errors logged that we noticed. The job
was simply sitting there making no progress.

I’m not using acid tables currently, but I’ll likely give it another go. What information
should I capture to help with this issue?





From: Alan Gates <gates@apache.org>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>, "gates@apache.org" <gates@apache.org>
Date: Wednesday, April 15, 2015 at 4:07 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: External Table with unclosed orc files.



Grant Overby (groverby) wrote:

Thanks for the link to the hive streaming bolt. We rolled our own bolt
many moons ago to utilize hive streaming. We’ve tried it against 0.13 and
0.14. Acid tables have been a real pain for us. We don’t believe they are
production ready. At least in our use cases, Tez crashes for assorted
reasons or only assigns 1 mapper to the partition. Having delta files and
no base files borks mapper assignments. Files containing flush in their
name are left scattered about, borking queries. Latency is higher with
streaming than writing to an orc file in hdfs, forcing obscene quantities
of buckets and orc files smaller than any reasonable orc stripe / hdfs
block size. The compactor hangs seemingly at random for no reason we’ve
been able to discern.

The issues with flush files borking queries have been resolved in Hive 1.0. I haven't seen
any issues with the compactor hanging at random. Could you expand on what parts hung? If
you have a reproducible case it would be great to file a JIRA so we can fix it.

Alan.


An orc file without a footer is junk data (or, at least, the last stripe
is junk data). I suppose my question should have been ‘what will the Hive
query do when it encounters this? Skip the stripe / file? Error out the
query? Something else?’




Grant Overby
Software Engineer
Cisco.com <http://www.cisco.com/> | groverby@cisco.com
Mobile: 865 724 4910











On 4/14/15, 4:23 PM, "Gopal Vijayaraghavan" <gopalv@apache.org> wrote:



What will Hive do if querying an external table containing orc files
that are still being written to?


Doing that directly won’t work at all, because ORC files are only readable
after the footer is written out, and no open file has one yet.
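Why an open file is unreadable becomes clearer once you see that an ORC reader starts at the end of the file. The minimal Python sketch below assumes the ORC v1 layout from the Apache ORC specification: the file begins with the bytes `ORC`, its final byte holds the length of the (never compressed) PostScript, and the PostScript carries the `ORC` magic string. A file that is still being written has no such tail. The function names are hypothetical helpers, not part of Hive or the ORC library, and the check is only a heuristic: trailing stripe bytes could pass it by chance, and files from very old writers may omit the PostScript magic.

```python
import os

ORC_MAGIC = b"ORC"
MAX_TAIL = 256  # the PostScript length is stored in one byte, so it is < 256


def orc_tail_looks_complete(head: bytes, tail: bytes) -> bool:
    """Heuristic check that an ORC file has been closed (has a PostScript).

    `head` is the first bytes of the file, `tail` its last (up to 256) bytes.
    Assumes the ORC v1 layout; this is not a substitute for a real reader.
    """
    # Every ORC file starts with the 3-byte magic "ORC".
    if not head.startswith(ORC_MAGIC) or not tail:
        return False
    # The final byte of a closed file is the PostScript length.
    ps_len = tail[-1]
    if ps_len == 0 or ps_len > len(tail) - 1:
        return False
    # The PostScript sits just before that length byte.
    postscript = tail[-1 - ps_len:-1]
    # A closed file's PostScript embeds the "ORC" magic string.
    return ORC_MAGIC in postscript


def orc_file_looks_closed(path: str) -> bool:
    """Read only the header and the last MAX_TAIL bytes of a local file."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(len(ORC_MAGIC))
        f.seek(max(0, size - MAX_TAIL))
        tail = f.read()
    return orc_tail_looks_complete(head, tail)
```

A real reader goes further and parses the footer and stripe metadata that the PostScript points to; this sketch only shows that, without a tail, there is nothing for the reader to start from.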



I won’t be able to test these scenarios till tomorrow and would like to
have some idea of what to expect this afternoon.


If I remember correctly, your previous question was about writing ORC from
Storm.

If you’re on a recent version of Storm, I’d advise you to look at
storm-hive:

https://github.com/apache/storm/tree/master/external/storm-hive


Or alternatively, there’s a “hortonworks trucking demo” which does a
partition insert instead.

Cheers,
Gopal


