hive-user mailing list archives

From Gopal Vijayaraghavan
Subject Re: hive throws ConcurrentModificationException when executing insert overwrite table
Date Wed, 17 Aug 2016 05:18:54 GMT

> Yes, Kylin generated the query. I'm using Kylin 1.5.3.

I would report a bug to Kylin about DISTRIBUTE BY RAND().

This is what happens when a node which ran a Map task fails and the whole
task is retried.

Assume that the first attempt of the Map task0 wrote value1 into
reducer-99, because RAND() returned 99.

Now the task succeeds and the reducers start; reducer-0 runs
successfully and writes 0000_0.

But before reducer-99 runs, the node which ran Map task0 crashes.

So, the engine re-runs Map task0 on another node. But because RAND() is
not deterministic, this time it may return 0 for "value1".

Map task0's new output for reducer-0 now contains "value1", but reducer-0
has already finished, so no task will ever read it or write it out.

In short, the output of the table will not contain "value1", despite the
input and the shuffle outputs containing "value1".
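
To make the sequence above concrete, here is a toy model of it in Python.
It is only a sketch: map_attempt, simulate and the two "pick" arguments
are made-up names that stand in for RAND(), and nothing here is real Hive
or Tez code. It assumes reducer-0 is the only reducer that finished before
the crash and that finished reducers are never re-run.

```python
NUM_REDUCERS = 100  # assumed reducer count, matching reducer-0 .. reducer-99


def map_attempt(rows, pick_reducer):
    """Run one Map attempt: bucket each row under the reducer pick_reducer returns."""
    shuffle = {}
    for row in rows:
        shuffle.setdefault(pick_reducer(row), []).append(row)
    return shuffle


def simulate(first_pick, retry_pick):
    """Return the final table contents for the crash-and-retry scenario.

    first_pick / retry_pick are the reducer numbers that RAND() is assumed
    to produce for "value1" on the first attempt and on the retry.
    """
    rows = ["value1"]
    table = []

    # First attempt of Map task0 succeeds.
    shuffle = map_attempt(rows, lambda row: first_pick)

    # reducer-0 runs before the crash and commits whatever it was given.
    table.extend(shuffle.get(0, []))
    finished = {0}

    # The node holding Map task0's output crashes, so the task is re-run
    # and RAND() is evaluated again.
    shuffle = map_attempt(rows, lambda row: retry_pick)

    # Only the reducers that have not finished yet read the new output.
    for reducer in range(NUM_REDUCERS):
        if reducer not in finished:
            table.extend(shuffle.get(reducer, []))
    return table


if __name__ == "__main__":
    print(simulate(first_pick=99, retry_pick=0))   # the case described above
    print(simulate(first_pick=99, retry_pick=99))  # retry happens to agree
```

With first_pick=99 and retry_pick=0 the returned table is empty, which is
the lost "value1" case; when both attempts pick the same reducer the row
survives.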

I would replace the DISTRIBUTE BY RAND() with SORT BY 0, for a random
distribution without data loss.
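
The property that matters for the replacement is that a re-run Map
attempt sends each row to the same reducer as the first attempt did. The
Python sketch below shows that property with a fixed-seed generator. It is
an illustration under assumptions, not a description of how Hive
implements SORT BY 0: assign, SEED and NUM_REDUCERS are hypothetical
names, and it assumes the retry sees the rows in the same order.

```python
import random

NUM_REDUCERS = 100  # assumed reducer count
SEED = 12345        # hypothetical fixed seed shared by every attempt


def assign(rows, seed=SEED):
    """Spread rows over reducers with a generator that restarts from the same seed.

    Because each attempt builds a fresh generator from the same seed, a
    retried attempt reproduces the first attempt's assignment exactly.
    """
    rng = random.Random(seed)
    return [(row, rng.randrange(NUM_REDUCERS)) for row in rows]


if __name__ == "__main__":
    rows = ["value1", "value2", "value3"]
    first_attempt = assign(rows)
    retry_attempt = assign(rows)
    # The retry agrees with the first attempt, so a reducer that already
    # finished cannot be handed a row it never saw.
    print(first_attempt == retry_attempt)
```

The spread over reducers still looks random, but it is repeatable, which
is what an unseeded RAND() cannot guarantee across task retries.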

> But I still not sure how can I fix the problem. I'm a beginner of Hive
>and Kylin, Can the problem be fixed by just change the hive or kylin

If you're just experimenting with Kylin right now, I recommend just
disabling the ACL settings in HDFS (this is not permissions btw, ACLs are

Set dfs.namenode.acls.enabled=false in core-site.xml and wherever else in
your /etc/hadoop/conf it shows up and you should be good to avoid the race

