tez-issues mailing list archives

From "Rajesh Balamohan (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TEZ-4271) Add config to limit desiredNumSplits
Date Tue, 26 Jan 2021 05:28:00 GMT

    [ https://issues.apache.org/jira/browse/TEZ-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17271883#comment-17271883
] 

Rajesh Balamohan commented on TEZ-4271:
---------------------------------------

{{tez.grouping.split-count}} mainly helps in initializing the "desired number of splits" to the value requested in the config. It tries to approximate the number of splits to the requested value (when the number of original splits is higher than the desired number of splits), but it is not a hard bound and does not guarantee exactly that split count.

Have you tried adjusting {{tez.grouping.min-size}}/{{tez.grouping.max-size}} instead to control the number of mappers being spun up?
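For example, something along these lines (a minimal sketch only; the class, method, and byte values below are illustrative placeholders, not settings from this ticket). Raising {{tez.grouping.min-size}} raises the minimum bytes per group, which caps the number of grouped splits at roughly totalInputBytes / min-size:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: cap the mapper count by bounding the bytes per grouped
// split instead of asking for an exact split count.
public class GroupingSizeSketch {
  public static void capMappers(Configuration conf, long totalInputBytes, int maxMappers) {
    long minBytesPerGroup = totalInputBytes / maxMappers;
    // lengthPerGroup is clamped to at least tez.grouping.min-size, so the
    // grouped split count stays at or below roughly maxMappers.
    conf.setLong("tez.grouping.min-size", minBytesPerGroup);
    // tez.grouping.max-size bounds the group size from above (and therefore
    // the split count from below); keep it >= min-size.
    conf.setLong("tez.grouping.max-size", Math.max(minBytesPerGroup, 1024L * 1024 * 1024));
  }
}
{code}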

> Add config to limit desiredNumSplits
> ------------------------------------
>
>                 Key: TEZ-4271
>                 URL: https://issues.apache.org/jira/browse/TEZ-4271
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Attila Magyar
>            Assignee: Attila Magyar
>            Priority: Major
>
> There are multiple config parameters (like tez.grouping.min/max-size, tez.grouping.by-length, tez.grouping.by-count, tez.grouping.node.local.only) that impact the number of grouped input splits, but there is no single property for setting an exact upper limit on the desired count.
> In Hive the maximum number of buckets is 4095. During an insert overwrite, each task writes its own bucket, and when Tez runs more than 4095 tasks Hive fails with a "bucketId out of range" exception.
>  
> When "tez.grouping.by-count" is used, clamping desiredNumSplits would be easy. However, when "tez.grouping.by-length" is enabled (which is the default), clamping desiredNumSplits alone is not enough, since Tez might generate a few more splits than desired.
> For example:
>  * originalSplits: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10], where the first 5 are on node0 and the other 5 are on node1.
>  * desiredNumSplits: 4
>  * Total size: 100
>  * lengthPerGroup: 100 / 4 = 25
>  * group0: [node0=>10, node0=>10]
>  * group1: [node1=>10, node1=>10]
>  * group2: [node0=>10, node0=>10]
>  * group3: [node1=>10, node1=>10]
>  * group4: default-rack=>[node0=>10, node1=>10]
>  
> The lengthPerGroup prevents adding more than 2 splits to a group, resulting in 5 groups instead of the 4 desired.
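> In other words (a simplified model of the by-length loop, not the actual TezSplitGrouper code): with splits of size 10, only 2 fit under lengthPerGroup = 25, so at least 5 groups are needed.
> {code:java}
// Simplified model of the example above. The sizes are taken from the
// example; the grouping rule itself (a group only takes another split while
// it stays within lengthPerGroup) is an assumption for illustration.
public class GroupCountSketch {
  public static void main(String[] args) {
    int numSplits = 10;              // 5 on node0, 5 on node1
    int splitSize = 10;
    int desiredNumSplits = 4;
    int totalSize = numSplits * splitSize;             // 100
    int lengthPerGroup = totalSize / desiredNumSplits; // 25

    // A third split would push a group to 30 > 25, so each group holds 2.
    int splitsPerGroup = lengthPerGroup / splitSize;   // 2
    int groups = (numSplits + splitsPerGroup - 1) / splitsPerGroup; // ceil(10 / 2) = 5
    System.out.println(groups + " groups instead of the desired " + desiredNumSplits);
  }
}
{code}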
>  
> If 25 were rounded up to 30 (lengthPerGroup = ceil(25 / 10) * 10), it would generate 3 groups. But we can't assume all splits have the same size (?)
> We might need to detect in the loop whether groupedSplits.size() is greater than desired, and redistribute the remaining splits across the existing groups (either round-robin or by picking the smallest group) instead of creating new groups. This might cause existing groups to be converted into rack-local groups if the node locality of the remaining splits is different from the locality of the existing groups.
> Alternatively, we could do a second pass after groupedSplits is fully calculated and try to merge existing groups. Either way, this complicates the logic even further. At this point I'm not sure what would be best. [~rajesh.balamohan], [~t3rmin4t0r], do you have any suggestions?
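> A rough sketch of the first option (hypothetical names and types, not the actual grouping code): once the desired count is reached, fold each remaining split into the smallest existing group instead of opening a new one.
> {code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch only; SplitGroup stands in for the real grouped-split
// holder inside the grouping loop.
class SplitGroup {
  final List<String> locations = new ArrayList<>();
  final List<Object> splits = new ArrayList<>();
  long length;

  void add(Object split, long splitLength, String host) {
    splits.add(split);
    length += splitLength;
    // If the new split's host differs from the group's, the group effectively
    // degrades from node-local to rack-local, as noted above.
    if (!locations.contains(host)) {
      locations.add(host);
    }
  }
}

class RedistributeSketch {
  // Called once groupedSplits.size() has already hit desiredNumSplits:
  // pick the smallest existing group and merge the split into it.
  static void addToSmallestGroup(List<SplitGroup> groupedSplits,
                                 Object split, long splitLength, String host) {
    SplitGroup smallest = groupedSplits.stream()
        .min(Comparator.comparingLong((SplitGroup g) -> g.length))
        .orElseThrow(IllegalStateException::new);
    smallest.add(split, splitLength, host);
  }
}
{code}
> The Hive failure for reference: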
> {code:java}
> Error while compiling statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask.
Vertex failed, vertexName=Map 1, vertexId=vertex_1610498854304_0004_1_00, diagnostics=[Task failed, taskId=task_1610498854304_0004_1_00_004098, diagnostics=[
TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
    ... 26 more
], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_1:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
    ... 26 more
], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_2:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
    ... 26 more
], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1610498854304_0004_1_00_004098_3:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:75)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:62)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:62)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:38)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:101)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:76)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:437)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267)
    ... 15 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:573)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:92)
    ... 18 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:820)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:995)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:938)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:174)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:152)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4098
    at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
    at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:270)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:289)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:352)
    at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:338)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:883)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:814)
    ... 26 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:3645, Vertex vertex_1610498854304_0004_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1610498854304_0004_1_01, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1610498854304_0004_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
{code}
>  
> cc: [~abstractdog], [~ashutoshc]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
