hivemall-issues mailing list archives

From "Makoto Yui (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVEMALL-199) Reduce memory usage of lda_predict
Date Mon, 23 Apr 2018 08:03:00 GMT
Makoto Yui created HIVEMALL-199:
-----------------------------------

             Summary: Reduce memory usage of lda_predict
                 Key: HIVEMALL-199
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-199
             Project: Hivemall
          Issue Type: Wish
    Affects Versions: 0.5.0
            Reporter: Makoto Yui
            Assignee: Makoto Yui
             Fix For: 0.5.2


lda_predict (LDAPredictUDAF) does not declare [@AggregationType(estimable = true)|https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/sketch/hll/ApproxCountDistinctUDAF.java#L233],
so the optimizer does not parallelize the reduce phase.

In addition, we should revise LDAPredictUDAF to use less memory and avoid the OutOfMemoryError shown in the log below.
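
For illustration only, here is a minimal sketch of what an estimable aggregation buffer could look like. It is not the actual LDAPredictUDAF code. So that it compiles without Hive on the classpath, {{AggregationType}} and {{AbstractAggregationBuffer}} are local stand-ins for the Hive types of the same names, and {{WordCountBuffer}}, its fields, and the size constants in {{estimate()}} are hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

public class EstimableBufferSketch {

    // Local stand-in for Hive's GenericUDAFEvaluator.AggregationType annotation.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface AggregationType {
        boolean estimable() default false;
    }

    // Local stand-in for Hive's AbstractAggregationBuffer; -1 means "size unknown".
    static abstract class AbstractAggregationBuffer {
        public int estimate() {
            return -1;
        }
    }

    // Hypothetical buffer that accumulates per-word counts for one document.
    @AggregationType(estimable = true)
    static class WordCountBuffer extends AbstractAggregationBuffer {
        private final Map<String, Float> wordCounts = new HashMap<>();
        private int keyChars = 0; // total characters held in map keys

        void add(String word, float count) {
            if (!wordCounts.containsKey(word)) {
                keyChars += word.length();
            }
            wordCounts.merge(word, count, Float::sum);
        }

        // Rough size in bytes; the constants are illustrative guesses, not measurements.
        @Override
        public int estimate() {
            return 64 + wordCounts.size() * 48 + keyChars * 2;
        }
    }

    // A buffer counts as estimable only if it extends the abstract base class
    // and carries the annotation with estimable = true.
    static boolean isEstimable(Object buffer) {
        if (!(buffer instanceof AbstractAggregationBuffer)) {
            return false;
        }
        AggregationType type = buffer.getClass().getAnnotation(AggregationType.class);
        return type != null && type.estimable();
    }

    public static void main(String[] args) {
        WordCountBuffer buf = new WordCountBuffer();
        buf.add("hive", 1.0f);
        buf.add("mall", 2.0f);
        System.out.println("estimable: " + isEstimable(buf));
        System.out.println("estimated bytes: " + buf.estimate());
    }
}
```

With a size estimate available, the engine can track how much memory the aggregation buffers hold instead of treating them as opaque objects.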

{code}
2018-04-23 04:04:34,081 FATAL [Thread-5] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.ByteBuffer.wrap(ByteBuffer.java:373)
    at org.apache.hadoop.io.Text.decode(Text.java:389)
    at org.apache.hadoop.io.Text.toString(Text.java:280)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getString(PrimitiveObjectInspectorUtils.java:823)
    at hivemall.topicmodel.LDAPredictUDAF$Evaluator.iterate(LDAPredictUDAF.java:298)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:184)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:641)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:838)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:735)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:803)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.internalForward(CommonJoinOperator.java:638)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:651)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:654)
    at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:758)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:311)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
