hivemall-issues mailing list archives

From helenahm <...@git.apache.org>
Subject [GitHub] incubator-hivemall pull request #82: Encoding related bug in LDA.
Date Fri, 02 Jun 2017 05:37:33 GMT
GitHub user helenahm opened a pull request:

    https://github.com/apache/incubator-hivemall/pull/82

    Encoding related bug in LDA.

    What changes were proposed in this pull request?
    
    I have found a major bug: until it is fixed, people will not be able to use LDA, and possibly other algorithms as well.
    
    I spotted the bug and made a small fix (in LDAUDTFTest.java); I would like you to fix the rest.
    
    What type of PR is it?
    
    [Bug Fix]
    
    What is the Jira issue?
    
    ???
    
    How was this patch tested?
    
    I intended to use LDA on my data on EMR, as usual, but LDA failed to process my text. So I checked your test and added code to feed my data into it instead of your two lines. When that ran successfully, I realized the test itself may be faulty: I think close() is called in real life, but not in the test. Indeed, once close() was called, the same errors as on EMR showed up.
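    The stack traces below fail inside LDAUDTF.close() -> runIterativeTraining(), so a test that only calls process() never reaches the failing code. A minimal sketch of that lifecycle, using a hypothetical DeferredTrainer class (not Hivemall's API) whose close() runs a training step that always throws:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LifecycleDemo {

    /** Hypothetical stand-in for a UDTF: process() only buffers, close() does the work. */
    static class DeferredTrainer {
        private final List<String[]> rows = new ArrayList<>();
        private final Consumer<List<String[]>> training;

        DeferredTrainer(Consumer<List<String[]>> training) {
            this.training = training;
        }

        void process(String[] row) {
            rows.add(row); // cheap: nothing is parsed or trained yet
        }

        void close() {
            training.accept(rows); // the expensive (and here failing) path runs only now
        }
    }

    /** Returns null if the simulated test passes, otherwise the failure message. */
    static String runTest(boolean callClose) {
        DeferredTrainer trainer = new DeferredTrainer(rows -> {
            // Simulates a training step that blows up on the buffered input.
            throw new IllegalStateException("Exception caused in the iterative training");
        });
        trainer.process(new String[] {"word1:1", "word2:1"});
        try {
            if (callClose) {
                trainer.close();
            }
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTest(false)); // null: the training path never executed
        System.out.println(runTest(true));  // Exception caused in the iterative training
    }
}
```

    Only the run that calls close() reports the failure; in a real Hive UDTF test the equivalent is making sure close() is invoked after the last process() call.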
    
    I found the lines in the features that caused the errors:
     na‹ve:1
     xž:1
    
    I do not know why, but it means I have to pre-process my data before testing LDA further, and I am starting to doubt whether it will work for other languages.
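    One way to see why such features might behave differently from plain ASCII text: in Java, String.length() counts UTF-16 chars, while the UTF-8 encoding of a non-ASCII letter takes more than one byte. The sketch below assumes the garbled features above correspond to non-ASCII words; "naïve" and "xž" are used only as illustrations, since the exact original bytes are unknown. It also includes a hypothetical asciiOnly() pre-filter, which is a stopgap that drops non-ASCII words, not a fix.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FeatureLengths {

    /** Length as Java counts it: UTF-16 code units. */
    static int chars(String s) {
        return s.length();
    }

    /** Length once the string is encoded as UTF-8. */
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /**
     * Hypothetical pre-processing stopgap: keep only pure-ASCII features,
     * for which both lengths above agree. It silently drops non-English words.
     */
    static List<String> asciiOnly(String[] features) {
        List<String> kept = new ArrayList<>();
        for (String f : features) {
            if (f.chars().allMatch(c -> c < 128)) {
                kept.add(f);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Non-ASCII letters are written as escapes so the source file encoding does not matter.
        String[] features = {"naive:1", "na\u00efve:1", "x\u017e:1"};
        for (String f : features) {
            System.out.println(f + " chars=" + chars(f) + " utf8Bytes=" + utf8Bytes(f));
        }
        System.out.println(asciiOnly(features)); // [naive:1]
    }
}
```

    For "naive:1" both counts are 7; for "naïve:1" they are 7 chars vs 8 bytes, and for "xž:1" 4 chars vs 5 bytes. Any code that sizes or reads data using one count while writing the other will only go wrong on inputs like the last two.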
    
    The EMR error messages for different memory and number-of-reducers settings are below. Same source, same reason.
    
    Diagnostic Messages for this Task:
     Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: Exception caused in the iterative training
     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
     Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception caused in the iterative training
     at hivemall.topicmodel.LDAUDTF.runIterativeTraining(LDAUDTF.java:511)
     at hivemall.topicmodel.LDAUDTF.close(LDAUDTF.java:309)
     at org.apache.hadoop.hive.ql.exec.UDTFOperator.closeOp(UDTFOperator.java:152)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:683)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:279)
     ... 7 more
     Caused by: java.lang.OutOfMemoryError: Java heap space
     at hivemall.topicmodel.LDAUDTF.runIterativeTraining(LDAUDTF.java:352)
     ... 13 more
    
    Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: Exception caused in the iterative training
     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
     Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception caused in the iterative training
     at hivemall.topicmodel.LDAUDTF.runIterativeTraining(LDAUDTF.java:511)
     at hivemall.topicmodel.LDAUDTF.close(LDAUDTF.java:309)
     at org.apache.hadoop.hive.ql.exec.UDTFOperator.closeOp(UDTFOperator.java:152)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:683)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:279)
     ... 7 more
     Caused by: java.nio.BufferUnderflowException
     at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:271)
     at java.nio.ByteBuffer.get(ByteBuffer.java:715)
     at hivemall.topicmodel.LDAUDTF.runIterativeTraining(LDAUDTF.java:356)
     ... 13 more
    
    Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: Exception caused in the iterative training
     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:454)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:393)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
     Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception caused in the iterative training
     at hivemall.topicmodel.LDAUDTF.runIterativeTraining(LDAUDTF.java:511)
     at hivemall.topicmodel.LDAUDTF.close(LDAUDTF.java:309)
     at org.apache.hadoop.hive.ql.exec.UDTFOperator.closeOp(UDTFOperator.java:152)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:683)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:697)
     at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:279)
     ... 7 more
     Caused by: java.lang.NegativeArraySizeException
     at hivemall.topicmodel.LDAUDTF.runIterativeTraining(LDAUDTF.java:352)
     ... 13 more
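    The three root causes differ (OutOfMemoryError and NegativeArraySizeException at LDAUDTF.java:352, BufferUnderflowException at line 356), which is what one would expect if a length read back from a buffer were garbage. One plausible mechanism, stated as an assumption and not as a diagnosis of LDAUDTF itself: a length prefix that counts chars while the payload is written as UTF-8 bytes. Such a format stays in sync for pure-ASCII input and drifts as soon as a non-ASCII character appears. The writers and reader below are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LengthPrefixMismatch {

    /** Buggy writer (hypothetical): the prefix counts chars, the payload is UTF-8 bytes. */
    static ByteBuffer writeBuggy(String[] words) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        for (String w : words) {
            buf.putInt(w.length());                      // chars...
            buf.put(w.getBytes(StandardCharsets.UTF_8)); // ...but bytes are written
        }
        buf.flip();
        return buf;
    }

    /** Fixed writer: the prefix is the number of bytes actually written. */
    static ByteBuffer writeFixed(String[] words) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        for (String w : words) {
            byte[] bytes = w.getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length);
            buf.put(bytes);
        }
        buf.flip();
        return buf;
    }

    /**
     * Reads length-prefixed strings. A naive reader would call new byte[len] and then
     * buf.get(...): a negative len gives NegativeArraySizeException, a huge one may give
     * OutOfMemoryError, and one larger than what is left gives BufferUnderflowException.
     * Here those cases are reported instead of triggered.
     */
    static List<String> read(ByteBuffer buf) {
        List<String> out = new ArrayList<>();
        while (buf.remaining() >= Integer.BYTES) {
            int len = buf.getInt();
            if (len < 0) {
                out.add("negative length " + len);
                break;
            }
            if (len > buf.remaining()) {
                out.add("length " + len + " > remaining " + buf.remaining());
                break;
            }
            byte[] bytes = new byte[len];
            buf.get(bytes);
            out.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] ascii = {"naive:1", "x:1"};
        String[] nonAscii = {"na\u00efve:1", "x\u017e:1"};
        System.out.println(read(writeBuggy(ascii)));    // [naive:1, x:1]
        System.out.println(read(writeBuggy(nonAscii))); // desynchronized, see note below
        System.out.println(read(writeFixed(nonAscii))); // both words round-trip
    }
}
```

    With the non-ASCII input, the buggy round trip decodes "naïve:" (one byte short), then treats the trailing '1' plus three zero bytes as the next length, 0x31000000 = 822083584, far more than the 6 bytes left. A naive reader would try to allocate that array (OutOfMemoryError on a small heap) or underflow the buffer; a misaligned byte of 0x80 or above in the top position would give a negative length instead.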
    
    How to use this feature?
    
    Once this is fixed, people will be able to run LDA.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/helenahm/incubator-hivemall master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/82.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #82
    
----
commit 45a656aa7278066ce3fc36fcd81fb1eca11f1079
Author: helenahm <helenahm@users.noreply.github.com>
Date:   2017-06-02T05:10:13Z

    Update LDAUDTFTest.java

commit fef9c1ce719d3924a28cc90d71d40728dc5c7563
Author: helenahm <helenahm@users.noreply.github.com>
Date:   2017-06-02T05:22:54Z

    Merge pull request #1 from helenahm/helenahm-patch-1
    
    Update LDAUDTFTest.java

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
