hivemall-issues mailing list archives

From takuti <>
Subject [GitHub] incubator-hivemall pull request #83: [HIVEMALL-109][HIVEMALL-112] Fix topic ...
Date Fri, 02 Jun 2017 22:24:34 GMT
GitHub user takuti opened a pull request:

    [HIVEMALL-109][HIVEMALL-112] Fix topic model and tokenize UDFs

    ## What changes were proposed in this pull request?
    - Topic model: `train_plsa` and `train_lda`
      - Fix bugs caused by multi-byte input
      - Fix wrong `recordBytes` calculation for iteration utilizing file IO
      - Refactor and update unit tests accordingly
    - `tokenize()`
      - Support NULL input; the UDF simply returns NULL itself
    ## What type of PR is it?
    Bug Fix
    ## What is the Jira issue?
    ## How was this patch tested?
    - Unit tests
    - Manual tests on EMR
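
For readers unfamiliar with this class of bug, here is a minimal sketch of why multi-byte input can break a record-size calculation. It is not the code from this pull request: the class `RecordSizeSketch`, its methods, and the record layout (an `int` count followed by length-prefixed strings) are all hypothetical. It only assumes, following the commit message "Use `char`s instead of `byte`s", that each string is stored as fixed-width 2-byte `char`s, so the size can be derived from `String.length()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; names and layout are not taken from Hivemall.
final class RecordSizeSketch {

    // Size of one record: [int count] then, per string, [int length][length * 2-byte chars].
    public static int recordBytes(String[] features) {
        int size = Integer.BYTES;
        for (String f : features) {
            size += Integer.BYTES + Character.BYTES * f.length();
        }
        return size;
    }

    // Writes the record into a buffer sized exactly by recordBytes().
    public static ByteBuffer writeRecord(String[] features) {
        ByteBuffer buf = ByteBuffer.allocate(recordBytes(features));
        buf.putInt(features.length);
        for (String f : features) {
            buf.putInt(f.length());
            for (int i = 0; i < f.length(); i++) {
                buf.putChar(f.charAt(i)); // always 2 bytes, regardless of the character
            }
        }
        buf.flip();
        return buf;
    }

    // Reads the record back using the same layout.
    public static String[] readRecord(ByteBuffer buf) {
        String[] out = new String[buf.getInt()];
        for (int i = 0; i < out.length; i++) {
            char[] cs = new char[buf.getInt()];
            for (int j = 0; j < cs.length; j++) {
                cs[j] = buf.getChar();
            }
            out[i] = new String(cs);
        }
        return out;
    }

    public static void main(String[] args) {
        // Three hiragana characters followed by ":2", written with escapes
        // so the example does not depend on the source file encoding.
        String word = "\u308A\u3093\u3054:2";
        // Character count and UTF-8 byte count disagree for multi-byte text,
        // so a buffer sized by length() cannot hold the UTF-8 bytes.
        System.out.println(word.length());                                // 5
        System.out.println(word.getBytes(StandardCharsets.UTF_8).length); // 11

        String[] in = {"apple:1", word};
        String[] out = readRecord(writeRecord(in));
        System.out.println(Arrays.equals(in, out));                       // true
    }
}
```

The point of the sketch is that the size calculation and the write path must use the same unit. Sizing by character count while writing variable-width encoded bytes is one way such a mismatch could arise; whether that is exactly what happened in `train_plsa` and `train_lda` is not stated in this message.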

You can merge this pull request into a Git repository by running:

    $ git pull fix-topicmodel

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #83
commit 988666a58801e1cf62b0c91c5815e973084ba972
Author: Takuya Kitazawa <>
Date:   2017-06-02T07:40:12Z

    Fix multi-byte-related issue in topic model UDFs
    and validate it as unit test

commit b08f73aed98064059773ba8c2342814d03b991ff
Author: Takuya Kitazawa <>
Date:   2017-06-02T08:07:19Z

    Use `char`s instead of `byte`s

commit c1239fe7938a724147554d0c1c769ec7c3025013
Author: Takuya Kitazawa <>
Date:   2017-06-02T08:24:20Z

    Fix record bytes calculation

commit accee7a938c8034bd3c2a250bbdd27d57871092d
Author: Takuya Kitazawa <>
Date:   2017-06-02T09:15:53Z

    Use NIOUtils for writing strings to a byte buffer

commit ceff765de725cddc5e9f556433ab76272e4d9720
Author: Takuya Kitazawa <>
Date:   2017-06-02T09:52:25Z

    Fix record size related to iteration using temporary file
    Since now iteration works correctly, manual for-loops are removed from
    unit tests.

commit e9ec0f31ea2a6b5b67c89a141be197a734f66567
Author: Takuya Kitazawa <>
Date:   2017-06-02T10:06:45Z

    Fix `tokenize` for null input

commit dda972405c893277edb13add5fc2b4e7a5a96d83
Author: Takuya Kitazawa <>
Date:   2017-06-02T11:35:20Z

    Refactor on `recordTrainSampleToTempFile`

