hivemall-issues mailing list archives

From helenahm <...@git.apache.org>
Subject [GitHub] incubator-hivemall issue #93: [WIP][HIVEMALL-126] Maximum Entropy Model
Date Wed, 05 Jul 2017 04:43:07 GMT
Github user helenahm commented on the issue:

    https://github.com/apache/incubator-hivemall/pull/93
  
    I will do more tests too, since I actually need the model for a project, so I plan to
test it under load as well. I will write up the results.
    
    It may have issues similar to those Random Forest has; you are right. In a nutshell, the
implementation and memory concerns are similar.
    
    The implementation is as scalable as the implementation of Random Forest: one or more
models per mapper and then a UDAF that combines all the learned models into one final model.
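
    As a rough illustration of the "per-mapper models, then one combining step" pattern
described above, here is a minimal sketch. It is not Hivemall's UDAF code: the function name
`merge_models`, the dict-of-weights model representation, and the choice of plain weight
averaging as the merge rule are all assumptions made for this example.

```python
from collections import defaultdict


def merge_models(models):
    """Combine per-mapper models into one final model.

    Hypothetical merge rule: average each feature weight over all models,
    treating a feature that is missing from a model as weight 0.0.
    Each model is assumed to be a dict mapping feature name -> weight.
    """
    if not models:
        return {}
    totals = defaultdict(float)
    for weights in models:
        for feature, weight in weights.items():
            totals[feature] += weight
    count = len(models)
    return {feature: total / count for feature, total in totals.items()}


# Two mappers, each having trained on its own split of the data.
mapper_a = {"word=good": 1.0, "word=bad": 2.0}
mapper_b = {"word=good": 3.0}
final_model = merge_models([mapper_a, mapper_b])
```

    Because each mapper only sees its own split, memory use per mapper depends on the split
size and the feature count, while the combining step only needs the learned weights.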
    
    I still use Random Forest, even though on EMR r4 machines _numTrees greater than 1
does not work for my dataset. I think MaxEnt will give me a better model, though, and I
will not have to worry about overfitting caused by the tree structure, etc.
    
    Iterative Scaling could also be rewritten from scratch, without using any third-party
software. That is another option.
    
    I am sure the NLP community will be more likely to accept the implementation and use it
if it is exactly the way those guys wrote it. We very much value Adwait Ratnaparkhi's work.
Many published articles use exactly that MaxEnt implementation, which means people will
be able to use HiveMall and compare their newer results with the results of their previous work.
    
    



