opennlp-issues mailing list archives

From "Jim Piliouras (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OPENNLP-78) NameFinder and Dictionary Integration
Date Wed, 14 Mar 2012 18:02:39 GMT

    [ https://issues.apache.org/jira/browse/OPENNLP-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229441#comment-13229441 ]

Jim Piliouras commented on OPENNLP-78:
--------------------------------------

Actually, it sounds a lot easier to have the option of using the dictionary after model creation,
just for evaluation purposes (as James stated), without touching the feature generation. It
sounds like significantly less work, yet it would boost the results of people who actually have
dictionaries (like me). I do get the point about training more on surrounding tokens, but again,
you can never be sure what to expect from a corpus. Sometimes it might help, sometimes it
might hurt. For example, I'm dealing with drug names that exhibit very strong morphological
characteristics most of the time. Some of them are so strong and unique that you can find
them using a regex. This leads to very informative features, doesn't it? That is why I'm getting
such good results in spite of not having the recommended amount of training data (I only have
3,800 sentences). I guess learning most features from the entity itself works really well
for me, but what would happen if I were looking for person names with so little training data?
I really wonder. I can see that your pre-trained model for names is 5 MB, whereas my drug model
is only 387 KB and still gets 94% precision and 73% recall. Anyway, I vote for using the dictionary
after deploying the maxent model, for the sake of better results when evaluating.
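
To make the "dictionary after the model" idea concrete, here is a minimal sketch of
one way it could work. It is only an illustration under stated assumptions: the class
and method names (DictionaryPostProcessor, dictionarySpans, merge) are hypothetical,
it does not use any OpenNLP classes, spans are plain {start, end} token offsets with
an exclusive end, and the dictionary is assumed to hold single-token, lower-cased
entries. The model's spans are kept as they are, and a dictionary hit is added only
where it does not overlap something the model already found, so the dictionary can
raise recall without overriding the model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of OpenNLP: combines spans predicted by a
// trained model with dictionary lookups performed after the model has run.
public class DictionaryPostProcessor {

    // Returns one {start, end} span (end exclusive) for every token that
    // appears in the dictionary. Assumes single-token, lower-cased entries.
    public static List<int[]> dictionarySpans(String[] tokens, Set<String> dictionary) {
        List<int[]> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (dictionary.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                spans.add(new int[] {i, i + 1});
            }
        }
        return spans;
    }

    // Two half-open spans overlap when each starts before the other ends.
    private static boolean overlaps(int[] a, int[] b) {
        return a[0] < b[1] && b[0] < a[1];
    }

    // Keeps every model span and adds only those dictionary spans that do
    // not overlap any model span. The result is sorted by start offset.
    public static List<int[]> merge(List<int[]> modelSpans, List<int[]> dictSpans) {
        List<int[]> merged = new ArrayList<>(modelSpans);
        for (int[] d : dictSpans) {
            boolean clash = false;
            for (int[] m : modelSpans) {
                if (overlaps(d, m)) {
                    clash = true;
                    break;
                }
            }
            if (!clash) {
                merged.add(d);
            }
        }
        merged.sort(Comparator.comparingInt((int[] s) -> s[0]));
        return merged;
    }
}
```

Giving the model priority on overlaps is a design choice, not a requirement. If the
dictionary is more trustworthy than the model for a given entity type, the priority
could just as well be reversed.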

Hope I didn't bore you!

Jim 
                
> NameFinder and Dictionary Integration
> -------------------------------------
>
>                 Key: OPENNLP-78
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-78
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Name Finder
>         Environment: Windows 7
>            Reporter: James Kosin
>            Assignee: James Kosin
>            Priority: Minor
>
> Now that we have a NameFinder Dictionary and improved NameFinder tools, it would be nice
> to be able to integrate the dictionary and model to help improve the finding of names.
> This way, the name finder could be trained more on the surrounding text instead of attempting
> to memorize common names in the news that occur frequently.
> I've already got the name finder corpus and created the dictionaries with the data from
> the US Census.
> I just need to implement some method to help train the model, or be able to use the dictionaries
> post model creation to help with the finding of names.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
