opennlp-issues mailing list archives

From "Boris Galitsky (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OPENNLP-413) demonstration how sensitive syntactic match is compared to bag-of-words approach
Date Mon, 19 Dec 2011 11:45:30 GMT

     [ https://issues.apache.org/jira/browse/OPENNLP-413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Galitsky updated OPENNLP-413:
-----------------------------------

    Attachment: patch.OPENNLP-413.txt

Two boundary cases are demonstrated in ParserChunker2MatcherProcessorTest:

Case 1, similar words but different meaning:
"How to deduct rental expense from income"
	VS "How to deduct repair expense from rental income."
[[ [NN-expense IN-from NN-income ],  [JJ-rental NN-* ],  [NN-income ]], [ [TO-to VB-deduct JJ-rental NN-* ],  [VB-deduct NN-expense IN-from NN-income ]]]
MatchScore is adequate (= 2.8), whereas bagOfWordsScore = 5.0 is too high.

Case 2, different words but similar meaning:
"Way to minimize medical expense for my daughter"
	VS "Means to deduct educational expense for my son"
[[ [JJ-* NN-expense IN-for PRP$-my NN-* ],  [PRP$-my NN-* ]], [ [TO-to VB-* JJ-* NN-expense IN-for PRP$-my NN-* ]]]
MatchScore is adequate (= 2.2), whereas bagOfWordsScore = 1.0 is too low.
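
For readers without the patch at hand, here is a minimal sketch of a bag-of-words overlap count applied to the two sentence pairs above. It is not the code from patch.OPENNLP-413.txt: the class name BagOfWordsSketch, the tokenization, and the four-word stopword list are all assumptions made for illustration. With this particular list the counts come out as 5.0 and 1.0, but the patch may compute its bagOfWordsScore differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class BagOfWordsSketch {
    // Hypothetical stopword list. The patch uses its own list, which this message does not show.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("to", "from", "for", "my"));

    // Lowercase the sentence, split on anything that is not a-z, and drop stopwords.
    static Set<String> contentWords(String sentence) {
        Set<String> words = new HashSet<>();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    // Score = number of distinct non-stopword words the two sentences share.
    static double score(String first, String second) {
        Set<String> shared = contentWords(first);
        shared.retainAll(contentWords(second));
        return shared.size();
    }

    public static void main(String[] args) {
        // Shared words: how, deduct, rental, expense, income
        System.out.println(score(
                "How to deduct rental expense from income ",
                "How to deduct repair expense from rental income."));   // prints 5.0
        // Shared words: expense
        System.out.println(score(
                "Way to minimize medical expense for my daughter",
                "Means to deduct educational expense for my son"));      // prints 1.0
    }
}
```

The count ignores word order and part of speech, so "rental expense" and "rental income" look identical to it, while "daughter"/"son" and "minimize"/"deduct" contribute nothing.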
                
> demonstration how sensitive syntactic match is compared to bag-of-words approach
> --------------------------------------------------------------------------------
>
>                 Key: OPENNLP-413
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-413
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Similarity
>            Reporter: Boris Galitsky
>            Assignee: Boris Galitsky
>         Attachments: patch.OPENNLP-413.txt
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Per Jason's recommendation:
> > have you done standard similarity based on the standard bag-of-words model?
> I do a simple bag-of-words comparison with its own list of stopwords and compare the two approaches on a pair of cases:
> 1) similar words but different meaning
> 2) different words but similar meaning

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
