opennlp-dev mailing list archives

From Boris Galitsky <bgalit...@hotmail.com>
Subject current version of source for syntactic match / relevance component
Date Wed, 17 Aug 2011 23:17:04 GMT


Hello


Attached are three packages that make up the current version of our proposed contribution:
a syntactic match / text relevance component for OpenNLP.
To start looking at it, please go to SyntMatcherTest.java and see how commonality between
sentences is computed. Then you can go to ParseTreeChunkTest.java and see how the operation
of syntactic generalization is applied to particular chunks.
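
For readers who have not opened the tests yet, here is a toy illustration of what generalizing two
chunks can look like. It is only a sketch under simplifying assumptions and is not the code in the
attached packages: the names ChunkGeneralizationSketch, Token and generalize are hypothetical, and
the alignment used here (a longest common subsequence over part-of-speech tags) is a stand-in chosen
for brevity. Where the two chunks agree on a tag the tag is kept, and the word is kept only if it is
the same in both chunks; otherwise it becomes "*".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy sketch only: hypothetical names, not the API of the attached packages.
class ChunkGeneralizationSketch {

    /** A word with its part-of-speech tag (hypothetical stand-in for a chunk element). */
    static final class Token {
        final String pos;
        final String lemma;

        Token(String pos, String lemma) {
            this.pos = pos;
            this.lemma = lemma;
        }
    }

    /**
     * Aligns two chunks by the longest common subsequence of their POS tags.
     * Each aligned pair yields "POS:lemma" if the lemmas match, else "POS:*".
     */
    static List<String> generalize(List<Token> a, List<Token> b) {
        int n = a.size();
        int m = b.size();
        // len[i][j] = LCS length of the POS tags of a[i..] and b[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.get(i).pos.equals(b.get(j).pos)) {
                    len[i][j] = 1 + len[i + 1][j + 1];
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to recover one optimal alignment.
        List<String> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            Token x = a.get(i);
            Token y = b.get(j);
            if (x.pos.equals(y.pos)) {
                String lemma = x.lemma.equals(y.lemma) ? x.lemma : "*";
                result.add(x.pos + ":" + lemma);
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "a digital camera" vs. "the new camera", with hand-assigned tags
        List<Token> first = Arrays.asList(
                new Token("DT", "a"), new Token("JJ", "digital"), new Token("NN", "camera"));
        List<Token> second = Arrays.asList(
                new Token("DT", "the"), new Token("JJ", "new"), new Token("NN", "camera"));
        System.out.println(generalize(first, second)); // prints [DT:*, JJ:*, NN:camera]
    }
}
```

In this toy version the two noun phrases share the pattern "determiner, adjective, camera", so only
the noun survives as a concrete word. The tests named above remain the reference for how the
component itself behaves.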
As an application, we selected the problem of content generation where relevance is critical.
Please go to "RelatedSentenceFinder" and see which sentences might serve as seeds for content
generation. The system goes on the web, finds sentences that are somewhat relevant to the seed
ones, and tries to "write an article".
As examples of auto-generated articles using this technology, please see
http://www.allvoices.com/contributed-news/9423860-best-things-to-do-in-san-francisco-jazz-and-blues-festival
http://www.allvoices.com/contributed-news/9415063-britney-spears-femme-fatale-in-north-sf-bay-area
http://www.allvoices.com/contributed-news/9381803-cirque-du-soleil-quidam
These articles were generated using the class RelatedSentenceFinder.java.
Hence the proposed structure of our contribution:
package opennlp.tools.similarity, main and test: implementation of syntactic match
package opennlp.tools.similarity.apps: the content generation app leveraging syntactic match
for sentence-level similarity
package opennlp.tools.similarity.apps.utils: utils for the above.
What needs to be done before the contribution can be fully considered:
1) make it use the latest OpenNLP (it currently uses a modified version of OpenNLP from 2008,
which is nevertheless pretty stable and has been working for 2 years in industrial settings)
2) fix all tests, add more tests
3) clean up the implementation and application code
4) add more applications to show more working scenarios of syntactic match
5) in addition to academic papers, have better docs for developers
Regards
Boris
