opennlp-dev mailing list archives

From Aliaksandr Autayeu <>
Subject Re: minutes of the skype call on Similarity component
Date Wed, 28 Mar 2012 21:02:24 GMT
Hi Boris,

Thank you!

One small note on "b) improve caching. Now it is implemented via Java
object serialization; make it via CSV files".
If you are going to use a library for CSV anyway, you might also consider
Google Protocol Buffers. They are pretty fast.
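To give a concrete idea of what a CSV-based cache involves, here is a minimal sketch. It assumes the cache can be reduced to a map from a string key (for example, a sentence pair) to a string value. The class `CsvCache` and its methods are hypothetical and are not part of the Similarity component. Each entry is written as one `key,value` record, with fields quoted in the style of RFC 4180, so keys and values may contain commas, double quotes or line breaks. The file is then read back with a small quote-aware parser.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical CSV-backed cache: one quoted "key,value" record per entry. */
public final class CsvCache {

    /** Wraps a field in double quotes, doubling any embedded double quote. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Writes every entry as one CSV record terminated by CRLF. */
    public static void save(Map<String, String> cache, Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : cache.entrySet()) {
            sb.append(quote(e.getKey())).append(',')
              .append(quote(e.getValue())).append("\r\n");
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    /** Parses records written by save() back into an insertion-ordered map. */
    public static Map<String, String> load(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Map<String, String> cache = new LinkedHashMap<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote = literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote of the field
                    }
                } else {
                    field.append(c);       // commas and line breaks kept verbatim
                }
            } else if (c == '"') {
                inQuotes = true;           // opening quote of the field
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {        // end of record
                record.add(field.toString());
                field.setLength(0);
                if (record.size() == 2) {
                    cache.put(record.get(0), record.get(1));
                }
                record.clear();
            } else if (c != '\r') {        // drop the CR of the CRLF terminator
                field.append(c);
            }
        }
        return cache;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("sentence one|sentence two", "score=3.5");
        cache.put("a, \"quoted\" key", "multi\nline");
        Path file = Files.createTempFile("similarity-cache", ".csv");
        save(cache, file);
        System.out.println(load(file).equals(cache)); // prints true
        Files.delete(file);
    }
}
```

Unlike Java object serialization, a file like this stays human-readable and is not tied to the layout of any Java class. A real cache would still need its own way to encode parse-tree results into the value string. Protocol Buffers would replace this hand-written format with a schema-defined binary one.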


On Wed, Mar 28, 2012 at 10:42 PM, Boris Galitsky <> wrote:

> Hi guys,
> per Aliaksandr's suggestion, below are the minutes of our conversation
> with Jorn about the Similarity component and other related issues.
>
> 1) Prepare Similarity for release from the sandbox:
>      a) improve readme.txt: add "The entry point to the Similarity component is
>           SentencePairMatchResult matchRes = sm.assessRelevance(sentence1, sentence2);
>         where matchRes includes the similarity score (weighted number of common
>         terms) and the set of maximum common parse trees."
>      b) improve caching. Now it is implemented via Java object
>         serialization; make it via CSV files
>      c) proper location for cache files and resources:
>         joernkottmann: src/test/resources
>      d) verify the Porter stemmer (remove Lucene dependencies, remove the
>         Porter stemmer from /similarity)
>      e) reformat the code using the Eclipse formatter template
>         joernkottmann:
>      f) package into a separate jar/src using Maven
> 2) Next major feature of Similarity: taxonomy auto-learning and using the
>    taxonomy to improve search relevance
>      a) see how the Similarity component can help with search tasks
>      b) integration with Solr (compare/complement Grant Ingersoll's work with
>         Similarity). There are some JIRA issues opened for hooking some of
>         the tamingtext stuff into the analyzers modules in Solr
> 3) More examples and docs for the Similarity component
>      a) examples for finding similar news at
>         email the code which generates the search query for news articles
>      b) email the link to the papers on
>         joernkottmann:
> 4) Other future features/improvements for Similarity
>      a) how can we create a more accurate Parse object by running the chunker
>         separately and then applying an alignment algorithm?
>      b) Coreference component
>         joernkottmann: TreebankNameFinder
>      c) apply machine learning to parse trees + coreferences. "Parse forest":
>         is it a good name?
>         joernkottmann: CorefSample
>
> Regards,
> Boris
