uima-user mailing list archives

From "Kline, Larry D" <Larry.Kl...@USONCOLOGY.COM>
Subject ConceptMapper and stemming
Date Thu, 13 Dec 2012 21:32:36 GMT
I am about to try using the Porter stemmer with the ConceptMapper and
wonder if anyone has any experience with this.  Any suggestions,
caveats, etc. would be most welcome.


A couple of questions:

* I presume I will need to stem the lookup dictionary when I build it.
  Or can I do that at some other point in the pipeline?

* I plan to use the Lucene implementation of the Porter stemmer and
  wrap it with a class that implements the interface required by
  ConceptMapper.  Or does someone know of a version of the Porter
  stemmer that already implements that interface?

* Will I also need to stem the stop-words dictionary?

* I see the following comment preceding the stem() method in
  TokenNormalizer.  I assume this is not really true because the default
  stemmer does not appear to be a Porter stemmer implementation.

   * If the stemming flag is true, then return the stemmed form of the
   * supplied word using the Porter stemmer.

* Is there anything else I should be aware of, such as how this might
  affect the search strategy?

* Is it possible to get to the stemmed form of the word/phrase that
  matched?  For instance, could it be copied to the token?

* Does anyone have experience with stemming medical terms?  I would be
  running this against clinical notes typed by a physician about a
  patient.  My dictionary was built from SNOMED concepts.  Will stemming
  even help?
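
To make the dictionary and stop-word questions concrete, here is a
minimal sketch of the general idea: the same stemmer is applied to the
dictionary entries when they are loaded and to the text tokens at lookup
time, so that both sides meet on the stemmed form.  Everything in it is
an assumption for illustration.  The Stemmer interface is a local
stand-in and may not match the interface ConceptMapper actually
requires, ToySuffixStemmer is a trivial suffix stripper rather than the
Porter algorithm (a real wrapper would delegate to the Lucene class),
and the concept ID is a placeholder, not a SNOMED code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the stemmer interface ConceptMapper expects;
// the real interface may declare different or additional methods.
interface Stemmer {
    String stem(String token);
}

// Toy suffix stripper so the sketch runs without Lucene on the classpath.
// This is NOT the Porter algorithm; it only handles simple English plurals.
final class ToySuffixStemmer implements Stemmer {
    @Override
    public String stem(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.length() > 4 && t.endsWith("ies")) {
            return t.substring(0, t.length() - 3) + "y";   // biopsies -> biopsy
        }
        if (t.length() > 3 && t.endsWith("s") && !t.endsWith("ss")) {
            return t.substring(0, t.length() - 1);          // fractures -> fracture
        }
        return t;
    }
}

// Dictionary keyed by stemmed phrases; text spans are stemmed the same way
// before lookup, so inflected forms can match the dictionary entry.
final class StemmedLookup {
    private final Stemmer stemmer;
    private final Map<String, String> conceptsByStemmedPhrase = new HashMap<>();

    StemmedLookup(Stemmer stemmer) {
        this.stemmer = stemmer;
    }

    // Stem each whitespace-separated token and rejoin with single spaces.
    String normalize(String phrase) {
        List<String> stems = new ArrayList<>();
        for (String token : phrase.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                stems.add(stemmer.stem(token));
            }
        }
        return String.join(" ", stems);
    }

    // Called while building the dictionary: store the stemmed form as the key.
    void addEntry(String phrase, String conceptId) {
        conceptsByStemmedPhrase.put(normalize(phrase), conceptId);
    }

    // Called at annotation time: returns null when nothing matches.
    String lookup(String textSpan) {
        return conceptsByStemmedPhrase.get(normalize(textSpan));
    }
}
```

With this sketch, adding the entry "rib fracture" and then looking up
"Rib fractures" finds the same placeholder concept, because both
normalize to "rib fracture".  The toy stemmer also turns "diabetes" into
"diabete", which is the kind of side effect on medical vocabulary the
last question is asking about.  A stop-word list would presumably need
the same treatment if stop words are compared after stemming.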



Larry Kline

