ctakes-notifications mailing list archives

From "Roberto Costumero Moreno (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CTAKES-268) Fix SentenceDetector training with updated OpenNLP API
Date Wed, 20 Nov 2013 20:24:36 GMT
Roberto Costumero Moreno created CTAKES-268:

             Summary: Fix SentenceDetector training with updated OpenNLP API
                 Key: CTAKES-268
                 URL: https://issues.apache.org/jira/browse/CTAKES-268
             Project: cTAKES
          Issue Type: Improvement
          Components: ctakes-core
    Affects Versions: 3.1, 3.2, 3.1.1
         Environment: Mac OS X
            Reporter: Roberto Costumero Moreno
             Fix For: 3.1, 3.2, 3.1.1

Fixed the problem where SentenceDetector training did not work as expected due to changes in the OpenNLP API.

I have replaced the code around line 300:

logger.error("Need to update yet for OpenNLP changes "); // TODO
logger.error("Commented out code that no longer compiles due to OpenNLP API incompatible changes");
FileReader datafr = new FileReader(inFile);
EventStream es = new BasicEventStream(new PlainTextByLineDataStream(datafr));
GISModel mod = GIS.trainModel(es, iters, cut);
SuffixSensitiveGISModelWriter ssgmw = new SuffixSensitiveGISModelWriter(
		mod, outFile);
logger.info("Saving the model as: " + outFile.getAbsolutePath());

with this code:

Charset charset = Charset.forName("UTF-8");
FileInputStream inStream = new FileInputStream(inFile);
ObjectStream<String> lineStream = new PlainTextByLineStream(inStream, charset);
ObjectStream<SentenceSample> sampleStream = new SentenceSampleStream(lineStream);
SentenceModel mod;
try {
	mod = SentenceDetectorME.train("en", sampleStream, true, null, ModelUtil.createTrainingParameters(iters, cut));
} finally {
	sampleStream.close();
}

SuffixSensitiveGISModelWriter ssgmw = new SuffixSensitiveGISModelWriter(
				 mod.getMaxentModel(), outFile);
logger.info("Saving the model as: " + outFile.getAbsolutePath());

This seems to work, but it needs to be checked. I have successfully generated models from the
examples, as well as a new Spanish model that I am currently working on.
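
One visible difference between the two snippets is that the old code read the training file with FileReader (platform default encoding), while the new code decodes it explicitly as UTF-8, which matters for accented text such as Spanish. The following is only a minimal, standard-library sketch of that explicit-charset, line-by-line reading, and does not use OpenNLP at all. The class name Utf8LineReaderSketch, the method readLines, and the sample sentences are hypothetical and are not part of cTAKES or OpenNLP.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of cTAKES or OpenNLP.
public class Utf8LineReaderSketch {

    // Reads a training file assumed to hold one sentence per line,
    // decoding the bytes with the given charset instead of the platform default.
    public static List<String> readLines(File inFile, Charset charset) throws IOException {
        List<String> lines = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(inFile), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write two made-up Spanish sentences as UTF-8, then read them back.
        File tmp = File.createTempFile("sentences", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(),
                Arrays.asList("El paciente tiene fiebre.", "Se recet\u00f3 paracetamol."),
                StandardCharsets.UTF_8);
        for (String sentence : readLines(tmp, StandardCharsets.UTF_8)) {
            System.out.println(sentence);
        }
    }
}
```

Reading the same file with a single-byte default encoding could corrupt the accented characters before they ever reach the trainer, which is one plausible reason to keep the explicit Charset in the new code.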

This message was sent by Atlassian JIRA
