ctakes-notifications mailing list archives

From "Roberto Costumero Moreno (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CTAKES-268) Fix SentenceDetector training with updated OpenNLP API
Date Wed, 20 Nov 2013 20:24:36 GMT
Roberto Costumero Moreno created CTAKES-268:
-----------------------------------------------

             Summary: Fix SentenceDetector training with updated OpenNLP API
                 Key: CTAKES-268
                 URL: https://issues.apache.org/jira/browse/CTAKES-268
             Project: cTAKES
          Issue Type: Improvement
          Components: ctakes-core
    Affects Versions: 3.1, 3.2, 3.1.1
         Environment: Mac OS X
            Reporter: Roberto Costumero Moreno
             Fix For: 3.1, 3.2, 3.1.1


Fixed the problem where SentenceDetector training did not work as expected due to changes in
the OpenNLP API.

I have replaced the following code around line 300:

logger.error("----------------------------------------------------------------------------------");
logger.error("Need to update yet for OpenNLP changes "); // TODO
logger.error("Commented out code that no longer compiles due to OpenNLP API incompatible changes"); // TODO
logger.error("----------------------------------------------------------------------------------");

FileReader datafr = new FileReader(inFile);
EventStream es = new BasicEventStream(new PlainTextByLineDataStream(datafr));

GISModel mod = GIS.trainModel(es, iters, cut);
SuffixSensitiveGISModelWriter ssgmw = new SuffixSensitiveGISModelWriter(mod, outFile);
logger.info("Saving the model as: " + outFile.getAbsolutePath());
ssgmw.persist();


with this code:


Charset charset = Charset.forName("UTF-8");

FileInputStream inStream = new FileInputStream(inFile);
ObjectStream<String> lineStream = new PlainTextByLineStream(inStream, charset);
ObjectStream<SentenceSample> sampleStream = new SentenceSampleStream(lineStream);

SentenceModel mod;

try {
	mod = SentenceDetectorME.train("en", sampleStream, true, null,
			ModelUtil.createTrainingParameters(iters, cut));
} finally {
	sampleStream.close();
	inStream.close();
}

SuffixSensitiveGISModelWriter ssgmw = new SuffixSensitiveGISModelWriter(
		mod.getMaxentModel(), outFile);
logger.info("Saving the model as: " + outFile.getAbsolutePath());
ssgmw.persist();


This seems to be working but needs to be checked. I have successfully generated models from the
examples, as well as a new Spanish model that I am currently working on.
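
A side note on the charset change in the new code: the old snippet used FileReader, which
decodes with the platform default charset on older Java versions, while the new one passes
UTF-8 explicitly. That matters for non-English training text such as Spanish. The following is
a minimal, standard-library-only sketch of the effect. It does not call OpenNLP, and the class
name, the readLines helper and the sample sentences are hypothetical, chosen only for
illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class TrainingLineReader {

    // Hypothetical helper: reads a one-sentence-per-line file with an explicit charset.
    // try-with-resources closes both streams even if reading fails.
    public static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sentences", ".txt");
        try {
            // Made-up sample sentences; \u00f3 and \u00e1 are accented vowels.
            String text = "El paciente lleg\u00f3 ayer.\nNo refiere dolor tor\u00e1cico.\n";
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));

            List<String> utf8 = readLines(tmp, StandardCharsets.UTF_8);
            List<String> latin1 = readLines(tmp, StandardCharsets.ISO_8859_1);

            // Same bytes, different charset: the accented characters decode differently.
            System.out.println("Lines read: " + utf8.size());
            System.out.println("Decodings match: " + utf8.get(0).equals(latin1.get(0)));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Decoding the same bytes with a different charset changes the accented characters, so a model
trained on text read with the wrong charset would see different strings than the ones in the
file.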



--
This message was sent by Atlassian JIRA
(v6.1#6144)
