ctakes-notifications mailing list archives

From "Roberto Costumero Moreno (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CTAKES-268) Fix SentenceDetector training with updated OpenNLP API
Date Mon, 25 Nov 2013 13:17:35 GMT

     [ https://issues.apache.org/jira/browse/CTAKES-268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roberto Costumero Moreno updated CTAKES-268:
--------------------------------------------

    Attachment:     (was: SentenceDetector.patch)

> Fix SentenceDetector training with updated OpenNLP API
> ------------------------------------------------------
>
>                 Key: CTAKES-268
>                 URL: https://issues.apache.org/jira/browse/CTAKES-268
>             Project: cTAKES
>          Issue Type: Improvement
>          Components: ctakes-core
>    Affects Versions: 3.1, 3.2, 3.1.1
>         Environment: Mac OS X
>            Reporter: Roberto Costumero Moreno
>              Labels: patch
>             Fix For: 3.1, 3.2, 3.1.1
>
>         Attachments: SentenceDetector.java.patch, sample_sd_en_en.mod, sample_sd_training_sentences.txt
>
>
> Fixed the problem where SentenceDetector training did not work as expected due to changes in the OpenNLP API.
> I have attached a patch file, the sentence file I used for training, and the model it generated.
> I have replaced the following code, around line 300:
> logger.error("----------------------------------------------------------------------------------");
> logger.error("Need to update yet for OpenNLP changes "); // TODO
> logger.error("Commented out code that no longer compiles due to OpenNLP API incompatible changes"); // TODO
> logger.error("----------------------------------------------------------------------------------");
>
> FileReader datafr = new FileReader(inFile);
> EventStream es = new BasicEventStream(new PlainTextByLineDataStream(datafr));
>
> GISModel mod = GIS.trainModel(es, iters, cut);
> SuffixSensitiveGISModelWriter ssgmw = new SuffixSensitiveGISModelWriter(mod, outFile);
> logger.info("Saving the model as: " + outFile.getAbsolutePath());
> ssgmw.persist();
> with this code:
> Charset charset = Charset.forName("UTF-8");
> 		
> FileInputStream inStream = new FileInputStream(inFile);
> ObjectStream<String> lineStream = new PlainTextByLineStream(inStream, charset);
> ObjectStream<SentenceSample> sampleStream = new SentenceSampleStream(lineStream);
> 		
> SentenceModel mod;
> 		
> try {
> 	mod = SentenceDetectorME.train("en", sampleStream, true, null, ModelUtil.createTrainingParameters(iters, cut));
> } finally {
> 	sampleStream.close();
> 	inStream.close();
> }
> SuffixSensitiveGISModelWriter ssgmw = new SuffixSensitiveGISModelWriter(mod.getMaxentModel(), outFile);
> logger.info("Saving the model as: " + outFile.getAbsolutePath());
> ssgmw.persist();
> This seems to work but needs to be checked. I have successfully generated models from the examples, as well as a new one in Spanish that I am currently working on.
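
Since the new code reads the training file with an explicit UTF-8 charset, a quick check of that file before training may help when verifying the patch. The following is only a sketch under assumptions: it assumes the training file holds one sentence per line (which, as far as I know, is the layout SentenceSampleStream reads, with blank lines acting as separators). The class name TrainingFileCheck and the method countSentences are hypothetical and are not part of cTAKES or OpenNLP; only the JDK is used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of cTAKES or OpenNLP.
public class TrainingFileCheck {

    // Counts non-blank lines, i.e. candidate training sentences.
    // Assumption: one sentence per line, blank lines are separators.
    static int countSentences(Path file) throws IOException {
        int count = 0;
        // Files.newBufferedReader decodes strictly, so a file that is not
        // valid UTF-8 fails here with an IOException instead of silently
        // producing garbled sentences.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TrainingFileCheck <training-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + ": " + countSentences(file) + " non-blank lines");
    }
}
```

Running it against the attached sample_sd_training_sentences.txt would report how many candidate sentences the trainer is likely to see, which can be compared with what was intended when checking a newly generated model.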



--
This message was sent by Atlassian JIRA
(v6.1#6144)
