opennlp-dev mailing list archives

From Jörn Kottmann <kottm...@gmail.com>
Subject Re: Size of training data
Date Mon, 29 Apr 2013 12:53:34 GMT
On 04/29/2013 02:32 PM, Svetoslav Marinov wrote:
> OK, I hope I am doing this correctly: I count the sample objects I read
> from sampleStream, which I create like this:
> ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
>
> I call sampleStream.read() and get 468 fewer samples than the number of
> sentences (2 611 247). Shouldn't the number of samples match the number
> of sentences? I have samples without entities, but I suspect there are
> more than 468 of them. I will check, though.
>
> Otherwise, I am not sure where to measure how many samples are processed
> per second. Do you mean during the creation of the NE model, or
> somewhere else? How does one do that?

You could implement a proxy ObjectStream that is inserted into the
stream. Each call to its read method can then be counted, and the
progress can be printed every n calls.
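Here is a minimal sketch of such a proxy. It assumes OpenNLP's `ObjectStream` contract: `read()` returns null at the end of the data, and `reset()` and `close()` are delegated to the wrapped stream. So that the sketch compiles on its own, it declares a local stand-in `ObjectStream` interface. In real code, delete that interface and import `opennlp.tools.util.ObjectStream` instead. The names `CountingObjectStream`, `ListObjectStream` and `reportEvery` are hypothetical. The proxy also prints samples per second, which covers the throughput question.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Local stand-in with the same shape as opennlp.tools.util.ObjectStream, so the
// sketch compiles without OpenNLP; replace it with the real import in practice.
interface ObjectStream<T> {
    T read() throws IOException;
    void reset() throws IOException;
    void close() throws IOException;
}

/** Hypothetical proxy: counts samples passing through and reports progress. */
class CountingObjectStream<T> implements ObjectStream<T> {
    private final ObjectStream<T> samples;
    private final long reportEvery;
    private long count;
    private long startNanos = -1;

    CountingObjectStream(ObjectStream<T> samples, long reportEvery) {
        this.samples = samples;
        this.reportEvery = reportEvery;
    }

    @Override
    public T read() throws IOException {
        if (startNanos < 0) {
            startNanos = System.nanoTime(); // start timing at the first read
        }
        T sample = samples.read();
        if (sample != null) {
            count++;
            if (count % reportEvery == 0) {
                double seconds = (System.nanoTime() - startNanos) / 1e9;
                System.err.printf("%d samples read (%.1f samples/s)%n",
                        count, count / seconds);
            }
        }
        return sample;
    }

    /** Number of non-null samples returned since creation or the last reset. */
    long getCount() {
        return count;
    }

    @Override
    public void reset() throws IOException {
        samples.reset();
        count = 0;       // the counter restarts with each pass over the data
        startNanos = -1;
    }

    @Override
    public void close() throws IOException {
        samples.close();
    }
}

/** Hypothetical in-memory stream, only for trying the proxy without a file. */
class ListObjectStream<T> implements ObjectStream<T> {
    private final List<T> items;
    private Iterator<T> it;

    ListObjectStream(List<T> items) {
        this.items = items;
        this.it = items.iterator();
    }

    @Override
    public T read() {
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public void reset() {
        it = items.iterator();
    }

    @Override
    public void close() {
    }
}
```

To use the proxy, wrap the sample stream and pass the wrapper wherever `sampleStream` was used before, for example `new CountingObjectStream<>(new NameSampleDataStream(lineStream), 100000)`. If the trainer resets the stream and reads it more than once, the counter restarts on each `reset()`, so the printed rate refers to the current pass.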

The difference could come from empty lines in your training data: only
non-empty lines become sample objects.
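To test the empty-line explanation, you can count the blank lines in the training file and compare the result with the gap. This is a minimal sketch. It assumes the file is UTF-8, so use whatever encoding the line stream was opened with. It also treats whitespace-only lines as empty. `TrainingFileStats` is a hypothetical helper name. Suppose the sentence count of 2 611 247 was taken as the total number of lines. If empty lines explain the whole difference, `blankLines` should come out at 468.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: counts all lines and the blank ones in a training file. */
final class TrainingFileStats {
    final long totalLines;
    final long blankLines;

    private TrainingFileStats(long totalLines, long blankLines) {
        this.totalLines = totalLines;
        this.blankLines = blankLines;
    }

    /** Lines that would each become one sample, if blank lines are skipped. */
    long nonBlankLines() {
        return totalLines - blankLines;
    }

    static TrainingFileStats of(Path file) throws IOException {
        long total = 0;
        long blank = 0;
        // Assumes UTF-8; match the encoding used to build the line stream.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                total++;
                // Whitespace-only lines are counted as empty here as well.
                if (line.trim().isEmpty()) {
                    blank++;
                }
            }
        }
        return new TrainingFileStats(total, blank);
    }
}
```

Compare `nonBlankLines()` with the number of samples you read from `sampleStream`. If the two match, the "missing" samples were blank lines.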

Jörn

