opennlp-dev mailing list archives

From Aliaksandr Autayeu <aliaksa...@autayeu.com>
Subject Re: Error in POS Tagger CrossValidator
Date Wed, 18 Jan 2012 09:23:06 GMT
Ah... OK.

Aliaksandr

On Wed, Jan 18, 2012 at 1:05 AM, James Kosin <james.kosin@gmail.com> wrote:

> Aliaksandr,
>
> I put the TODO there because I couldn't determine whether it was a better
> place. The only big downside to using the Stream is that we have no control
> over the encoding, so I was thinking that this method of loading the item
> would be deprecated anyway in favor of the other method.
>
> James
>
> On 1/17/2012 5:50 AM, Aliaksandr Autayeu wrote:
> > Guys, if somebody knows that part of the code well, it would be nice to
> > take a look at:
> >
> > 1) TODO left there
> > 2) .reset() raising the above exception if the PlainTextByLineStream is
> > created with a stream.
> >
> > Aliaksandr
> >
> > On Tue, Jan 17, 2012 at 12:12 AM, william.colen@gmail.com <
> > william.colen@gmail.com> wrote:
> >
> >> Thank you, Aliaksandr!
> >>
> >>
> >>
> >> On Mon, Jan 16, 2012 at 6:13 PM, Aliaksandr Autayeu
> >> <aliaksandr@autayeu.com> wrote:
> >>> I have reproduced the problem. It boils down to different initialization
> >>> of PlainTextByLineStream. If it is instantiated by
> >>>
> >>>   public PlainTextByLineStream(Reader in) {
> >>>     this.in = new BufferedReader(in);
> >>>     this.channel = null;
> >>>     this.encoding = null;
> >>>   }
> >>>
> >>> it does not work. If it is instantiated with a channel:
> >>>
> >>>   public PlainTextByLineStream(FileChannel channel, String charsetName) {
> >>>     this.encoding = charsetName;
> >>>     this.channel = channel;
> >>>
> >>>     // TODO: Why isn't reset called here ?
> >>>     in = new BufferedReader(Channels.newReader(channel, encoding));
> >>>   }
> >>>
> >>> it does work, because later on in reset:
> >>>
> >>>     if (channel == null) {
> >>>         in.reset();
> >>>     }
> >>>     else {
> >>>       channel.position(0);
> >>>       in = new BufferedReader(Channels.newReader(channel, encoding));
> >>>     }
> >>>
> >>> the reader is recreated instead of calling in.reset() directly.
> >>>
> >>>
> >>> Now, these differences come into play because WordTagSampleStreamFactory
> >>> has a different PlainTextByLineStream initialization, which is probably my
> >>> fault due to work on factories in 402. Looks like a copy-paste error.
> >>>
> >>> I have tried to commit a fix, but I'm getting a 403 error :(  Please apply
> >>> the attached patch.
> >>>
> >>> Aliaksandr
> >>>
> >>>
> >>> On Mon, Jan 16, 2012 at 12:54 AM, william.colen@gmail.com
> >>> <william.colen@gmail.com> wrote:
> >>>> Hi,
> >>>>
> >>>> I am getting an error in the POS Tagger CrossValidator tool from trunk.
> >>>> I tried the same command with a released version and it worked. I also
> >>>> tried the Chunker CV tool, and it works too.
> >>>> I tried debugging the code and checking the SVN history for clues,
> >>>> but could not find anything. Any idea what is wrong?
> >>>>
> >>>> $ bin/opennlp POSTaggerCrossValidator -lang pt -encoding MacRoman -data pos1.txt -cutoff 50
> >>>>
> >>>> IO error while reading training data or indexing data: Stream not marked
> >>>>
> >>>> Stack trace:
> >>>> java.io.IOException: Stream not marked
> >>>>        at java.io.BufferedReader.reset(BufferedReader.java:485)
> >>>>        at opennlp.tools.util.PlainTextByLineStream.reset(PlainTextByLineStream.java:79)
> >>>>        at opennlp.tools.util.FilterObjectStream.reset(FilterObjectStream.java:43)
> >>>>        at opennlp.tools.util.eval.CrossValidationPartitioner.next(CrossValidationPartitioner.java:256)
> >>>>        at opennlp.tools.postag.POSTaggerCrossValidator.evaluate(POSTaggerCrossValidator.java:113)
> >>>>        at opennlp.tools.cmdline.postag.POSTaggerCrossValidatorTool.run(POSTaggerCrossValidatorTool.java:72)
> >>>>        at opennlp.tools.cmdline.CLI.main(CLI.java:212)
> >>>>
> >>>>
> >>>> Any idea what is wrong?
> >>>>
> >>>> Thanks,
> >>>> William
> >>>
>
>
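
The two reset() branches quoted in this thread behave differently for a reason that can be reproduced outside OpenNLP. The following is a minimal sketch using only the JDK, not the OpenNLP code itself: the class name ResetSketch, both method names, and the sample text are hypothetical and chosen only for illustration. The first method calls BufferedReader.reset() with no earlier mark(), which appears to be what the Reader-based constructor path ends up doing. The second rewinds a FileChannel and recreates the reader, mirroring the channel branch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of OpenNLP.
class ResetSketch {

    // Reads one line, then calls reset() although mark() was never called.
    // Returns the exception message, or null if reset() unexpectedly succeeds.
    static String resetWithoutMark() throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("a_DT dog_NN\n"));
        in.readLine();
        try {
            in.reset();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    // Reads one line, rewinds the channel to position 0, and wraps it in a
    // fresh reader (as in the channel branch of the quoted reset()).
    // Returns the first line read after the rewind.
    static String rereadViaChannel(Path file, String charsetName) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            BufferedReader in = new BufferedReader(Channels.newReader(channel, charsetName));
            in.readLine(); // consume something so the rewind is observable

            channel.position(0);
            in = new BufferedReader(Channels.newReader(channel, charsetName));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Same message as in the stack trace reported in this thread.
        System.out.println(resetWithoutMark());

        Path tmp = Files.createTempFile("pos", ".txt");
        Files.write(tmp, "a_DT dog_NN\nthe_DT cat_NN\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(rereadViaChannel(tmp, "UTF-8"));
        Files.delete(tmp);
    }
}
```

Recreating the reader avoids any dependence on mark(), but it requires a source that can be repositioned, which a plain Reader or InputStream does not guarantee.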
