opennlp-dev mailing list archives

From Damiano Porta <damianopo...@gmail.com>
Subject Re: Surronding tokens of the entity on MaxEnt models
Date Sun, 01 May 2016 16:02:34 GMT
Hi Jeff!
Thank you so much for your fast reply.

I have a question. Suppose we use this feature generator with a window of:

2 tokens on the left + *ENTITY* + 2 tokens on the right

My question is: how do I train the model correctly?

If only the previous 2 tokens and the next 2 tokens matter, I should not
use long sentences to train the model, right?

For example (person-model.train):

1. I am <START:person> Barack <END> and I am the president of USA

2. My name is <START:person> Barack <END> and my surname is Obama

...

These are two trivial training samples, just to illustrate my question.

In this case I should have:

*I am Barack and I*

*name is Barack and my*

The other tokens (left and right) do not matter, so the sentences in my
training set should be very short, right? Basically, I should only define
all the "combinations" of the previous/next 2 tokens, right?
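
To make the two windows above concrete, here is a minimal sketch in plain Java. It is not OpenNLP code and does not reproduce how WindowFeatureGenerator builds features; the class name `WindowSketch` and the method `window` are made up for illustration. It only shows which tokens fall inside a window of 2 previous and 2 next tokens around the entity, assuming whitespace tokenization and an entity span given as [start, end).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only (not part of OpenNLP).
final class WindowSketch {

    // Returns the tokens from (start - prev) up to (end + next), exclusive,
    // clipped to the sentence boundaries. The entity occupies [start, end).
    static List<String> window(String[] tokens, int start, int end, int prev, int next) {
        int from = Math.max(0, start - prev);
        int to = Math.min(tokens.length, end + next);
        return new ArrayList<>(Arrays.asList(tokens).subList(from, to));
    }

    public static void main(String[] args) {
        String[] s1 = "I am Barack and I am the president of USA".split(" ");
        String[] s2 = "My name is Barack and my surname is Obama".split(" ");

        // "Barack" is token 2 in the first sentence and token 3 in the second.
        System.out.println(String.join(" ", window(s1, 2, 3, 2, 2)));
        System.out.println(String.join(" ", window(s2, 3, 4, 2, 2)));
    }
}
```

Running it prints the two short forms I wrote above ("I am Barack and I" and "name is Barack and my").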

Thank you!
Damiano



2016-05-01 16:07 GMT+02:00 Jeffrey Zemerick <jzemerick@apache.org>:

> I think you are looking for the WindowFeatureGenerator [1]. You can set the
> size of the window by specifying the number of previous tokens and number
> of next tokens.
>
> Jeff
>
> [1]
>
> https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
>
>
> On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <damianoporta@gmail.com>
> wrote:
> >
> > Hello everybody
> > How many surrounding tokens are taken into account to find the entity
> > using a maxent model?
> > Basically a maxent model should detect an entity by looking at the
> > surrounding tokens, right?
> > I would like to understand if:
> >
> > 1. Can I set the number of tokens on the left side?
> > 2. Can I set the number of tokens on the right side too?
> >
> > Thank you in advance for the clarification
> > Best
> >
> > Damiano
>
