opennlp-dev mailing list archives

From Damiano Porta <damianopo...@gmail.com>
Subject Re: Surrounding tokens of the entity on MaxEnt models
Date Mon, 02 May 2016 13:31:44 GMT
Hi Daniel! Thank you so much!

Unfortunately, I am not sure; I really do not know what the best approach is
in this case.
I have a dataset with patterns like:

my name is {name}, from {location}
name: {name}
full name: {name}
I am {name}, i was born in {location}

and so on.

I could use regexes too, perhaps a list of patterns that I loop over for each
document. What do you think? I do not know whether I can build a training set
from those examples (I have around 100 different patterns).
How can I create those features with my patterns?
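As a rough standalone sketch (plain Java, hypothetical class and method names, not an OpenNLP API), such placeholder templates could be compiled into regexes with named capture groups and applied to each document:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternMatcher {

    // Convert a template like "my name is {name}, from {location}"
    // into a regex where each {slot} becomes a named capture group.
    static Pattern compileTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        Matcher slots = Pattern.compile("\\{(\\w+)\\}").matcher(template);
        int last = 0;
        while (slots.find()) {
            // literal text before the slot, escaped so regex metacharacters match literally
            regex.append(Pattern.quote(template.substring(last, slots.start())));
            // the slot itself: capture anything up to the next comma
            regex.append("(?<").append(slots.group(1)).append(">[^,]+)");
            last = slots.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = compileTemplate("my name is {name}, from {location}");
        Matcher m = p.matcher("my name is Barack, from Hawaii");
        if (m.find()) {
            System.out.println(m.group("name"));      // Barack
            System.out.println(m.group("location"));  // Hawaii
        }
    }
}
```

Each matched group could then either label training data or be emitted as a feature string.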

Thank you in advance!


2016-05-02 15:19 GMT+02:00 Russ, Daniel (NIH/CIT) [E] <druss@mail.nih.gov>:

> Hi Damiano,
>
>      Why are you so sure that your model will not work?  A couple of
> things to remember: 1. you need quite a bit of training data; two
> sentences do not make a training set.  2. You probably need more than a
> window of words as your features.  However, you can see that word-2="name"
> and word-1="is" tend to precede a name.  Look into other potential features
> and get a larger dataset, and your results may surprise you.
>
> Daniel
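To make the word-2/word-1 features above concrete, here is a minimal standalone sketch (plain Java, not the actual OpenNLP generator classes) of how such window features could be extracted around a token:

```java
import java.util.ArrayList;
import java.util.List;

public class WindowFeatures {

    // Extract word-k ... word+k context features around a token index,
    // mimicking what a window-style feature generator produces.
    static List<String> features(String[] tokens, int index, int window) {
        List<String> feats = new ArrayList<>();
        for (int k = -window; k <= window; k++) {
            if (k == 0) continue;            // skip the entity token itself
            int pos = index + k;
            if (pos >= 0 && pos < tokens.length) {
                feats.add("word" + (k > 0 ? "+" : "") + k + "=" + tokens[pos]);
            }
        }
        return feats;
    }

    public static void main(String[] args) {
        String[] sentence = {"my", "name", "is", "Barack", "and", "I"};
        // features around "Barack" (index 3) with a window of 2
        System.out.println(features(sentence, 3, 2));
        // [word-2=name, word-1=is, word+1=and, word+2=I]
    }
}
```

The model then learns weights for feature strings such as `word-1=is`, which is why patterns like "name is ..." become predictive once enough training data is available.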
>
>
> On May 1, 2016, at 3:13 PM, Jeffrey Zemerick <jzemerick@apache.org> wrote:
>
> I'm sure the others on this list can give you a more complete answer, so I
> will try not to lead you astray.
>
> The WindowFeatureGenerator is only one of the available feature generators.
> There are many classes that implement the AdaptiveFeatureGenerator
> interface [1] and you can, of course, provide your own implementation of
> that interface to support additional features. For example, the
> SentenceFeatureGenerator [2] looks at the beginning and end of each
> training sentence. So to answer your question, the length of the training
> sentence should not matter - what matters is whether the combination of
> configured feature generators can produce a model that accurately
> describes the training text.
>
> Jeff
>
> [1]
>
> https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/AdaptiveFeatureGenerator.html
> [2]
>
> https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/SentenceFeatureGenerator.html
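As an illustration of writing a custom generator, here is a standalone sketch that only mirrors the shape of `createFeatures` (the real `AdaptiveFeatureGenerator` interface lives in `opennlp.tools.util.featuregen` and also carries adaptive-data methods); it adds suffix features of the current token, a common NER signal:

```java
import java.util.ArrayList;
import java.util.List;

public class SuffixFeatureGen {

    // Mirrors the shape of an OpenNLP feature generator's createFeatures:
    // appends suffix features (last 1-3 characters) of the current token.
    static void createFeatures(List<String> features, String[] tokens, int index) {
        String token = tokens[index];
        for (int len = 1; len <= 3 && len <= token.length(); len++) {
            features.add("suf" + len + "=" + token.substring(token.length() - len));
        }
    }

    public static void main(String[] args) {
        List<String> feats = new ArrayList<>();
        createFeatures(feats, new String[]{"President", "Obama"}, 1);
        System.out.println(feats); // [suf1=a, suf2=ma, suf3=ama]
    }
}
```

A real implementation would implement the OpenNLP interface so it can be combined with the built-in generators at training time.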
>
>
> On Sun, May 1, 2016 at 12:02 PM, Damiano Porta <damianoporta@gmail.com>
> wrote:
>
> Hi Jeff!
> Thank you so much for your fast reply.
>
> I have a doubt. Suppose we use this feature with a window of:
>
> 2 tokens on the left + *ENTITY* + 2 tokens on the right
>
> My doubt is: how can I train the model correctly?
>
> If only the previous 2 tokens and the next 2 tokens matter, I should not
> use long sentences to train the model. Right?
>
> For example (person-model.train):
>
> 1. I am <START:person> Barack <END> and I am the president of USA
>
> 2. My name is <START:person> Barack <END> and my surname is Obama
>
> ...
>
> Those are two trivial training samples, just to illustrate my doubt.
>
> In this case i should have:
>
> *I am Barack and I*
>
> *name is Barack and my*
>
> the other tokens (left and right) do not matter. So the sentences in my
> training set should be very short, right? Basically, I should only define
> all the "combinations" of the previous/next 2 tokens, right?
>
> Thank you!
> Damiano
>
>
>
> 2016-05-01 16:07 GMT+02:00 Jeffrey Zemerick <jzemerick@apache.org>:
>
> I think you are looking for the WindowFeatureGenerator [1]. You can set
> the size of the window by specifying the number of previous tokens and
> the number of next tokens.
>
> Jeff
>
> [1]
>
>
>
> https://opennlp.apache.org/documentation/1.5.3/apidocs/opennlp-tools/opennlp/tools/util/featuregen/WindowFeatureGenerator.html
>
>
> On Sun, May 1, 2016 at 5:16 AM, Damiano Porta <damianoporta@gmail.com>
> wrote:
>
> Hello everybody
> How many surrounding tokens are taken into account to find the entity
> using a maxent model?
> Basically, a maxent model should detect an entity by looking at the
> surrounding tokens, right?
> I would like to understand if:
>
> 1. Can I set the number of tokens on the left side?
> 2. Can I set the number of tokens on the right side too?
>
> Thank you in advance for the clarification
> Best
>
> Damiano
>
>
>
>
