opennlp-dev mailing list archives

From Aliaksandr Autayeu <aliaksa...@autayeu.com>
Subject Re: Word Sense Disambiguator
Date Wed, 09 Sep 2015 18:39:59 GMT
Cristian, the reference you gave basically uses synset offsets: 1740 is
entity, 1930 is physical entity, etc. However, YAGO seems to have
added 100000000 to those offsets.

A synset offset is the fastest way into the WordNet dictionary, because it
is a direct file offset. The offset alone is not enough, though: you also need
the POS (part of speech). Speed is probably the reason most people access
WordNet this way. However, the offset is not the best "key", especially for
indexing, because offsets change as WordNet evolves. SenseKeys (e.g.
bank%1:14:00:: and bank%1:21:01::) should be more suitable for indexing.
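
To make the structure of a SenseKey concrete, here is a minimal sketch that splits one into its fields. It assumes the documented WordNet layout lemma%ss_type:lex_filenum:lex_id:head_word:head_id; the class SenseKeyParts and its method names are hypothetical and not part of OpenNLP or any WordNet library.

```java
/** Hypothetical helper that splits a WordNet sense key into its fields. */
final class SenseKeyParts {
    final String lemma;
    final int ssType;      // 1=noun, 2=verb, 3=adjective, 4=adverb, 5=adjective satellite
    final int lexFilenum;  // number of the lexicographer file
    final int lexId;       // distinguishes senses within one lexicographer file

    private SenseKeyParts(String lemma, int ssType, int lexFilenum, int lexId) {
        this.lemma = lemma;
        this.ssType = ssType;
        this.lexFilenum = lexFilenum;
        this.lexId = lexId;
    }

    static SenseKeyParts parse(String senseKey) {
        int percent = senseKey.lastIndexOf('%');
        if (percent < 0) {
            throw new IllegalArgumentException("No '%' in sense key: " + senseKey);
        }
        String lemma = senseKey.substring(0, percent);
        // Limit -1 keeps the trailing empty head_word and head_id fields.
        String[] fields = senseKey.substring(percent + 1).split(":", -1);
        if (fields.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields after '%': " + senseKey);
        }
        return new SenseKeyParts(
                lemma,
                Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]));
    }

    public static void main(String[] args) {
        SenseKeyParts p = parse("bank%1:14:00::");
        System.out.println(p.lemma + " ssType=" + p.ssType
                + " lexFilenum=" + p.lexFilenum + " lexId=" + p.lexId);
    }
}
```

Because the key is built from the lemma and lexicographer information rather than a file position, it does not depend on where the synset happens to sit in a given WordNet data file.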

If you're looking to connect with YAGO as above, you might do something along
the lines of
....getWordBySenseKey(sensekey).getSynset().getOffset() and then add 100000000
to get the YAGO ids.
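
The arithmetic part of that suggestion can be sketched on its own. The class YagoIds and its method are hypothetical names; the offset is assumed to come from the dictionary lookup described above, and the thread only shows this mapping for the noun synsets in the YAGO file, so treat other parts of speech as unverified.

```java
/** Hypothetical helper: maps a WordNet synset offset to a YAGO-style id. */
final class YagoIds {
    // Constant taken from the observation that YAGO ids are offset + 100000000.
    static final long YAGO_OFFSET_SHIFT = 100_000_000L;

    static long toYagoId(long synsetOffset) {
        // WordNet offsets are written as 8-digit numbers, so anything
        // outside [0, 100000000) is rejected as malformed input.
        if (synsetOffset < 0 || synsetOffset >= YAGO_OFFSET_SHIFT) {
            throw new IllegalArgumentException("Not a valid synset offset: " + synsetOffset);
        }
        return synsetOffset + YAGO_OFFSET_SHIFT;
    }

    public static void main(String[] args) {
        System.out.println(toYagoId(1740)); // entity          -> 100001740
        System.out.println(toYagoId(1930)); // physical entity -> 100001930
    }
}
```

Since offsets differ between WordNet releases, ids computed this way only line up with YAGO if the local dictionary is the same WordNet version that YAGO was built from.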

Aliaksandr


On 9 September 2015 at 09:51, Cristian Petroaca <cristian.petroaca@gmail.com> wrote:

> I am looking for the sense ID of the word. It has the format used here:
>
> http://resources.mpi-inf.mpg.de/yago-naga/yago/download/yago/yagoWordnetIds.txt
>
>
> On Tue, Sep 8, 2015 at 6:47 PM, Anthony Beylerian <
> anthony.beylerian@gmail.com> wrote:
>
> > Hi,
> >
> > Thanks, it is still being improved.
> >
> > I am not sure what you mean by type or database ID.
> > Currently the sense source and the sense ID are returned.
> >
> > For example:
> >
> > "I went to the bank to deposit money."
> > target : bank (index : 4)
> > expected output : [WORDNET bank%1:14:00:: 21.6, WORDNET bank%1:21:01::
> > 5.8,... etc]
> >
> > Where "bank%1:14:00::" is a SenseKey which you can query WordNet with to
> > give you a sense definition.
> >
> > You can do this using the default dictionary:
> >
> > Dictionary.getDefaultResourceInstance().getWordBySenseKey(sensekey).getSynset().getGloss();
> >
> > Hope this is what you are looking for, otherwise please clarify.
> >
> > Anthony Beylerian
> >
> > On Tue, Sep 8, 2015 at 5:34 PM, Cristian Petroaca <
> > cristian.petroaca@gmail.com> wrote:
> >
> > > Hi Anthony,
> > >
> > > I had a chance to test the wsd component. That's great work. Thanks.
> > > One question: is it possible to return the wordnet type (or database id)
> > > of the disambiguated word?
> > >
> > > Thanks,
> > > Cristian
> > >
> > > On Fri, Jul 24, 2015 at 1:14 PM, Anthony Beylerian <
> > > anthonybeylerian@hotmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > To try out the ongoing implementations, after checking out the sandbox
> > > > repository please try these steps:
> > > > 1- Create a resource models directory:
> > > >
> > > > - src
> > > >   - test
> > > >     - resources
> > > >       + models
> > > >
> > > > 2- Include the following pre-trained models and dictionary in that
> > > > directory:
> > > > You can find those here [1] if you like or pre-train your own models.
> > > >
> > > > {
> > > > en-token.bin,
> > > > en-pos-maxent.bin,
> > > > en-sent.bin,
> > > > en-ner-person.bin,
> > > > en-lemmatizer.dict
> > > > }
> > > >
> > > > To train the IMS approach, you need to include training data such as
> > > > Senseval-3 [2].
> > > > For now, please add these folders:
> > > > - src
> > > >   - test
> > > >     - resources
> > > >       - supervised
> > > >         + raw
> > > >         + models
> > > >         + dictionary
> > > >
> > > > You can find the data files here [2].
> > > >
> > > > 3- We included two examples [LeskTester.java] and [IMSTester.java] that
> > > > you can run directly, or you can write your own tests.
> > > >
> > > > To run a custom test, you need at minimum a tokenized text or
> > > > sentence. For example, for Lesk:
> > > >
> > > >           String[] words = Loader.getTokenizer().tokenize(sentence);
> > > >
> > > > Choose the index of the word to disambiguate in the token array.
> > > >
> > > >           int wordIndex = 6;
> > > >
> > > > Then just create a WSDisambiguator object, for example for Lesk:
> > > >
> > > >           Lesk lesk = new Lesk();
> > > >
> > > > And you can call the default disambiguation method:
> > > >
> > > >           lesk.disambiguate(words, wordIndex);
> > > >
> > > > You will get an array of strings with the following format:
> > > >
> > > > Lesk : [Source SenseKey Score]
> > > >
> > > > To read the sense definitions you can use the method:
> > > > [opennlp.tools.disambiguator.Constants.printResults]
> > > >
> > > > To use the variations of Lesk, you will need to create and configure a
> > > > parameters object:
> > > >
> > > >           LeskParameters leskParams = new LeskParameters();
> > > >           leskParams.setLeskType(LeskParameters.LESK_TYPE.LESK_BASIC_CTXT_WIN_BF);
> > > >           leskParams.setWin_b_size(4);
> > > >           leskParams.setDepth(3);
> > > >           lesk.setParams(leskParams);
> > > >
> > > > Typically, IMS should perform better than Lesk, since Lesk is a classic
> > > > method that is usually used as a baseline, along with the most frequent
> > > > sense (MFS).
> > > > However, we will be testing and adding more techniques.
> > > >
> > > > In any case, please feel free to ask for more details.
> > > >
> > > > Best,
> > > >
> > > > Anthony
> > > >
> > > > [1] :
> > > > https://drive.google.com/folderview?id=0B67Iu3pf6WucfjdYNGhDc3hkTXd1a3FORnNUYzd3dV9YeWlyMFczeHU0SE1TcWwyU1lhZFU&usp=sharing
> > > > [2] :
> > > > https://drive.google.com/file/d/0ByL0dmKXzHVfSXA3SVZiMnVfOGc/view?usp=sharing
> > > > > Date: Fri, 24 Jul 2015 09:54:02 +0200
> > > > > Subject: Re: Word Sense Disambiguator
> > > > > From: kottmann@gmail.com
> > > > > To: dev@opennlp.apache.org
> > > > >
> > > > > It would be nice if you could share instructions on how to run it.
> > > > > I also would like to give it a try.
> > > > >
> > > > > Jörn
> > > > >
> > > > > On Fri, Jul 24, 2015 at 4:54 AM, Anthony Beylerian <
> > > > > anthonybeylerian@hotmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > > > Yes, for the moment we are only using WordNet for sense definitions.
> > > > > > > The plan is to complete the package by mid to late August, but if you
> > > > > > > like you can follow up on the progress from the sandbox.
> > > > > > Best regards,
> > > > > > Anthony
> > > > > > > Date: Thu, 23 Jul 2015 15:36:57 +0300
> > > > > > > Subject: Word Sense Disambiguator
> > > > > > > From: cristian.petroaca@gmail.com
> > > > > > > To: dev@opennlp.apache.org
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > I saw that there are people actively working on a Word Sense
> > > > > > > > Disambiguator.
> > > > > > > > Do you guys know when the module will be ready to use? Also, I
> > > > > > > > assume that WordNet is used to define the disambiguated word meaning?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Cristian
