opennlp-dev mailing list archives

From Mark G <giaconiam...@gmail.com>
Subject Re: Triplet Extraction with OpenNLP
Date Fri, 27 Sep 2013 10:46:30 GMT
I think the showCodeTree() method inside the Parse class does something
close to what you want, at least as a starting point: it is a recursive
method that traverses the children of the top Parse object. If you have the
source code, look at the Parse class and its showCodeTree() method.
I was thinking you could build a sorted map (TreeMap) keyed by part of
speech or chunk type, ordered by where each first appears in the sentence,
with a TreeSet of the matching parts as the value for each key. That way you
could take the first or last element of each set, depending on the position
and type of the key. Just a rough thought, though.
Mark G
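
A minimal sketch of the traversal-and-index idea above, not Mark's actual code. To stay runnable without the OpenNLP jar it uses a hypothetical ParseNode class in place of opennlp.tools.parser.Parse, and LabelIndex, index() and sample() are likewise names made up for illustration. It also swaps the suggested TreeMap/TreeSet for a LinkedHashMap of lists: a TreeMap sorts by key rather than by first mention, and a pre-order walk already visits nodes in sentence order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a parse node: label, covered text, child nodes. */
final class ParseNode {
    final String label;
    final String text;
    final List<ParseNode> children;

    ParseNode(String label, String text, ParseNode... children) {
        this.label = label;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

final class LabelIndex {
    /** Groups every node under its label; keys keep the order of first mention. */
    static Map<String, List<ParseNode>> index(ParseNode root) {
        Map<String, List<ParseNode>> byLabel = new LinkedHashMap<>();
        collect(root, byLabel);
        return byLabel;
    }

    // Pre-order recursion: a node is recorded before its children, left to right.
    private static void collect(ParseNode node, Map<String, List<ParseNode>> byLabel) {
        byLabel.computeIfAbsent(node.label, k -> new ArrayList<>()).add(node);
        for (ParseNode child : node.children) {
            collect(child, byLabel);
        }
    }

    /** Hand-built tree for "The countries broke off peace talks". */
    static ParseNode sample() {
        return new ParseNode("S", "The countries broke off peace talks",
            new ParseNode("NP", "The countries",
                new ParseNode("DT", "The"),
                new ParseNode("NNS", "countries")),
            new ParseNode("VP", "broke off peace talks",
                new ParseNode("VBD", "broke"),
                new ParseNode("PRT", "off", new ParseNode("RP", "off")),
                new ParseNode("NP", "peace talks",
                    new ParseNode("NN", "peace"),
                    new ParseNode("NNS", "talks"))));
    }

    public static void main(String[] args) {
        Map<String, List<ParseNode>> byLabel = index(sample());
        List<ParseNode> nps = byLabel.get("NP");
        System.out.println("keys:     " + byLabel.keySet());
        System.out.println("first NP: " + nps.get(0).text);
        System.out.println("last NP:  " + nps.get(nps.size() - 1).text);
    }
}
```

With a real Parse object the same recursion would presumably go through getChildren(), with the node label and covered text read from the Parse itself; that mapping is left out here because it is not shown in the thread.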


On Fri, Sep 27, 2013 at 3:09 AM, Carlos Scheidecker <nando.nlp@gmail.com> wrote:

> This is awesome Mark, thanks!
>
> This will be quite useful for everybody else as well.
>
> I ended up writing my own and went further with the other part of the
> extraction.
>
> What I found interesting is how long it takes to load the
> model en-parser-chunking.bin, which is about 36 MB.
>
> So I am not loading it every time, just once during object creation.
>
> Does anyone have a better suggestion?
>
> cheers.
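
One possible way to go a step beyond loading in the constructor is to load the model lazily, once, and share it between objects. The sketch below is only an illustration of that pattern: Lazy, LazyDemo and expensiveLoad() are hypothetical names, and a string stands in for the ParserModel so the example runs without OpenNLP. Whether the loaded model can safely be shared across threads is an assumption to check against the OpenNLP documentation.

```java
import java.util.function.Supplier;

/** Loads a value at most once, on first use; later calls return the cached instance. */
final class Lazy<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public synchronized T get() {
        if (!loaded) {
            value = loader.get();   // the slow step runs only here
            loaded = true;
        }
        return value;
    }
}

final class LazyDemo {
    static int loads = 0;   // counts how many times the expensive load ran

    /** Hypothetical stand-in for reading en-parser-chunking.bin into a model object. */
    static String expensiveLoad() {
        loads++;
        return "model";
    }

    public static void main(String[] args) {
        Lazy<String> model = new Lazy<>(LazyDemo::expensiveLoad);
        model.get();
        model.get();
        System.out.println("loads = " + loads);   // prints "loads = 1"
    }
}
```

Holding the Lazy instance in a static field would let every extractor object reuse the same loaded model instead of paying the load time per object.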
>
>
> On Thu, Sep 26, 2013 at 4:59 PM, Mark G <giaconiamark@gmail.com> wrote:
>
> > Carlos, I threw this together to show how to get a Parser running.
> > Look at what this prints; I think you may be able to iterate through
> > topParses[] and traverse the tree. If there is a more efficient way, I am
> > sure the other OpenNLPers will chime in.
> >
> >
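> >   import java.io.FileInputStream;
> >   import java.io.IOException;
> >   import java.io.InputStream;
> >   import opennlp.tools.cmdline.parser.ParserTool;
> >   import opennlp.tools.parser.Parse;
> >   import opennlp.tools.parser.Parser;
> >   import opennlp.tools.parser.ParserFactory;
> >   import opennlp.tools.parser.ParserModel;
> >
> >   // InvalidFormatException is an IOException, so one throws clause is enough
> >   public static void main(String[] args) throws IOException {
> >
> >     ParserModel model;
> >     // try-with-resources closes the stream even if loading fails
> >     try (InputStream is = new FileInputStream(
> >         "c:\\temp\\opennlpmodels\\en-parser-chunking.bin")) {
> >       model = new ParserModel(is);
> >     }
> >     Parser parser = ParserFactory.create(model);
> >
> >     String sentence = "The countries broke off peace talks following the "
> >         + "Mumbai attacks but have begun discussions again, focusing on "
> >         + "increasing trade.";
> >     // Ask for the single best parse of the sentence
> >     Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);
> >
> >     Parse p = topParses[0];
> >     p.showCodeTree();   // one line per node, prefixed with its index path
> >     p.show();           // the bracketed tree
> >     // getChildren() and getParent() are the entry points for walking the tree
> >     Parse[] children = p.getChildren();
> >
> >     System.out.println(p.getText());
> >   }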
> >
> > It should print all this...
> >
> > [0] S 2092924121 -> 2092924121 TOP The countries broke off peace talks
> > following the Mumbai attacks but have begun discussions again, focusing on
> > increasing trade.
> > [0.0] NP 2092766686 -> 2092924121 S The countries
> > [0.0.0] DT 2092752996 -> 2092766686 NP The
> > [0.0.0.0] TK 2092752996 -> 2092752996 DT The
> > [0.0.1] NNS 2092969298 -> 2092766686 NP countries
> > [0.0.1.0] TK 2092969298 -> 2092969298 NNS countries
> > [0.1] VP 2093633263 -> 2092924121 S broke off peace talks following the
> > Mumbai attacks but have begun discussions again, focusing on increasing
> > trade.
> > [0.1.0] VP 2093545647 -> 2093633263 VP broke off peace talks following the
> > Mumbai attacks
> > [0.1.0.0] VBD 2093484042 -> 2093545647 VP broke
> > [0.1.0.0.0] TK 2093484042 -> 2093484042 VBD broke
> > [0.1.0.1] PRT 2093793436 -> 2093545647 VP off
> > [0.1.0.1.0] RP 2093793436 -> 2093793436 PRT off
> > [0.1.0.1.0.0] TK 2093793436 -> 2093793436 RP off
> > [0.1.0.2] NP 2094012476 -> 2093545647 VP peace talks
> > [0.1.0.2.0] NN 2094004262 -> 2094012476 NP peace
> > [0.1.0.2.0.0] TK 2094004262 -> 2094004262 NN peace
> > [0.1.0.2.1] NNS 2094316394 -> 2094012476 NP talks
> > [0.1.0.2.1.0] TK 2094316394 -> 2094316394 NNS talks
> > [0.1.0.3] PP 2094660013 -> 2093545647 VP following the Mumbai attacks
> > [0.1.0.3.0] VBG 2094634002 -> 2094660013 PP following
> > [0.1.0.3.0.0] TK 2094634002 -> 2094634002 VBG following
> > [0.1.0.3.1] NP 2095166543 -> 2094660013 PP the Mumbai attacks
> > [0.1.0.3.1.0] DT 2095146008 -> 2095166543 NP the
> > [0.1.0.3.1.0.0] TK 2095146008 -> 2095146008 DT the
> > [0.1.0.3.1.1] NNP 2095358203 -> 2095166543 NP Mumbai
> > [0.1.0.3.1.1.0] TK 2095358203 -> 2095358203 NNP Mumbai
> > [0.1.0.3.1.2] NNS 2095723726 -> 2095166543 NP attacks
> > [0.1.0.3.1.2.0] TK 2095723726 -> 2095723726 NNS attacks
> > [0.1.1] CC 2096134426 -> 2093633263 VP but
> > [0.1.1.0] TK 2096134426 -> 2096134426 CC but
> > [0.1.2] VP 2096419178 -> 2093633263 VP have begun discussions again,
> > focusing on increasing trade.
> > [0.1.2.0] VBP 2096343883 -> 2096419178 VP have
> > [0.1.2.0.0] TK 2096343883 -> 2096343883 VBP have
> > [0.1.2.1] VP 2096672443 -> 2096419178 VP begun discussions again, focusing
> > on increasing trade.
> > [0.1.2.1.0] VBN 2096605362 -> 2096672443 VP begun
> > [0.1.2.1.0.0] TK 2096605362 -> 2096605362 VBN begun
> > [0.1.2.1.1] NP 2096925708 -> 2096672443 VP discussions
> > [0.1.2.1.1.0] NNS 2096925708 -> 2096925708 NP discussions
> > [0.1.2.1.1.0.0] TK 2096925708 -> 2096925708 NNS discussions
> > [0.1.2.1.2] PP 2097584197 -> 2096672443 VP again, focusing on increasing
> > trade.
> > [0.1.2.1.2.0] IN 2097543127 -> 2097584197 PP again,
> > [0.1.2.1.2.0.0] TK 2097543127 -> 2097543127 IN again,
> > [0.1.2.1.2.1] S 2097938768 -> 2097584197 PP focusing on increasing trade.
> > [0.1.2.1.2.1.0] VP 2097938768 -> 2097938768 S focusing on increasing trade.
> > [0.1.2.1.2.1.0.0] VBG 2097910019 -> 2097938768 VP focusing
> > [0.1.2.1.2.1.0.0.0] TK 2097910019 -> 2097910019 VBG focusing
> > [0.1.2.1.2.1.0.1] PP 2098394645 -> 2097938768 VP on increasing trade.
> > [0.1.2.1.2.1.0.1.0] IN 2098370003 -> 2098394645 PP on
> > [0.1.2.1.2.1.0.1.0.0] TK 2098370003 -> 2098370003 IN on
> > [0.1.2.1.2.1.0.1.1] NP 2098546604 -> 2098394645 PP increasing trade.
> > [0.1.2.1.2.1.0.1.1.0] VBG 2098537021 -> 2098546604 NP increasing
> > [0.1.2.1.2.1.0.1.1.0.0] TK 2098537021 -> 2098537021 VBG increasing
> > [0.1.2.1.2.1.0.1.1.1] NN 2099103787 -> 2098546604 NP trade.
> > [0.1.2.1.2.1.0.1.1.1.0] TK 2099103787 -> 2099103787 NN trade.
> > (TOP (S (NP (DT The) (NNS countries)) (VP (VP (VBD broke) (PRT (RP off))
> > (NP (NN peace) (NNS talks)) (PP (VBG following) (NP (DT the) (NNP Mumbai)
> > (NNS attacks)))) (CC but) (VP (VBP have) (VP (VBN begun) (NP (NNS
> > discussions)) (PP (IN again,) (S (VP (VBG focusing) (PP (IN on) (NP (VBG
> > increasing) (NN trade.)))))))))))
> > The countries broke off peace talks following the Mumbai attacks but have
> > begun discussions again, focusing on increasing trade
> >
> > let me know how it works
> >
> > happy coding!
> >
> > Mark G
> >
> >
> >
> > On Thu, Sep 26, 2013 at 4:14 PM, Carlos Scheidecker <nando.nlp@gmail.com>
> > wrote:
> >
> > > Thanks Svetoslav,
> > >
> > > Would you have an example of that?
> > >
> > > cheers,
> > >
> > > Carlos.
> > >
> > >
> > > On Thu, Sep 26, 2013 at 5:09 AM, Svetoslav Marinov <
> > > svetoslav.marinov@findwise.com> wrote:
> > >
> > > > Hi Carlos,
> > > >
> > > > This is not exactly an answer to your question, but I am not really
> > > > convinced that a phrase-structure tree is the best way to extract
> > > > triplets. A dependency graph is a much better option.
> > > >
> > > > There will be a number of NPs and PPs that are neither the subject nor
> > > > the object, and I am not at all sure that an adjective can be an object.
> > > >
> > > > However, if you want to use OpenNLP and the parse tree, you could
> > > > consider mapping the tree to FrameNet. That would show you what kinds of
> > > > arguments a verb takes and which of them can potentially be the
> > > > subject and the object.
> > > >
> > > > Best,
> > > >
> > > > Svetoslav
> > > > ________________________________________
> > > > From: Carlos Scheidecker <nando.nlp@gmail.com>
> > > > Sent: 26 September 2013 11:37
> > > > To: dev@opennlp.apache.org
> > > > Subject: Triplet Extraction with OpenNLP
> > > >
> > > > Hello all,
> > > >
> > > > I am interested in performing Triplet Extraction.
> > > >
> > > > For that, I need to traverse the parse tree.
> > > >
> > > > I know how to use the ChunkerME; however, I am not sure how to use the
> > > > Parser so that I can create a tree and traverse it.
> > > >
> > > > Ideally, I want to obtain the subject, predicate and object.
> > > >
> > > > To find the subject, I need to search the NP subtree with a
> > > > breadth-first search and select the first descendant of NP that is a
> > > > noun.
> > > >
> > > > To find the predicate, I will search the VP subtree; the deepest verb
> > > > descendant in that tree gives the predicate.
> > > >
> > > > The object(s) can be in three different subtrees: PP, NP and ADJ.
> > > > In NP and PP the object is the first noun, while in ADJ we need to
> > > > locate the first adjective.
> > > >
> > > > Therefore, what I need to learn is how to create the parser and the
> > > > main tree so that I can navigate the subtrees.
> > > >
> > > > Thanks for the help,
> > > >
> > > > Carlos.
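
The three search rules in Carlos's message could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested extractor: TreeNode, Triplets and every method name are hypothetical, the tree is hand-built rather than produced by OpenNLP, labels are assumed to follow Penn Treebank conventions (NN* for nouns, VB* for verbs, JJ* for adjectives, ADJP for what the message calls ADJ), and objects are looked for only among the direct children of the VP.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for a parse-tree node; not the OpenNLP Parse class. */
final class TreeNode {
    final String label;
    final String text;
    final List<TreeNode> children;

    TreeNode(String label, String text, TreeNode... children) {
        this.label = label;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

final class Triplets {
    /** Breadth-first search: first node whose label starts with the prefix, or null. */
    static TreeNode firstByBfs(TreeNode root, String prefix) {
        Deque<TreeNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            TreeNode n = queue.remove();
            if (n.label.startsWith(prefix)) {
                return n;
            }
            queue.addAll(n.children);
        }
        return null;
    }

    /** Depth-first search: deepest verb (label VB*); the leftmost wins on ties. */
    static TreeNode deepestVerb(TreeNode root) {
        TreeNode[] best = {null};
        walk(root, 0, new int[] {-1}, best);
        return best[0];
    }

    private static void walk(TreeNode n, int depth, int[] bestDepth, TreeNode[] best) {
        if (n.label.startsWith("VB") && depth > bestDepth[0]) {
            bestDepth[0] = depth;
            best[0] = n;
        }
        for (TreeNode c : n.children) {
            walk(c, depth + 1, bestDepth, best);
        }
    }

    /** Direct child with exactly this label, or null. */
    static TreeNode child(TreeNode parent, String label) {
        for (TreeNode c : parent.children) {
            if (c.label.equals(label)) {
                return c;
            }
        }
        return null;
    }

    /** Returns {subject, predicate, object}; an entry is null when nothing matched. */
    static String[] extract(TreeNode sentence) {
        TreeNode np = child(sentence, "NP");
        TreeNode vp = child(sentence, "VP");
        TreeNode subject = np == null ? null : firstByBfs(np, "NN");
        TreeNode predicate = vp == null ? null : deepestVerb(vp);
        TreeNode object = null;
        if (vp != null) {
            for (TreeNode c : vp.children) {
                if (c.label.equals("NP") || c.label.equals("PP")) {
                    object = firstByBfs(c, "NN");   // first noun in NP or PP
                } else if (c.label.equals("ADJP")) {
                    object = firstByBfs(c, "JJ");   // first adjective in ADJP
                }
                if (object != null) {
                    break;
                }
            }
        }
        return new String[] {text(subject), text(predicate), text(object)};
    }

    private static String text(TreeNode n) {
        return n == null ? null : n.text;
    }

    /** Hand-built tree for "The countries broke off peace talks". */
    static TreeNode sample() {
        return new TreeNode("S", "The countries broke off peace talks",
            new TreeNode("NP", "The countries",
                new TreeNode("DT", "The"), new TreeNode("NNS", "countries")),
            new TreeNode("VP", "broke off peace talks",
                new TreeNode("VBD", "broke"),
                new TreeNode("PRT", "off", new TreeNode("RP", "off")),
                new TreeNode("NP", "peace talks",
                    new TreeNode("NN", "peace"), new TreeNode("NNS", "talks"))));
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", extract(sample())));
    }
}
```

On the sample tree this prints "countries | broke | peace". The first-noun rule picks "peace" rather than the head noun "talks", which gives a feel for how coarse the heuristic is on a plain phrase-structure tree.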
> > > >
> > >
> >
>
