opennlp-dev mailing list archives

From "Giaconia, Mark [USA]" <Giaconia_M...@bah.com>
Subject AggregatedEntityLinker
Date Tue, 11 Jun 2013 11:29:18 GMT
Joern, 

A couple of thoughts as I poke through this AggregatedEntityLinker:
It seems as if the AggregatedEntityLinker (AEL) only makes sense if the spans passed in contain
the types that the AEL works on (for example, if a sentence contains only location spans, there
is no point instantiating linkers for both locations AND orgs).
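To make the point concrete, here is a minimal sketch of that check. It assumes spans can be reduced to their type strings and that an aggregated linker can report the set of types it supports; both are assumptions, and `LinkerTypeFilter` / `typesToInstantiate` are hypothetical names, not OpenNLP API.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of OpenNLP): given the types of the incoming
// spans and the types an aggregated linker supports, return only the types
// for which a linker is worth instantiating.
final class LinkerTypeFilter {

  private LinkerTypeFilter() {}

  static Set<String> typesToInstantiate(List<String> spanTypes, Set<String> supportedTypes) {
    // LinkedHashSet keeps first-seen order and drops duplicate types.
    Set<String> needed = new LinkedHashSet<>();
    for (String type : spanTypes) {
      // A type is needed only if some span carries it and the aggregate supports it.
      if (supportedTypes.contains(type)) {
        needed.add(type);
      }
    }
    return needed;
  }
}
```

With spans typed only as "location" and an aggregate that supports "location" and "organization", only "location" comes back, so no organization linker would be built.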

Currently the BaseEntityLinker abstract class, and the framework in general, supports an array
of spans of arbitrary types (the Span[] can contain locations, persons, orgs, etc.), like this:

    for (Span s : spans) {
      List<EntityLinker> linkers = EntityLinkerFactory.getLinkers(s.getType(), properties);
      for (EntityLinker linker : linkers) {
        outLinkedSpans.addAll(linker.find(tokens, spans, properties));
      }
    }

outLinkedSpans is the aggregate over all the types in the Span[] array, and this approach would
not instantiate linkers for types that don't occur in the array.
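One way to tighten that loop is sketched below. As written, the loop calls getLinkers once per span (so a type that occurs several times is looked up several times) and hands every linker the full spans array. The sketch groups spans by type first, so linkers are requested once per type that is actually present and each linker sees only the spans of its own type. `TypedSpan`, `SpanLinker` and `TypeGroupedLinking` are hypothetical stand-ins, not the OpenNLP Span/EntityLinker API, and the linker factory is modelled as a plain `Function`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a typed span (not opennlp.tools.util.Span).
final class TypedSpan {
  final int start;
  final int end;
  final String type;

  TypedSpan(int start, int end, String type) {
    this.start = start;
    this.end = end;
    this.type = type;
  }
}

// Hypothetical linker contract; not the real OpenNLP EntityLinker signature.
interface SpanLinker {
  List<TypedSpan> find(String[] tokens, List<TypedSpan> spansOfOneType);
}

final class TypeGroupedLinking {

  private TypeGroupedLinking() {}

  static List<TypedSpan> linkAll(String[] tokens, List<TypedSpan> spans,
      Function<String, List<SpanLinker>> linkersForType) {
    // Group spans by type, preserving the first-seen order of the types.
    Map<String, List<TypedSpan>> byType = new LinkedHashMap<>();
    for (TypedSpan s : spans) {
      byType.computeIfAbsent(s.type, t -> new ArrayList<>()).add(s);
    }
    List<TypedSpan> linked = new ArrayList<>();
    for (Map.Entry<String, List<TypedSpan>> entry : byType.entrySet()) {
      // Linkers are requested once per type actually present, and each
      // linker receives only the spans of that type.
      for (SpanLinker linker : linkersForType.apply(entry.getKey())) {
        linked.addAll(linker.find(tokens, entry.getValue()));
      }
    }
    return linked;
  }
}
```

For two location spans and one person span, the factory is consulted twice (once per type) instead of three times (once per span).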

I'll continue implementing it, but I'm not sure the AEL is needed. What do you think?


________________________________
Mark Giaconia
Lead Associate
 Strategic Innovation Group
Booz | Allen | Hamilton
C 571 748 9673 (unavailable during the day)
On Site 703 995 3089


________________________________________
From: Jörn Kottmann [kottmann@gmail.com]
Sent: Monday, June 03, 2013 5:51 AM
To: dev@opennlp.apache.org
Subject: [External]  Re: DocumentNameFinder (LinkableDocumentNameFinder)

Hello,

as far as I understand the current EntityLinker proposal, it already operates on a per-document
level: the entire document is passed to the find method.

In OpenNLP the user is responsible for doing all the pre-processing, because it often varies
in small details: e.g. some users need access to both the token and sentence segmentation,
some only to the tokenization, some don't need anything, some already have the sentences, etc.
The code for this logic is usually very simple and lets users integrate things as they fit
into their application.
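A minimal sketch of the kind of user-side glue described above, assuming the application wires in its own components. The sentence detector and tokenizer are modelled as `Function<String, String[]>`, which matches the shape of OpenNLP's `SentenceDetector.sentDetect(String)` and `Tokenizer.tokenize(String)`, so method references such as `detector::sentDetect` could be passed in. `PreprocessingPipeline` and `tokenizeDocument` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical user-side pre-processing: the application supplies its own
// sentence detector and tokenizer, here as functions with the same shape as
// SentenceDetector.sentDetect(String) and Tokenizer.tokenize(String).
final class PreprocessingPipeline {
  private final Function<String, String[]> sentenceDetector;
  private final Function<String, String[]> tokenizer;

  PreprocessingPipeline(Function<String, String[]> sentenceDetector,
      Function<String, String[]> tokenizer) {
    this.sentenceDetector = sentenceDetector;
    this.tokenizer = tokenizer;
  }

  // Splits the document into sentences and returns one token array per
  // sentence, ready to be passed to name finders and then to a linker.
  List<String[]> tokenizeDocument(String documentText) {
    List<String[]> tokenizedSentences = new ArrayList<>();
    for (String sentence : sentenceDetector.apply(documentText)) {
      tokenizedSentences.add(tokenizer.apply(sentence));
    }
    return tokenizedSentences;
  }
}
```

A user who already has the sentences would skip the detector and call the tokenizer directly, which is the kind of small variation that argues for leaving this step to the application.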

Jörn

On 06/02/2013 10:40 PM, Giaconia, Mark [USA] wrote:
> As part of working on the EntityLinker (issue OPENNLP-579<https://issues.apache.org/jira/browse/OPENNLP-579>),
> I created a new interface and a default implementation
> called LinkableDocumentNameFinder/DefaultLinkableDocumentNameFinderImpl.
> Here are the method signatures for the interface:
>
> import java.util.List;
>
> import opennlp.tools.namefind.TokenNameFinder;
> import opennlp.tools.sentdetect.SentenceDetector;
> import opennlp.tools.tokenize.Tokenizer;
>
> public interface LinkableDocumentNameFinder {
>   Document find(String[] sentences, Tokenizer tokenizer,
>       List<TokenNameFinder> nameFinders, boolean linkable);
>   Document find(String documentText, SentenceDetector sentenceDetector, Tokenizer tokenizer,
>       List<TokenNameFinder> nameFinders, boolean linkable);
>   Document find(List<Sentence> sentences, Tokenizer tokenizer,
>       List<TokenNameFinder> nameFinders, boolean linkable);
>   Document find(Document document, SentenceDetector sentenceDetector, Tokenizer tokenizer,
>       List<TokenNameFinder> nameFinders, boolean linkable);
>   List<Document> find(List<Document> documents, SentenceDetector sentenceDetector,
>       Tokenizer tokenizer, List<TokenNameFinder> nameFinders, boolean linkable);
> }
>
> Notice the Document return type. Here is what a Document object looks like:
>
> import java.util.ArrayList;
> import java.util.List;
>
> public class Document {
>   private List<Sentence> sentences = new ArrayList<>();
>
>   public List<Sentence> getSentences() {
>     return sentences;
>   }
>
>   public void setSentences(List<Sentence> sentences) {
>     this.sentences = sentences;
>   }
> }
>
> Notice the Sentence object. Here it is:
>
> import java.util.ArrayList;
> import java.util.List;
>
> import opennlp.tools.util.Span;
>
> public class Sentence {
>   private String sentenceText;
>   private Integer sentenceNumber;
>   private List<String> tokens = new ArrayList<>();
>   private List<Span> spans = new ArrayList<>();
>
>   public Sentence(String sentenceText, Integer sentenceNumber) {
>     this.sentenceNumber = sentenceNumber;
>     this.sentenceText = sentenceText;
>   }
>   // setters... getters...
> }
>
>
> Mark Giaconia
>
>
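A minimal sketch of how the String[] sentences overload might populate this Document/Sentence model, assuming sentences are numbered from 0 in input order and tokenized with a caller-supplied function (both assumptions; the message specifies neither). Sentence and Document are re-declared in trimmed form (no List<Span> field) to keep the example self-contained, and `DocumentBuilder.fromSentences` is a hypothetical name. Name finding and linking would then fill in the spans per sentence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Trimmed re-declaration of the Sentence class from the message (no List<Span> field).
class Sentence {
  private final String sentenceText;
  private final Integer sentenceNumber;
  private List<String> tokens = new ArrayList<>();

  Sentence(String sentenceText, Integer sentenceNumber) {
    this.sentenceText = sentenceText;
    this.sentenceNumber = sentenceNumber;
  }

  String getSentenceText() { return sentenceText; }
  Integer getSentenceNumber() { return sentenceNumber; }
  List<String> getTokens() { return tokens; }
  void setTokens(List<String> tokens) { this.tokens = tokens; }
}

// Trimmed re-declaration of the Document class from the message.
class Document {
  private List<Sentence> sentences = new ArrayList<>();

  List<Sentence> getSentences() { return sentences; }
  void setSentences(List<Sentence> sentences) { this.sentences = sentences; }
}

// Hypothetical builder: numbers sentences from 0 in input order (an assumption)
// and fills each Sentence's tokens using the caller-supplied tokenizer.
final class DocumentBuilder {

  private DocumentBuilder() {}

  static Document fromSentences(String[] sentences, Function<String, String[]> tokenizer) {
    List<Sentence> built = new ArrayList<>();
    for (int i = 0; i < sentences.length; i++) {
      Sentence sentence = new Sentence(sentences[i], i);
      // Copy into an ArrayList so later steps may add or edit tokens.
      sentence.setTokens(new ArrayList<>(Arrays.asList(tokenizer.apply(sentences[i]))));
      built.add(sentence);
    }
    Document document = new Document();
    document.setSentences(built);
    return document;
  }
}
```

The String documentText overload could then run its SentenceDetector first and delegate to this path, so the overloads share one code path.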

