opennlp-dev mailing list archives

From "william.colen@gmail.com" <william.co...@gmail.com>
Subject Re: Merging the output of multiple name finders
Date Tue, 17 Apr 2012 13:22:37 GMT
Hi,

+1 for the baseline first.
I was also thinking of setting a priority for each type. For example
setting a higher priority to types which have a higher F1. But I like the
probabilities from model suggested by Jörn too.

William
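
A minimal Python sketch of the per-type priority idea above, assuming spans are plain (start, end, type) tuples over token indices with an exclusive end. The function name and the F1 values in the usage note are hypothetical; none of this is OpenNLP API or a measured result.

```python
def merge_by_type_priority(spans, f1_by_type):
    """Keep the span whose type has the higher F1 when two spans overlap.

    spans:      iterable of (start, end, type), end exclusive
    f1_by_type: dict mapping type name -> F1 score (assumed to be known)
    """
    # Highest-priority types first; unknown types get the lowest priority.
    ranked = sorted(spans, key=lambda s: -f1_by_type.get(s[2], 0.0))
    kept = []
    for start, end, typ in ranked:
        # Accept a span only if it overlaps none of the spans kept so far.
        if all(end <= k[0] or start >= k[1] for k in kept):
            kept.append((start, end, typ))
    # Return the surviving spans in document order.
    return sorted(kept)
```

For example, with hypothetical scores {"person": 0.8, "location": 0.9}, a "location" span at (2, 3) would win over a "person" span at (0, 4) that contains it.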

On Tue, Apr 17, 2012 at 10:03 AM, Jörn Kottmann <kottmann@gmail.com> wrote:

> I propose that we make a simple baseline implementation
> which takes all output spans, orders them and then resolves
> the ambiguities based on the order. This will prefer longer
> names over shorter names, but ignores the type.
>
> There are more sophisticated ways of handling this,
> e.g. taking probabilities from the statistical name finders into
> account, but these might be a bit more restrictive as well.
>
> It's always good to have some simple baseline, to see how much
> something more complicated improves it.
>
> Any opinions?
>
> Jörn
>
>
> On 04/17/2012 02:52 PM, Jörn Kottmann wrote:
>
>> If you don't want to handle these cases, you can simply copy all names
>> together into a list, and then do evaluation on this list.
>> This approach works with our evaluation, but will usually be an issue for
>> applications which expect output where the ambiguities mentioned earlier
>> are resolved.
>>
>> Jörn
>>
>> On 04/17/2012 02:38 PM, Jim - FooBar(); wrote:
>>
>>> OK, first of all you're referring to the final merging
>>> (AggregateNameFinder) and not the multiple dictionaries where no merging
>>> occurs... Anyway, let's deal with this at the moment. Let's see...
>>>
>>>> - Two names can be identical and have the same type or a different type
>>>>
>>> Well, if the type is different the spans are not identical (equal), so
>>> you keep both and do some reasoning over them (see below).
>>> If the type is the same and the spans cover the same text then they are
>>> equal, so you only keep one of them.
>>>
>>>> - Two names have intersecting spans
>>>>
>>> It is very unlikely that both are correct so in the simplest case of
>>> keeping them both you may lose some precision. However considering how
>>> often that could happen it becomes unimportant. Or you could do some
>>> reasoning (see below) again if they have the same type. If they don't have
>>> the same type then why not keep them both again?
>>>
>>>> - One name is contained in another like this:
>>>> <START:A> a b <START:B> c <END:B> d <END:A>
>>>>
>>> Well, this is exactly the same case as before conceptually. If they have
>>> the same type it's very likely that one is wrong. You can do the same sort
>>> of reasoning as above. If they don't, there is no way to know with
>>> confidence what to do, so I say keep them both.
>>>
>>> The reasoning I'm referring to is simply to *trust the dictionary* (if
>>> one exists). If one doesn't exist and one is trying to merge results from
>>> several maxent models, for example, then we cannot make an informed
>>> decision. It is only the dictionary that can provide facts. All the rest
>>> are probabilities...
>>>
>>> Jim
>>>
>>>
>>>> Hi all,
>>>>
>>>> In one of the Jira issues we started a discussion about merging the
>>>> output of multiple name finders and which conflicts exist.
>>>> Let's move it back to the dev list.
>>>>
>>>> The merging code needs to handle these cases:
>>>>
>>>> - Two names can be identical and have the same type or a different type.
>>>>
>>>> - Two names have intersecting spans like this:
>>>> <START:A> a b <START:B> c <END:A> d <END:B>
>>>>
>>>> - One name is contained in another like this:
>>>> <START:A> a b <START:B> c <END:B> d <END:A>
>>>>
>>>> Depending on the use case and merging logic it might be resolved
>>>> differently.
>>>>
>>>> Jörn
>>>>
>>>
>>>
>>>
>>
>
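
The baseline proposed in the thread (copy all output spans together, order them, resolve the ambiguities based on the order, ignore the type) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: Span and merge_baseline are hypothetical names, not OpenNLP classes, spans are token offsets with an exclusive end, and the chosen order is "earlier start first, longer span first on ties".

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    type: str   # name type; deliberately ignored when resolving conflicts


def merge_baseline(*finder_outputs):
    """Merge the span lists of several name finders into one conflict-free list."""
    # Copy every span from every finder into a single list.
    spans = [s for output in finder_outputs for s in output]
    # Order by start offset; for equal starts the longer span comes first.
    spans.sort(key=lambda s: (s.start, -(s.end - s.start)))
    kept = []
    for s in spans:
        # Because of the ordering, s can only conflict with the last kept span.
        if kept and s.start < kept[-1].end:
            continue  # identical, intersecting, or contained: drop the later one
        kept.append(s)
    return kept
```

Under this ordering a contained name loses to the name that contains it, identical names collapse to whichever finder was listed first, and for two intersecting names the one that starts earlier wins regardless of length.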
