opennlp-dev mailing list archives

From "Jim - FooBar();" <>
Subject Re: Merging the output of multiple name finders
Date Tue, 17 Apr 2012 18:12:30 GMT
On 17/04/12 17:27, Jörn Kottmann wrote:
> On 04/17/2012 06:23 PM, Jim - FooBar(); wrote:
>> Yes i get your point, we've discussed this before and generally i do 
>> is just that i hadn't thought that people would use the 
>> AggregateNameFinder to produce names. The whole idea from day 1 was 
>> to improve the evaluation.
>> Should i go ahead and modify the AggregateNameFinder to sort all 
>> the predictions spans according to a comparator that looks at the 
>> start offsets (in increasing order) and checks for overlaps?
> +1, and we should get a better name for it.
> Jörn

OK, so I had a go at what was discussed, and here is how it looks 
(allFindings comes in sorted in ascending order of start offsets):
   private Span[] untangle(List<Span> allFindings){
     List<Span> problems = new ArrayList<Span>();//all the overlaps

     for (int i=1;i<allFindings.size();i++){//start from 1
       Span current = allFindings.get(i);
       Span previous = allFindings.get(i-1);//safe

       if (current.intersects(previous) || current.crosses(previous)){
         if (current.getType().equals(previous.getType())){//if same type
           Span temp = ((current.length()-previous.length()) > 0) ? current : previous;
           allFindings.set(i, temp);  //keep the longest one in findings
           allFindings.remove(i-1);   //drop the other one so the overlap really goes away
           i--;                       //next round compares the kept span with its new neighbour
         }
         else {   //add both as problems
           problems.add(previous);
           problems.add(current);
         }
       }
     }

     if(problems.isEmpty()) //if no problems do the usual
       return allFindings.toArray(new Span[allFindings.size()]);
     else
       return sortProblems(allFindings, problems); //don't know what to do in this method
   }
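
For reference, the "comes in sorted" precondition could be met roughly like this. This is only a minimal sketch: SimpleSpan is a made-up stand-in for opennlp's Span so that the snippet compiles on its own, SpanSorting, BY_START and mergeSorted are hypothetical names, and the tie-break for equal start offsets (longer span first) is just an assumption on my part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for opennlp.tools.util.Span (not the real class).
final class SimpleSpan {
    final int start;   // inclusive
    final int end;     // exclusive
    final String type;

    SimpleSpan(int start, int end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }

    int length() { return end - start; }

    // True if the two half-open ranges share at least one position.
    boolean overlaps(SimpleSpan other) {
        return start < other.end && other.start < end;
    }
}

final class SpanSorting {
    // Ascending start offset; for equal starts the longer span comes first (assumed tie-break).
    static final Comparator<SimpleSpan> BY_START =
        Comparator.comparingInt((SimpleSpan s) -> s.start)
                  .thenComparing(Comparator.comparingInt((SimpleSpan s) -> s.length()).reversed());

    // Pools the predictions of all finders into one new list sorted by BY_START.
    static List<SimpleSpan> mergeSorted(List<List<SimpleSpan>> perFinder) {
        List<SimpleSpan> all = new ArrayList<SimpleSpan>();
        for (List<SimpleSpan> spans : perFinder) {
            all.addAll(spans);
        }
        all.sort(BY_START);
        return all;
    }
}
```

With the list in that order, any overlap has to show up between neighbours, which is why the loop only ever compares a span with the one right before it.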

As you can see, I'm stuck at the very last step: how do we sort 
overlapping spans with different types? On what basis? At this point I've 
lost information like "which finder did this prediction come from?" and 
thus cannot do any reasoning... I do keep a Map with the finder class 
as key and a list of its predictions as value, but this was intended only 
for debugging. I cannot rely on that in order to reason about what 
should stay and what should go in the final span array. If they are the 
same type we keep the longest, but what about different types? Which one 
do we keep?
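
To make the "lost information" point concrete, here is a rough sketch of what carrying the source finder along with each span might look like. Everything in it is hypothetical: TaggedSpan, finderRank and OverlapResolver are made-up names, not part of opennlp, and "the finder listed first wins" is only one possible policy that I am assuming for illustration, not something we have agreed on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper: a predicted span plus the rank of the finder that produced it.
final class TaggedSpan {
    final int start;       // inclusive
    final int end;         // exclusive
    final String type;     // assumed non-null in this sketch
    final int finderRank;  // position of the finder in the aggregate; lower = preferred (assumption)

    TaggedSpan(int start, int end, String type, int finderRank) {
        this.start = start;
        this.end = end;
        this.type = type;
        this.finderRank = finderRank;
    }

    int length() { return end - start; }

    boolean overlaps(TaggedSpan other) {
        return start < other.end && other.start < end;
    }
}

final class OverlapResolver {
    // Expects input sorted by ascending start offset; returns a new list without overlaps.
    static List<TaggedSpan> resolve(List<TaggedSpan> sorted) {
        List<TaggedSpan> kept = new ArrayList<TaggedSpan>();
        for (TaggedSpan current : sorted) {
            if (kept.isEmpty() || !kept.get(kept.size() - 1).overlaps(current)) {
                kept.add(current);
                continue;
            }
            TaggedSpan previous = kept.get(kept.size() - 1);
            TaggedSpan winner;
            if (current.type.equals(previous.type)) {
                // same type: keep the longest one
                winner = current.length() > previous.length() ? current : previous;
            } else {
                // different types: fall back on the finder rank (assumed policy)
                winner = current.finderRank < previous.finderRank ? current : previous;
            }
            kept.set(kept.size() - 1, winner);
        }
        return kept;
    }
}
```

Whether rank is the right basis is exactly the open question; the only point of the sketch is that some per-span record of the originating finder has to survive the merge for any such rule to be possible.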

any pointers?

