uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: can't remove duplicate Annotations with Java Set Collection
Date Thu, 20 Nov 2014 20:18:27 GMT
Sorry, the pictures/images don't come through this email list...  If you want to
include them, please post them on a well-known clip-site, and include a link to
them in your email.

I think the issue you're having is that you wrote:

...
@Override
public int compare(Annotation o1, Annotation o2) {
...

The @Override indicates an error if the method signature you're defining can't
be matched to a method in the supertype.

The supertype here is "Comparator" and it only has a signature for compare with
2 args which are both "Object"s.

You can remove the @Override to get rid of this check.
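
A minimal sketch of an alternative, under stated assumptions: with a raw Comparator the anonymous class still has to implement compare(Object, Object), so dropping @Override alone may only change the error message. Parameterizing the comparator as Comparator<Annotation> makes compare(Annotation, Annotation) a genuine override. To keep the example runnable without UIMA on the classpath, Span below is a hypothetical stand-in for Annotation, and the class and method names are illustrative only. The comparator also uses compareTo rather than ==, because a TreeSet needs a consistent ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: Span is a hypothetical stand-in for a UIMA Annotation.
class ComparatorSketch {

  static final class Span {
    private final String coveredText;

    Span(String coveredText) {
      this.coveredText = coveredText;
    }

    String getCoveredText() {
      return coveredText;
    }
  }

  // Comparator<Span> declares compare(Span, Span), so @Override is accepted.
  static final Comparator<Span> BY_COVERED_TEXT = new Comparator<Span>() {
    @Override
    public int compare(Span o1, Span o2) {
      // compareTo gives a total order; it returns 0 only for equal text.
      return o1.getCoveredText().compareTo(o2.getCoveredText());
    }
  };

  // Returns one Span per distinct covered text, sorted by that text.
  static List<Span> dedupe(List<Span> input) {
    Set<Span> set = new TreeSet<Span>(BY_COVERED_TEXT);
    set.addAll(input);
    return new ArrayList<Span>(set);
  }

  public static void main(String[] args) {
    List<Span> spans = new ArrayList<Span>();
    for (String s : new String[] {"bird", "cat", "bush", "cat"}) {
      spans.add(new Span(s));
    }
    System.out.println("deduped size: " + dedupe(spans).size());
  }
}
```

Note that this sketch assumes getCoveredText() never returns null; a null would need its own handling in compare.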

-Marshall

On 11/18/2014 2:06 PM, Kameron Cole wrote:
>
> Awesome.  Your change will work.  And i will try it, thank you!
>
> But maybe you can help me to get this to work?   As I posted, if I use Object
> as the parameter in the compare method signature, Eclipse is ok; but when I
> change it to Annotation, it says I must override the methods - as though
> something about Annotator confuses Eclipse.  Here's the code I really want to
> work:
>
>
> -----------------------------------
>
> public static ArrayList<Annotation> dedupe(AnnotationIndex<Annotation> idx2) {
>
> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx2.size());
> FSIterator<Annotation> it2 = idx2.iterator();
> while (it2.hasNext())
> {
>
> tempList.add((Annotation) it2.next());
>
> }
>
> Set set = new TreeSet(new Comparator() {
>     @Override
>     public int compare(Annotation o1, Annotation o2) {
>         if (o1.getCoveredText() == o2.getCoveredText()) {
>             return 0;
>         }
>         return 1;
>     }
> });
>
> set.addAll(tempList);
>
> tempList.clear();
> tempList.addAll(set);
> System.out.println("templist length: " + tempList.size());
> return tempList;
>
> -----------------------------
>
> But look at what Eclipse gives me:
>
> [screenshot not preserved in the list archive]
>
>     --------------------------------------------------------------------------------
>
>     Kameron Arthur Cole
>     Watson Content Analytics Applications and Support
>     email: kameroncole@us.ibm.com <mailto:kameroncole@us.ibm.com> | Tel: 305-389-8512
>     upload logs here <http://www.ecurep.ibm.com/app/upload>
>
>     <http://www.facebook.com/ibmwatson> <https://twitter.com/@ibmwatson> <http://www.youtube.com/user/IBMWatsonSolutions/videos>
>
>
>     --------------------------------------------------------------------------------
>
>
>
> From: Marshall Schor <msa@schor.com>
> To: user@uima.apache.org
> Date: 11/18/2014 11:54 AM
> Subject: Re: can't remove duplicate Annotations with Java Set Collection
>
> --------------------------------------------------------------------------------
>
>
>
> An even simpler approach:
>
> Use a HashMap, where the key is the annotation.getCoveredText() and the value is
> the annotation, instead of a HashSet.
>
> replace this (in your original):
>
> // push tempList into HashSet
> HashSet<Annotation> hs = new HashSet<Annotation>();
> hs.addAll(tempList);
>
>
> with
>
> // push tempList into HashMap
> HashMap<String, Annotation> hm = new HashMap<String, Annotation>();
> for (Annotation a : tempList) {
>  hm.put(a.getCoveredText(), a);
> }
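
A runnable sketch of the HashMap approach described above, under stated assumptions: Span is a hypothetical stand-in for a UIMA Annotation (so the example compiles without UIMA), and the begin offsets are those of the "bird, cat, bush, cat" input from this thread. Because put() replaces any earlier value stored under the same key, the last annotation seen for each covered text is the one that survives.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: Span is a hypothetical stand-in for a UIMA Annotation.
class HashMapDedupeSketch {

  static final class Span {
    private final String coveredText;
    private final int begin;

    Span(String coveredText, int begin) {
      this.coveredText = coveredText;
      this.begin = begin;
    }

    String getCoveredText() {
      return coveredText;
    }

    int getBegin() {
      return begin;
    }
  }

  // Keeps one Span per distinct covered text; a later duplicate replaces an earlier one.
  static List<Span> dedupe(List<Span> tempList) {
    Map<String, Span> hm = new HashMap<String, Span>();
    for (Span a : tempList) {
      hm.put(a.getCoveredText(), a);
    }
    // HashMap does not promise any particular iteration order.
    return new ArrayList<Span>(hm.values());
  }

  public static void main(String[] args) {
    List<Span> tempList = new ArrayList<Span>();
    tempList.add(new Span("bird", 0));
    tempList.add(new Span("cat", 6));
    tempList.add(new Span("bush", 11));
    tempList.add(new Span("cat", 17));
    for (Span s : dedupe(tempList)) {
      System.out.println(s.getCoveredText() + " @ " + s.getBegin());
    }
  }
}
```

If the original document order matters, a LinkedHashMap would keep keys in first-insertion order instead.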
>
> -Marshall
>
> On 11/18/2014 9:45 AM, Marshall Schor wrote:
> > Eclipse pointed out a bug in my code, fix is below
> > On 11/18/2014 9:37 AM, Marshall Schor wrote:
> >> Hi Kameron,
> >>
> >> Based on this code snip, the two "cat" annotations you create are "different"
> >> using the HashSet definition, because they correspond to two distinct UIMA
> >> Annotations.  You could, for instance, update one of them, and not the other;
> >> that is the sense in which they are distinct.  In the case below, the two "cat"
> >> annotations would have different begin and end offsets.
> >>
> >> I'm guessing that your goal was to have one of the two cat annotations be
> >> dropped.
> >>
> >> You could do that by using your hash set approach, if you defined equal to mean
> >> that just the covered text of the annotation was equal.
> >>
> >> Here's one way to do this:  Create a "cover object" for your annotations, that
> >> contains a reference to the annotation and defines equals and hashcode (you
> >> have to define these together).  The easy way to do this is using Eclipse -
> >> define a new class: e.g.
> >>
> >> public class MyAnnotationWithSpecialEquals {
> >>   final public Annotation annotation;   // the covered annotation
> >>  
> >>   public MyAnnotationWithSpecialEquals(Annotation annotation) {
> >>     this.annotation = annotation;
> >>   }
> >> }
> >>
> >> and then use Eclipse to define the equals and hashcode:  go to Menu ->
> >> Source -> Generate hashCode() and equals()
> >> and have it generate one based on just "annotation".  This will not (yet) be
> >> correct - it should add two methods like this:
> >>
> >>   @Override
> >>   public int hashCode() {
> >>     final int prime = 31;
> >>     int result = 1;
> >>     result = prime * result + ((annotation == null) ? 0 :
> >>         annotation.hashCode());
> >>     return result;
> >>   }
> >>
> >>   @Override
> >>   public boolean equals(Object obj) {
> >>     if (this == obj)
> >>       return true;
> >>     if (obj == null)
> >>       return false;
> >>     if (getClass() != obj.getClass())
> >>       return false;
> >>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> >             // buggy lines
> >>     if (annotation == null) {
> >>       if (other.annotation != null)
> >>         return false;
> >             //  replace above with
> >       if (annotation == null && other.annotation != null)
> >         return false;
> >>     } else if (!annotation.equals(other.annotation))
> >>       return false;
> >>     return true;
> >>   }
> >>
> >> Now, to get these to be the definitions you want, which depend only on the
> >> covered text, modify these as follows:
> >>
> >> First, for hashCode, use only the string covered text:
> >>
> >>   @Override
> >>   public int hashCode() {
> >>     final int prime = 31;
> >>     int result = 1;
> >>     result = prime * result + ((annotation == null) ? 0 :
> >> annotation.getCoveredText().hashCode());
> >>     return result;
> >>   }
> >>
> >> and for equals: replace test for annotation being "equal" with
> >> annotation.getCoveredText() being "equal",
> >> with some additional edge case testing in case of nulls:
> >>
> >>   @Override
> >>   public boolean equals(Object obj) {
> >>     if (this == obj)
> >>       return true;
> >>     if (obj == null)
> >>       return false;
> >>     if (getClass() != obj.getClass())
> >>       return false;
> >>     MyAnnotationWithSpecialEquals other = (MyAnnotationWithSpecialEquals) obj;
> >>     if (annotation == null || other.annotation == null)
> >>       return annotation == other.annotation;  // equal only if both are null
> >>     String coveredText = annotation.getCoveredText();
> >>     if (coveredText == null)
> >>       return other.annotation.getCoveredText() == null;  // handle special case if covered text is null
> >>     // coveredText is not null
> >>     return coveredText.equals(other.annotation.getCoveredText());
> >>   }
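
The "cover object" idea above can be sketched end to end as follows. This is an illustration under assumptions, not the code from the thread: Span is a hypothetical stand-in for a UIMA Annotation, TextKey plays the role of MyAnnotationWithSpecialEquals, and java.util.Objects (Java 7+) is used to shorten the null handling. The dedupe helper keeps the first annotation seen for each covered text and preserves input order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Sketch only: Span is a hypothetical stand-in for a UIMA Annotation.
class CoverObjectSketch {

  static final class Span {
    private final String coveredText;
    private final int begin;

    Span(String coveredText, int begin) {
      this.coveredText = coveredText;
      this.begin = begin;
    }

    String getCoveredText() {
      return coveredText;
    }

    int getBegin() {
      return begin;
    }
  }

  // Wrapper whose equals/hashCode depend only on the covered text.
  static final class TextKey {
    final Span annotation;

    TextKey(Span annotation) {
      this.annotation = annotation;
    }

    @Override
    public int hashCode() {
      return annotation == null ? 0 : Objects.hashCode(annotation.getCoveredText());
    }

    @Override
    public boolean equals(Object obj) {
      if (this == obj) return true;
      if (obj == null || getClass() != obj.getClass()) return false;
      TextKey other = (TextKey) obj;
      if (annotation == null || other.annotation == null) {
        return annotation == other.annotation;  // equal only if both are null
      }
      return Objects.equals(annotation.getCoveredText(),
          other.annotation.getCoveredText());
    }
  }

  // Keeps the first Span for each distinct covered text, in input order.
  static List<Span> dedupe(List<Span> input) {
    Set<TextKey> seen = new HashSet<TextKey>();
    List<Span> result = new ArrayList<Span>();
    for (Span a : input) {
      if (seen.add(new TextKey(a))) {  // add() returns false for a repeat
        result.add(a);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Span> input = new ArrayList<Span>();
    input.add(new Span("bird", 0));
    input.add(new Span("cat", 6));
    input.add(new Span("bush", 11));
    input.add(new Span("cat", 17));
    for (Span s : dedupe(input)) {
      System.out.println(s.getCoveredText() + " @ " + s.getBegin());
    }
  }
}
```

The design point is that equals and hashCode are derived from the same value (the covered text), so two wrappers that compare equal always land in the same hash bucket.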
> >>
> >> HTH.  -Marshall
> >>
> >>
> >> On 11/17/2014 4:49 PM, Kameron Cole wrote:
> >>> Input text:
> >>>
> >>> ------------------------------
> >>>
> >>> bird, cat, bush, cat
> >>>
> >>> ----------------------------
> >>>
> >>> Create the Annotations:
> >>>
> >>> -------------------------------
> >>> docText = aJCas.getDocumentText();
> >>>
> >>> int index = docText.indexOf("cat");
> >>> while(index >= 0) {
> >>> int begin = index;
> >>> int end = begin+3;
> >>> Animal animal = new Animal(aJCas);
> >>> animal.setBegin(begin);
> >>> animal.setEnd(end);
> >>> animal.addToIndexes();
> >>>
> >>>    index = docText.indexOf("cat", index+1);
> >>> }
> >>>
> >>> index = docText.indexOf("bird");
> >>> while(index >= 0) {
> >>> int begin = index;
> >>> int end = begin+4;
> >>> Animal animal = new Animal(aJCas);
> >>> animal.setBegin(begin);
> >>> animal.setEnd(end);
> >>> animal.addToIndexes();
> >>>
> >>>    index = docText.indexOf("bird", index+1);
> >>> }
> >>>
> >>> index = docText.indexOf("bush");
> >>> while(index >= 0) {
> >>> int begin = index;
> >>> int end = begin+4;
> >>> Vegetable animal = new Vegetable(aJCas);
> >>> animal.setBegin(begin);
> >>> animal.setEnd(end);
> >>> animal.addToIndexes();
> >>>
> >>>    index = docText.indexOf("bush", index+1);
> >>> }
> >>> ------------------------------------------------------
> >>>
> >>> From: Marshall Schor <msa@schor.com>
> >>> To: user@uima.apache.org
> >>> Date: 11/17/2014 04:35 PM
> >>> Subject: Re: can't remove duplicate Annotations with Java Set Collection
> >>>
> >>>
> --------------------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>> Hi,
> >>>
> >>> Two Feature Structures are considered "equal" in the sense used by HashSet
> >>> if fs1.equals(fs2).   The definition of "equals" for feature structures is:
> >>> they are equal if they refer to the same underlying CAS, and the same "spot"
> >>> in the CAS Heap.
> >>>
> >>> How did you create the Annotations that you think are "equal" in the HashSet
> >>> sense?
> >>>
> >>> Here's an example of two annotations which are "equal" in the UIMA sorted
> >>> index sense, but unequal in the HashSet sense.
> >>>
> >>>    // each line creates an instance of Annotation in myJCas,
> >>>    // with a begin = 0, and end = 4.
> >>>    Annotation fs1 = new Annotation(myJCas, 0, 4);
> >>>    Annotation fs2 = new Annotation(myJCas, 0, 4);
> >>>
> >>> These will be "equal" in the UIMA sense - the same kind of annotation, in the
> >>> same CAS, with the same feature values, but will be two distinct feature
> >>> structures, so HashSet will consider them to be unequal.
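
The distinction can be illustrated without UIMA. In this sketch Span is a hypothetical stand-in that, like the two annotations above, carries the same begin and end values in two separate objects. It does not override equals or hashCode, so it falls back to Java's default identity comparison; that only approximates the UIMA rule described above (same CAS, same spot in the heap), but it shows the same effect on a HashSet.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: Span is a hypothetical stand-in, not a UIMA class.
class IdentityEqualsSketch {

  // No equals/hashCode override: two instances are equal only if they are the same object.
  static final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
      this.begin = begin;
      this.end = end;
    }
  }

  // Number of elements a HashSet retains after adding all the given spans.
  static int distinctCount(Span... spans) {
    Set<Span> set = new HashSet<Span>();
    for (Span s : spans) {
      set.add(s);
    }
    return set.size();
  }

  public static void main(String[] args) {
    Span fs1 = new Span(0, 4);
    Span fs2 = new Span(0, 4);
    // Same field values, but two objects: the set keeps both.
    System.out.println(distinctCount(fs1, fs2));  // prints 2
    // The same object added twice: the set keeps one.
    System.out.println(distinctCount(fs1, fs1));  // prints 1
  }
}
```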
> >>>
> >>> Could this be what is happening in your case?  Please respond so we can see
> >>> if there's another straightforward solution that does what you're looking
> >>> for.
> >>>
> >>> -Marshall
> >>> On 11/17/2014 2:59 PM, Kameron Cole wrote:
> >>>> Hello,
> >>>>
> >>>> I am trying to get rid of duplicates in the FSIndex.  I thought a very
> >>>> clever way to do this would be to just push them into a Set Collection in
> >>>> Java, which does not allow duplicates. This is very (very) standard Java:
> >>>>
> >>>> ArrayList al = new ArrayList();
> >>>> // add elements to al, including duplicates
> >>>> HashSet hs = new HashSet();
> >>>> hs.addAll(al);
> >>>> al.clear();
> >>>> al.addAll(hs);
> >>>>
> >>>> This list will contain no duplicates.
> >>>>
> >>>> However, I am not getting this to work in my UIMA code:
> >>>>
> >>>>
> >>>> AnnotationIndex<Annotation> idx = aJCas.getAnnotationIndex();
> >>>>
> >>>> System.out.println("Index size is: "+idx.size());
> >>>>
> >>>> ArrayList<Annotation> tempList = new ArrayList<Annotation>(idx.size());
> >>>>
> >>>> FSIterator it  = idx.iterator();
> >>>>
> >>>> //load the Annotations into a temporary list.  includes duplicates
> >>>>
> >>>> while(it.hasNext())
> >>>> {
> >>>>
> >>>> tempList.add((Annotation) it.next());
> >>>>
> >>>> }
> >>>>
> >>>> Iterator tempIt = tempList.iterator();
> >>>>
> >>>> // remove all Annotations from the index.  this works fine
> >>>>
> >>>> while(tempIt.hasNext()){
> >>>> ((Annotation) tempIt.next()).removeFromIndexes(aJCas);
> >>>> }
> >>>>
> >>>> // push tempList into HashSet
> >>>>
> >>>> HashSet<Annotation> hs = new HashSet<Annotation>();
> >>>>
> >>>> hs.addAll(tempList);
> >>>>
> >>>> // this should not allow duplicates
> >>>>
> >>>> System.out.println("HS length: "+hs.size()); // size should be less than
> >>>> // the size of the FSIndex by the number of duplicates.  it is not. This is
> >>>> // the main problem
> >>>>
> >>>> tempList.clear();
> >>>>
> >>>> tempList.addAll(hs);
> >>>>
> >>>> System.out.println("templist length: "+tempList.size());
> >>>>
> >>>>
> >>>> Iterator<Annotation> it2 = tempList.iterator(); // this should now be the
> >>>> // clean list
> >>>>
> >>>>
> >>>> while(it2.hasNext()){
> >>>> it2.next().addToIndexes(aJCas);
> >>>> }
> >
> >
>
>

