opennlp-dev mailing list archives

From William Colen <william.co...@gmail.com>
Subject Re: DocumentSample in Doccat
Date Mon, 28 Apr 2014 20:17:25 GMT
Yes, it would be nice! Any other opinions?

Will you open a Jira for this improvement?

Thank you,
William

2014-04-27 21:59 GMT-03:00 Mark G <markg@apache.org>:

> In my local copy I have these methods in the interface:
>  Map<String, Double> scoreMap(String text);
>  SortedMap<Double, Set<String>> sortedScoreMap(String text);
>
> and these implementations of them in the ME implementation:
>
>
>   public Map<String, Double> scoreMap(String text) {
>     Map<String, Double> probDist = new HashMap<String, Double>();
>
>     double[] categorize = categorize(text);
>     int catSize = getNumberOfCategories();
>     for (int i = 0; i < catSize; i++) {
>       String category = getCategory(i);
>       probDist.put(category, categorize[getIndex(category)]);
>     }
>     return probDist;
>   }
>
>   public SortedMap<Double, Set<String>> sortedScoreMap(String text) {
>     SortedMap<Double, Set<String>> descendingMap =
>         new TreeMap<Double, Set<String>>().descendingMap();
>     double[] categorize = categorize(text);
>     int catSize = getNumberOfCategories();
>     for (int i = 0; i < catSize; i++) {
>       String category = getCategory(i);
>       double score = categorize[getIndex(category)];
>       if (descendingMap.containsKey(score)) {
>         descendingMap.get(score).add(category);
>       } else {
>         Set<String> newset = new HashSet<>();
>         newset.add(category);
>         descendingMap.put(score, newset);
>       }
>     }
>     return descendingMap;
>   }
>
>
> They are pretty simple, but if everyone agrees I can commit them (with some
> Javadoc).
>
>
>
>
>
> On Sat, Apr 26, 2014 at 8:39 AM, Jörn Kottmann <kottmann@gmail.com> wrote:
>
> > On Thu, 2014-04-24 at 19:54 -0300, William Colen wrote:
> > > Yes, it looks nice. Maybe we should redo all the DocumentCategorizer
> > > interface. It is different from other tools, for example, we can't get the
> > > best category of one document with only one call, we need to use two
> > > methods.
> >
> > Yes that is right. +1 to change it. Can we deprecate the old methods and
> > just add new ones to not break backward compatibility?
> >
> > Jörn
> >
> >
>
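
For anyone following the thread, here is a minimal, self-contained sketch of the two methods quoted above, and of how the sorted map could give the best category in a single call. It is only an illustration under stated assumptions: the class name ScoreMapSketch and the (categories, scores) parameters are hypothetical stand-ins for the categorizer's categorize(text) and getCategory(i), not the committed API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical stand-in for the categorizer: category names and scores are
// passed in directly instead of coming from categorize(text) and getCategory(i).
final class ScoreMapSketch {

  // Maps each category name to its score.
  static Map<String, Double> scoreMap(String[] categories, double[] scores) {
    Map<String, Double> probDist = new HashMap<String, Double>();
    for (int i = 0; i < categories.length; i++) {
      probDist.put(categories[i], scores[i]);
    }
    return probDist;
  }

  // Groups categories by score, highest score first; tied categories share a set.
  static SortedMap<Double, Set<String>> sortedScoreMap(String[] categories, double[] scores) {
    SortedMap<Double, Set<String>> descendingMap =
        new TreeMap<Double, Set<String>>().descendingMap();
    for (int i = 0; i < categories.length; i++) {
      Set<String> tied = descendingMap.get(scores[i]);
      if (tied == null) {
        tied = new HashSet<String>();
        descendingMap.put(scores[i], tied);
      }
      tied.add(categories[i]);
    }
    return descendingMap;
  }

  public static void main(String[] args) {
    String[] categories = {"sports", "politics", "tech"};
    double[] scores = {0.2, 0.6, 0.2};

    SortedMap<Double, Set<String>> sorted = sortedScoreMap(categories, scores);
    // In a descending map the first key is the highest score, so the best
    // category (or categories, on a tie) is available from one lookup.
    Double bestScore = sorted.firstKey();
    System.out.println(bestScore + " " + sorted.get(bestScore));
  }
}
```

Keying the sorted map by score is why the values are sets: two categories with the same score would otherwise overwrite each other.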
