mahout-dev mailing list archives

From Timothy Mann <mann.timo...@gmail.com>
Subject Re: What is the desired behavior of NGrams.generateNGrams()?
Date Sat, 03 Nov 2012 18:46:28 GMT
I recommend simply removing the org.apache.mahout.common.nlp package, unless
there is a long-term plan for it. NGrams is the only class in the package,
and no one seems to know what the behavior of Map<String,List<String>>
NGrams.generateNGrams() should be. Furthermore, no one seems to be using
it. Even if someone is using it, the code is very small and could be
incorporated into the non-Mahout side of their project.

There is some (independently implemented) n-gram computation going on in
org.apache.mahout.vectorizer.collocations.llr.CollocDriver, but I don't
think this is related to NLP. Otherwise it might make sense to try to merge
the functionality (eventually).

-Tim


On Sat, Nov 3, 2012 at 12:25 PM, Sean Owen <srowen@gmail.com> wrote:

> (I also don't see any usages.)
>
>
> On Sat, Nov 3, 2012 at 5:08 PM, Timothy Mann <mann.timothy@gmail.com> wrote:
>
> > It looks like nothing in the core package is using
> > org.apache.mahout.common.nlp.NGrams. Is anyone using this class?
> >
> > -Tim
> >
> >
> > On Thu, Oct 25, 2012 at 10:22 PM, Timothy Mann <mann.timothy@gmail.com> wrote:
> >
> > > I'm trying to write javadoc comments for
> > > org.apache.mahout.common.nlp.NGrams. generateNGramsWithoutLabel() makes
> > > sense, but I'm puzzled by the implementation of generateNGrams().
> > >
> > > Map<String,List<String>> NGrams.generateNGrams() returns a Map from
> > > 'labels' to a list of 'tokens' (where each token is an n-gram of words
> > > separated by single spaces). In the current implementation only a single
> > > ('label', list of tokens) pair is put in the map. The 'label' is just the
> > > first word extracted from the specified text. I am guessing that the
> > > returned Map is being used as a pair. What is the significance of the
> > > 'label'?
> > >
> > > Thank you for your help.
> > >
> > > -Timothy Mann
> > >
> >
>
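
For readers who want to see the behavior described in the original question, here is a minimal sketch in Java. It is not the Mahout implementation: the class NGramSketch and the methods nGrams() and labeledNGrams() are hypothetical names made up for illustration. It assumes that the label is the first whitespace-delimited word, that the n-grams are built only from the words after the label, and that every n-gram has exactly n words. The thread does not confirm any of these details.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not org.apache.mahout.common.nlp.NGrams.
public class NGramSketch {

    // Returns every run of exactly n consecutive words, joined by single spaces.
    static List<String> nGrams(List<String> words, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            grams.add(String.join(" ", words.subList(i, i + n)));
        }
        return grams;
    }

    // Treats the first word as the 'label' and returns a map holding a single
    // (label, list of n-grams) entry, so the map is effectively used as a pair.
    // Assumption: the label itself is excluded from the n-grams.
    static Map<String, List<String>> labeledNGrams(String text, int n) {
        List<String> words = Arrays.asList(text.trim().split("\\s+"));
        String label = words.get(0);
        List<String> rest = words.subList(1, words.size());
        return Collections.singletonMap(label, nGrams(rest, n));
    }

    public static void main(String[] args) {
        // Prints {sports=[the quick, quick brown, brown fox]}
        System.out.println(labeledNGrams("sports the quick brown fox", 2));
    }
}
```

A single-entry map like this carries no more information than a (label, tokens) pair, which is the source of the confusion raised in the question.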
