opennlp-dev mailing list archives

From Cohan Sujay Carlos <co...@aiaioo.com>
Subject Re: Suggestion/Query - Adding weights to words in Document Classifier
Date Wed, 18 Jan 2017 11:41:28 GMT
In machine learning, one learns the weights you're speaking of, Manoj.

So, the words that are more important for any category are given higher
weightage during classification.

However, rather than requiring a user to manually assign these weights, a
machine learning system learns the weights from training data.

That's what happens when you call, say,
DocumentCategorizerME.train("en", sampleStream);

The model that the train method returns is just a record of the "weights"
that have been learnt.
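
To make "learning the weights" concrete, here is a minimal, self-contained sketch. It is not OpenNLP's implementation and does not use the OpenNLP API; it is a toy count-based scheme (smoothed log-frequencies, in the spirit of naive Bayes), and the class and method names are made up for illustration. It trains on the two example sentences quoted below, which is far too little data for a real model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: per-category word weights derived from counts.
public class LearnedWordWeights {

    // The two example sentences from this thread, as {category, text}.
    static final String[][] TRAINING = {
        {"Sports", "We played basket ball, under the sun."},
        {"Nature", "The sun is a big ball of fire."}
    };

    // Lower-cases the text and splits on anything that is not a letter a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // "Training": count how often each word occurs in each category.
    static Map<String, Map<String, Integer>> countWords(String[][] samples) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] sample : samples) {
            Map<String, Integer> perCategory =
                counts.computeIfAbsent(sample[0], k -> new HashMap<>());
            for (String word : tokenize(sample[1])) {
                perCategory.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Number of distinct words seen across all categories.
    static int vocabularySize(Map<String, Map<String, Integer>> counts) {
        Set<String> vocabulary = new HashSet<>();
        for (Map<String, Integer> perCategory : counts.values()) {
            vocabulary.addAll(perCategory.keySet());
        }
        return vocabulary.size();
    }

    // Learned weight of a word for a category: log of its add-one-smoothed
    // relative frequency in that category. Unseen words get the smoothed minimum.
    static double weight(Map<String, Map<String, Integer>> counts,
                         String category, String word) {
        Map<String, Integer> perCategory = counts.get(category);
        int total = 0;
        for (int c : perCategory.values()) {
            total += c;
        }
        int count = perCategory.getOrDefault(word, 0);
        return Math.log((count + 1.0) / (total + vocabularySize(counts)));
    }

    // Classify by summing the learned weights of the words in the text.
    static String classify(Map<String, Map<String, Integer>> counts, String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : counts.keySet()) {
            double score = 0.0;
            for (String word : tokenize(text)) {
                score += weight(counts, category, word);
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> counts = countWords(TRAINING);
        for (String word : new String[] {"played", "sun", "fire"}) {
            for (String category : new String[] {"Sports", "Nature"}) {
                System.out.printf("%-7s %-7s %.3f%n",
                    word, category, weight(counts, category, word));
            }
        }
        System.out.println(classify(counts, "We played ball"));
    }
}
```

With this data, "played" ends up with a higher weight for Sports than for Nature and "fire" the other way round, while "sun" gets nearly the same weight in both, since it occurs once in each category. Nobody assigned those values by hand; they fall out of the training examples.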

Cohan

On Wed, Jan 18, 2017 at 4:18 PM, Manoj B. Narayanan <
manojb.narayanan2011@gmail.com> wrote:

> Hi,
>
> I was wondering if there is a way to assign weights to certain words of a
> class in the Document Classifier.
>
> Some words are important for a particular class. Even though these words
> may occur in other classes, the level of importance may vary. So, if
> certain words in certain classes are given specific weights, it would
> produce more accurate results.
>
> Let me explain this with an example.
>
> Say we have 2 classes. Nature and Sports.
> Consider these 2 sentences :
>     1. We played basket ball, under the sun.
>     2. The sun is a big ball of fire.
>
> In the first sentence, which belongs to the class 'Sports', the words
> 'played','basket','ball' are more important than the word 'sun'. Whereas,
> in the second sentence, the words 'sun' and 'fire' are more important than the
> word 'ball'.
>
> The level of importance can be assigned by assigning weight to a few
> specific words that are distinct for a class.
>
> Is there already a way to do this in OpenNLP Document Classifier? If not
> please consider this.
>
