mahout-dev mailing list archives

From "Robin Anil" <robin.a...@gmail.com>
Subject Re: CNB: Learning from Huge Datasets
Date Mon, 28 Jul 2008 10:26:09 GMT
Apparently, it was overfitting. I used the test/train split given by
Phillipe on the mahout-user list.

When the algorithm was storing the weights of all the words in the
complementary class, the accuracy over the test set was 90.2% and that over
the training set itself was 99.32%. But the size of the model is then
approximately (number of features) x (number of labels).
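(To put a rough number on that: for 20 Newsgroups, with 20 labels and a
vocabulary somewhere on the order of 100,000 terms, that is about 2 million
weights. The vocabulary size here is just an assumed order of magnitude.)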

When the algorithm was storing the weights of just the words in the
non-complementary class, the accuracy over the test set was 84.47% and that
over the training set was 99.90%. The model then becomes a sparse matrix.
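
For anyone following along, here is a minimal Java sketch of the two
storage strategies (illustrative only; the class and method names are made
up and are not from the Mahout codebase). It just shows why the first
variant is dense and the second sparse:

import java.util.HashMap;
import java.util.Map;

public class CnbWeightStorage {

  // Dense variant: every (label, feature) pair gets a slot, so memory
  // is always numLabels * numFeatures doubles, even for zero weights.
  static double[][] denseWeights(int numLabels, int numFeatures) {
    return new double[numLabels][numFeatures];
  }

  // Sparse variant: only features actually weighted for a label get an
  // entry, so the model behaves like a sparse matrix.
  static void setSparseWeight(Map<Integer, Map<Integer, Double>> weights,
                              int label, int feature, double weight) {
    Map<Integer, Double> row = weights.get(label);
    if (row == null) {
      row = new HashMap<Integer, Double>();
      weights.put(label, row);
    }
    row.put(feature, weight);
  }

  public static void main(String[] args) {
    double[][] dense = denseWeights(20, 100000);   // ~2M slots up front
    Map<Integer, Map<Integer, Double>> sparse =
        new HashMap<Integer, Map<Integer, Double>>();
    setSparseWeight(sparse, 3, 42, 1.5);           // one entry, on demand
    System.out.println(dense.length * dense[0].length
        + " dense slots vs " + sparse.get(3).size() + " sparse entry");
  }
}

The tradeoff above is exactly the one the numbers show: the dense model
costs memory but generalized better on the test split.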

So I guess I will have to go back to the earlier method.



On Sat, Jul 12, 2008 at 11:54 AM, Robin Anil <robin.anil@gmail.com> wrote:

> It's too soon for celebrations. This quick hack might have increased
> overfitting. Keep fingers crossed.
>
> Robin
>
>
> On Sat, Jul 12, 2008 at 11:51 AM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
>> Well done!
>>
>> On Fri, Jul 11, 2008 at 11:18 PM, Robin Anil <robin.anil@gmail.com>
>> wrote:
>>
>> >
>> >
>> > The self-classification accuracy on 20 Newsgroups jumped from 98.2% to
>> > 99.87%. And it solved the dense matrix problem as well.
>> >
>>
>
>
>
