mahout-user mailing list archives

From Ted Dunning <>
Subject Re: significance of FEATURES in SGD
Date Wed, 03 Jul 2013 17:34:50 GMT
The dimensionality of the feature vector definitely has a large impact on
accuracy as well as on the cost of the learning process.

I would be very surprised if you get good accuracy with a feature vector
with dimension 100.  Even 10,000 may be a bit small but with multiple
probes it may well work.
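The point about dimension and probes can be illustrated with a minimal plain-Java sketch of hashed feature encoding. This is not Mahout's API. The class, the method names (`encode`, `indistinguishableWords`, `syntheticVocabulary`), the MurmurHash3-style mixing step and the synthetic vocabulary are all assumptions made for illustration. The sketch counts how many vocabulary words end up with exactly the same set of buckets as some other word. A linear model cannot tell such words apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of hashed feature encoding with multiple probes.
 * Not Mahout's API; all names here are illustrative.
 */
public class HashedEncodingSketch {

    /** MurmurHash3 32-bit finalizer, used to scramble hash codes (an assumed choice of mixer). */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Bucket for word w under probe p; the probe index salts the hash so probes land independently. */
    static int bucket(String w, int p, int dim) {
        return Math.floorMod(mix(w.hashCode() + p * 0x9E3779B9), dim);
    }

    /** Encodes a document as a dense vector of length dim, splitting each word's weight across its probes. */
    static double[] encode(List<String> words, int dim, int probes) {
        double[] v = new double[dim];
        for (String w : words) {
            for (int p = 0; p < probes; p++) {
                v[bucket(w, p, dim)] += 1.0 / probes;
            }
        }
        return v;
    }

    /** Counts words (vocab assumed distinct) whose multiset of probe buckets equals another word's. */
    static int indistinguishableWords(List<String> vocab, int dim, int probes) {
        Map<String, Integer> signatureCounts = new HashMap<>();
        List<String> signatures = new ArrayList<>();
        for (String w : vocab) {
            int[] b = new int[probes];
            for (int p = 0; p < probes; p++) {
                b[p] = bucket(w, p, dim);
            }
            Arrays.sort(b); // the encoded vector depends only on the multiset of buckets
            String sig = Arrays.toString(b);
            signatures.add(sig);
            signatureCounts.merge(sig, 1, Integer::sum);
        }
        int count = 0;
        for (String sig : signatures) {
            if (signatureCounts.get(sig) > 1) count++;
        }
        return count;
    }

    /** Synthetic stand-in vocabulary "w0", "w1", ...; a real corpus vocabulary would be used instead. */
    static List<String> syntheticVocabulary(int size) {
        List<String> vocab = new ArrayList<>();
        for (int i = 0; i < size; i++) vocab.add("w" + i);
        return vocab;
    }

    public static void main(String[] args) {
        List<String> vocab = syntheticVocabulary(5000);
        for (int dim : new int[] {100, 10_000}) {
            for (int probes : new int[] {1, 2}) {
                System.out.printf("dim=%6d probes=%d indistinguishable words=%d%n",
                        dim, probes, indistinguishableWords(vocab, dim, probes));
            }
        }
    }
}
```

With one probe, any two words that land in the same bucket are indistinguishable, so at dimension 100 essentially the whole 5,000-word vocabulary collapses. With two independent probes, two words are confused only if both of their buckets coincide, which at dimension d happens with probability on the order of 1/d², so the count should drop sharply at dimension 10,000. Mahout's own encoders (for example `StaticWordValueEncoder` and its probes setting) handle this more carefully than this sketch does.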

Your speed issues may also have to do with memory size.  Make sure you give
the process enough heap space to drive garbage collection overhead very low.
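One way to check whether garbage collection is eating into training time is to ask the JVM directly. The sketch below uses the standard `java.lang.management` beans. The class name `GcOverheadCheck`, its helper names, and the idea of checking after training are illustrative assumptions. Treating more than a few percent of uptime in GC as "too much" is only a rough rule of thumb, not a Mahout recommendation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical sketch: report the max heap and the share of JVM uptime spent in GC so far. */
public class GcOverheadCheck {

    /** Total milliseconds spent in all collectors since JVM start; collectors reporting -1 (unknown) are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    /** Approximate fraction of JVM uptime spent in GC; normally well below 1. */
    static double gcFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime > 0 ? (double) totalGcMillis() / uptime : 0.0;
    }

    public static void main(String[] args) {
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.printf("max heap: %d MB%n", maxHeapMb);
        // ... the training loop would run here ...
        System.out.printf("GC time: %.1f%% of uptime%n", 100 * gcFraction());
    }
}
```

Run it with an explicit heap limit, for example `java -Xmx4g GcOverheadCheck` (the 4 GB figure is arbitrary). If the reported GC share stays high while training, raising `-Xmx` is the first thing to try.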

On Wed, Jul 3, 2013 at 5:58 AM, Chandra Mohan, Ananda Vel Murugan <> wrote:

> Hi,
> I am experimenting with Mahout for text classification. I have 2 million
> training examples, i.e. texts of approximately 20 words each. They fall into
> 121 categories. I tried AdaptiveLogisticRegression. When I create sparse
> vectors of cardinality 10000, it takes hours to converge, but when I try 100
> it converges fast. Is this parameter very significant in determining the
> accuracy of the model? Please advise.
> Regards,
> Anand.C
