mahout-user mailing list archives

From Chris Schilling <ch...@cellixis.com>
Subject Re: Memory Use when serializing SGD Model
Date Thu, 30 Dec 2010 01:05:10 GMT
Hey Ted, 

Sorry for the noise.  I am looking through o.a.m.classifier.sgd.ModelSerializer and
I only see writeJson methods...


On Dec 29, 2010, at 4:01 PM, Ted Dunning wrote:

> Yes.
> 
> That is evil.  The problem is that GSON recurses on lists and that makes
> memory use crazy bad.
> 
> Try serializing as binary.  A few weeks ago I committed a change that added
> a method to ModelSerializer to allow that.  The SGD models are also all
> Writables now, which should make rolling your own serialization very easy.
> 
> 
> On Wed, Dec 29, 2010 at 3:59 PM, Chris Schilling
> <chris.schilling@gmail.com> wrote:
> 
>> Hi again,
>> 
>> I notice that if I try to write the model for the 20 NG example, I
>> run out of memory.  I am running on a small EC2 instance, so I run
>> the JVM with -Xmx1400m.
>> 
>> So, I can train and dissect the model just fine.  However, when I try to
>> write the weights:
>> ModelSerializer.writeJson("/tmp/sgd_adaptive.model", learningAlgorithm);
>> 
>> My feature vector size is 10000.
>> 
>> I get an OOM exception:
>> 
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:221)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$MatrixTypeAdapter.serialize(ModelSerializer.java:210)
>>       at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
>>       at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
>>       at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
>>       at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>>       at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:445)
>>       at com.google.gson.DefaultTypeAdapters$CollectionTypeAdapter.serialize(DefaultTypeAdapters.java:431)
>>       at com.google.gson.JsonSerializationVisitor.visitFieldUsingCustomHandler(JsonSerializationVisitor.java:148)
>>       at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:141)
>>       at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
>>       at com.google.gson.JsonSerializationVisitor.getJsonElementForChild(JsonSerializationVisitor.java:117)
>>       at com.google.gson.JsonSerializationVisitor.addAsChildOfObject(JsonSerializationVisitor.java:95)
>>       at com.google.gson.JsonSerializationVisitor.visitObjectField(JsonSerializationVisitor.java:90)
>>       at com.google.gson.ObjectNavigator.navigateClassFields(ObjectNavigator.java:147)
>>       at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:122)
>>       at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>>       at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:40)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:333)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$StateTypeAdapter.serialize(ModelSerializer.java:287)
>>       at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
>>       at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
>>       at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:375)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$EvolutionaryProcessTypeAdapter.serialize(ModelSerializer.java:339)
>>       at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
>>       at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
>>       at com.google.gson.JsonSerializationContextDefault.serialize(JsonSerializationContextDefault.java:47)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$AdaptiveLogisticRegressionTypeAdapter.serialize(ModelSerializer.java:189)
>>       at org.apache.mahout.classifier.sgd.ModelSerializer$AdaptiveLogisticRegressionTypeAdapter.serialize(ModelSerializer.java:153)
>>       at com.google.gson.JsonSerializationVisitor.visitUsingCustomHandler(JsonSerializationVisitor.java:128)
>>       at com.google.gson.ObjectNavigator.accept(ObjectNavigator.java:96)
>> 
>> Does this make sense?  It seems like too much memory just to serialize a model.
>> 
>> Thanks
>> Chris

