mahout-user mailing list archives

From Matthew Bryan <gou...@gmail.com>
Subject Big Longs in RecommenderJob
Date Mon, 07 Jun 2010 21:42:37 GMT
I'm trying to use some really big longs in the RecommenderJob and I ran
into the following problem:

java.lang.IllegalArgumentException: Can't encode value as signed:
-9223224018927274648
	at org.apache.mahout.math.Varint.writeSignedVarLong(Varint.java:59)
	at org.apache.mahout.math.VarLongWritable.write(VarLongWritable.java:77)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:909)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:549)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.mahout.cf.taste.hadoop.ToEntityPrefsMapper.map(ToEntityPrefsMapper.java:68)
	at org.apache.mahout.cf.taste.hadoop.ToEntityPrefsMapper.map(ToEntityPrefsMapper.java:30)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:629)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:310)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)

It looks like the code in Varint.java doesn't accept the full range
of signed longs (MIN_SIGNED_VAR_LONG = -(1L << 62) instead of
MIN_SIGNED_VAR_LONG = -(1L << 63)).

When I widen the min/max bounds, it breaks on the negative check in
writeUnsignedVarLong, which seems like it shouldn't be there: the
value can legitimately be "interpreted" as negative, since we're
storing an unsigned long in a signed long (Java has no unsigned type).
When I remove that check, it breaks with "Variable length quantity is
too long" in readUnsignedVarLong, and this is where things get fuzzy
for me. I basically replaced the code in both of the readUnsigned
blocks with the Google code that is referenced. I also replaced the
DecodeZigZag blocks with the Google code. I then commented out the
exception tests and everything ran fine for me. With the other tests
passing, I felt like the end-to-end conversion was working, but
I'm not in a position to really validate my recommendations yet.
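
For reference, here is a minimal sketch of the ZigZag-plus-varint
scheme described above, covering the full signed 64-bit range. It is
not the attached patch and not Mahout's Varint class: the class name
VarLongSketch and the roundTrip helper are made up for illustration,
and the bit manipulation follows the commonly documented Protocol
Buffers encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of Mahout.
final class VarLongSketch {

  // ZigZag maps signed values onto unsigned ones so that small
  // magnitudes stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  static long encodeZigZag(long n) {
    return (n << 1) ^ (n >> 63);
  }

  static long decodeZigZag(long n) {
    return (n >>> 1) ^ -(n & 1);
  }

  // Treats value as an unsigned 64-bit quantity, so there is no
  // negative check; the unsigned shift (>>>) guarantees termination.
  static void writeUnsignedVarLong(long value, DataOutputStream out) throws IOException {
    while ((value & ~0x7FL) != 0) {
      out.writeByte((int) ((value & 0x7F) | 0x80)); // 7 payload bits + continuation bit
      value >>>= 7;
    }
    out.writeByte((int) value);
  }

  // Reads at most 10 bytes (shifts 0, 7, ..., 63), enough for 64 bits.
  static long readUnsignedVarLong(DataInputStream in) throws IOException {
    long value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      byte b = in.readByte();
      value |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return value;
      }
    }
    throw new IOException("Variable length quantity is too long");
  }

  // Encodes a signed long to bytes and decodes it again.
  static long roundTrip(long n) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeUnsignedVarLong(encodeZigZag(n), new DataOutputStream(bytes));
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      return decodeZigZag(readUnsignedVarLong(in));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long failing = -9223224018927274648L; // the value from the stack trace
    System.out.println(roundTrip(failing) == failing);
    System.out.println(roundTrip(Long.MIN_VALUE) == Long.MIN_VALUE);
  }
}
```

The key points are the unsigned shift in the writer and the 10-byte
limit in the reader; with those, values below -(1L << 62) should
survive the round trip.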

So if my technique sounds reasonable, perhaps someone could apply
the patch I've attached and check the results on a known sample?

Loving Mahout so far, thanks all!

Matt
