mahout-user mailing list archives

From László Dósa <d...@elte.hu>
Subject DistributedLanczosSolver input
Date Tue, 29 Jun 2010 16:52:08 GMT
Hi,

I am trying to run
org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver.
My input consists of (userid, itemid) pairs, as follows:
...
122641863,5060057723326
123441107,9789020282948
...

How can I transform my input into the format that
DistributedLanczosSolver needs (rows = users, columns = items,
elements = number of items/user)?
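
For illustration, the target layout can be sketched in plain Java, with no Hadoop or Mahout classes. This is a minimal sketch under stated assumptions, not the format DistributedLanczosSolver is confirmed to read. It assumes each element is the number of times a (userid, itemid) pair occurs, and it assumes raw ids are first mapped to dense 0-based indices. Item ids such as 5060057723326 do not fit in a 32-bit int, so they cannot be used directly as an int key or an int column index. The class name UserItemMatrix and its members are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: groups "userid,itemid" lines into sparse rows.
final class UserItemMatrix {
    // Raw id -> dense 0-based index, assigned in order of first appearance.
    final Map<Long, Integer> userIndex = new LinkedHashMap<>();
    final Map<Long, Integer> itemIndex = new LinkedHashMap<>();
    // Row index -> (column index -> count), columns kept in ascending order.
    final SortedMap<Integer, SortedMap<Integer, Double>> rows = new TreeMap<>();

    // Returns the dense index for a raw id, allocating the next one if unseen.
    private static int indexOf(Map<Long, Integer> dict, long id) {
        Integer idx = dict.get(id);
        if (idx == null) {
            idx = dict.size();
            dict.put(id, idx);
        }
        return idx;
    }

    // Parses one "userid,itemid" line and increments the matching cell.
    void add(String line) {
        String[] parts = line.trim().split(",");
        long userId = Long.parseLong(parts[0].trim()); // long: ids may exceed int range
        long itemId = Long.parseLong(parts[1].trim());
        int row = indexOf(userIndex, userId);
        int col = indexOf(itemIndex, itemId);
        rows.computeIfAbsent(row, k -> new TreeMap<>()).merge(col, 1.0, Double::sum);
    }

    public static void main(String[] args) {
        UserItemMatrix m = new UserItemMatrix();
        m.add("122641863,5060057723326");
        m.add("123441107,9789020282948");
        m.add("122641863,5060057723326"); // assumed duplicate, added for illustration
        System.out.println(m.rows);
    }
}
```

In a MapReduce job the same grouping would happen in the reducer, with each sorted column map becoming one sparse row vector. The two index dictionaries would have to be built in a separate pass or shared between tasks, which this sketch does not cover.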

I tried to write a MapReduce job with a
Mapper<Object, Text, IntWritable, IntWritable>
that maps each input row to userid as key and itemid as value,
and a
Reducer<IntWritable, IntWritable, IntWritable, SequentialAccessSparseVector>
that instantiates a SequentialAccessSparseVector with itemid as key
and itemid as index and sum(itemid) as value.

I am getting this exception with the attached code:

2010-06-29 09:04:59,172 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.NullPointerException
	at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:759)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:487)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:575)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
	at org.apache.hadoop.mapred.Child.main(Child.java:170)


Can you suggest any other way?

Regards,
Laszlo

