mahout-user mailing list archives

From Quentin-Gabriel Thurier <quentin.thur...@gmail.com>
Subject Re: Problem with ItemSimilarityJob, empty part-r-00000
Date Tue, 21 Jan 2014 16:35:36 GMT
I'm using mahout-examples-0.7-cdh4.5.0-job.jar locally. I also tried on EMR
(with mahout-examples-0.8-job.jar this time) on 3000 tracks, and there too the
result files were empty. Should I send you the dataset at your Apache address
(it is only 140 KB)?

Quentin


2014/1/21 Sebastian Schelter <ssc@apache.org>

> Hmm, strange. Which version of mahout are you using? Do you run the 1200
> tracks job locally or on a cluster? Can you share your input file (in
> private)?
>
> --sebastian
>
>
>
> On 01/21/2014 02:34 PM, Quentin-Gabriel Thurier wrote:
>
>> Hi Sebastian
>>
>> I tested the job on a tiny example (50 tracks):
>>
>> mahout itemsimilarity --input input/msd_sample/mahout5 --output
>> output/mahout5 --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE
>> --booleanData false --maxSimilaritiesPerItem 1
>>
>> 1st row of the output:
>>
>> -2135949055     -335737401      0.09939478338891584
>>
>> Related rows from the input:
>>
>> 1,-2135949055,230.42567
>> 2,-2135949055,0.0
>> 3,-2135949055,0.0
>> 4,-2135949055,-3.96
>> 5,-2135949055,-1.0
>> 6,-2135949055,96.897
>> 1,-335737401,222.35384
>> 2,-335737401,0.0
>> 3,-335737401,0.0
>> 4,-335737401,-5.232
>> 5,-335737401,-1.0
>> 6,-335737401,100.812
>>
>> This is correct:
>> 1/(1+sqrt((230.42567-222.35384)^2+(-3.96-(-5.232))^2+(96.897-100.812)^2))
>> = 0.09939483
>>
>> I don't get any exceptions, only the usual warning: WARN
>> mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
>> Applications should implement Tool for the same.
>>
>> Then I take 1200 tracks (the previous 50 are included in the 1200): the job
>> doesn't fail, but part-r-00000 is empty. As before, I only get a warning,
>> and the input looks like:
>>
>> 1,524572804,192.522
>> 2,524572804,0.0
>> 3,524572804,0.0
>> 4,524572804,-5.902
>> 5,524572804,-1.0
>> 6,524572804,123.756
>> 1,-1821170097,269.81833
>> 2,-1821170097,0.0
>> 3,-1821170097,0.0
>> 4,-1821170097,-13.496
>> 5,-1821170097,0.26586103
>> 6,-1821170097,86.643
>>
>> Quentin
>>
>>
>> 2014/1/21 Sebastian Schelter <ssc@apache.org>
>>
>>> Hi Quentin,
>>>
>>> Have you checked the log to ensure that you don't get any exceptions
>>> during the computation?
>>>
>>> Could you test the job with a tiny example where you can calculate the
>>> result by hand?
>>>
>>> Can you share an input file on which this job fails?
>>>
>>> --sebastian
>>>
>>>
>>> On 01/21/2014 11:22 AM, Quentin-Gabriel Thurier wrote:
>>>
>>>> I'm running into a few problems with Mahout that I can't sort out.
>>>>
>>>> The context is that I'm trying to calculate pairwise Euclidean distances
>>>> between music tracks based on 6 audio features per track. My input for
>>>> the Mahout job is a text file which looks like this:
>>>>
>>>> feature_id,track_id,feature_value
>>>> <integer>,<integer>,<double>
>>>>
>>>> This command works locally for fewer than 600 tracks (based on
>>>> mahout-core-0.7-cdh4.5.0-job.jar):
>>>>
>>>> mahout itemsimilarity --input input/msd_sample/mahout --output
>>>> output/mahout --similarityClassname
>>>> SIMILARITY_EUCLIDEAN_DISTANCE --booleanData false
>>>> --maxSimilaritiesPerItem 1
>>>>
>>>> But for more tracks I get an empty part-r-00000 file. I tried decreasing
>>>> the --threshold parameter, but I still don't get any results.
>>>>
>>>> I also tried launching the job on AWS EMR with the equivalent input for
>>>> 3000 tracks (based on mahout-core-0.8-job.jar):
>>>>
>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
>>>> --input s3n://hadoop-filrouge/input/msd-sample/mahout
>>>> --output s3n://hadoop-filrouge/output/mahout/01202014-itemsimilarity
>>>> --similarityClassname SIMILARITY_EUCLIDEAN_DISTANCE --booleanData false
>>>> --maxSimilaritiesPerItem 1
>>>>
>>>> The job runs successfully, but I get 17 empty part-r-000xx files.
>>>>
>>>> I'm totally stuck right now and I'm running out of ideas to fix this
>>>> issue. So if anybody has even a small idea of what is going on, that
>>>> would really help.
>>>>
>>>> Many thanks,
>>>>
>>>>
>>>>
>>>
>>
>
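The hand check in the quoted 50-track example can be reproduced with a short Python sketch. It assumes the similarity is 1/(1 + d), with d the Euclidean distance over the features both tracks share, since that is the form that matches the reported value. The names `load_vectors` and `euclidean_similarity` are illustrative and not part of Mahout.

```python
import math
from collections import defaultdict

# Rows copied from the quoted 50-track example: feature_id,track_id,feature_value
ROWS = """\
1,-2135949055,230.42567
2,-2135949055,0.0
3,-2135949055,0.0
4,-2135949055,-3.96
5,-2135949055,-1.0
6,-2135949055,96.897
1,-335737401,222.35384
2,-335737401,0.0
3,-335737401,0.0
4,-335737401,-5.232
5,-335737401,-1.0
6,-335737401,100.812
"""


def load_vectors(text):
    """Group the triples into one {feature_id: value} dict per track."""
    vectors = defaultdict(dict)
    for line in text.strip().splitlines():
        feature_id, track_id, value = line.split(",")
        vectors[int(track_id)][int(feature_id)] = float(value)
    return vectors


def euclidean_similarity(a, b):
    """Assumed form: 1 / (1 + Euclidean distance over shared features)."""
    shared = a.keys() & b.keys()
    distance = math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared))
    return 1.0 / (1.0 + distance)


vectors = load_vectors(ROWS)
sim = euclidean_similarity(vectors[-2135949055], vectors[-335737401])
print(sim)
```

Running the same functions over the full 1200-track file would give a reference value for any pair, which can help tell an empty result caused by the job from one caused by the data.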
