mahout-dev mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAHOUT-577) RowSimilarityJob hangs during CooccurrencesMapper
Date Sat, 08 Jan 2011 19:15:46 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated MAHOUT-577:
-----------------------------

         Due Date: 28/Jan/11
         Priority: Major  (was: Blocker)
    Fix Version/s: 0.5

Something still feels wrong here -- I've not seen behavior like this, no. However, I do agree
that the output of this mapper is huge, and the right tuning can help a lot.

The first thing to note is that this step is very I/O intensive. If you're running 200 mappers on
significantly fewer machines, you may just be wasting time as the mappers compete for the same disks.

Also, I find it important to increase io.sort.factor to 100 or more (the default is 10), and to
increase io.sort.mb quite a bit, perhaps to half the worker's heap. This makes Hadoop use much more
memory to merge segments, and merge more than 10 at a time. Having the mapper "stuck" at 100% sounds
symptomatic of a merge phase that is taking ages.
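To make "merge more than 10 at a time" concrete: the log quoted below stops at "Finished spill 96", and spill numbering starts at 0, so each map task had written at least 97 spill files. The following is a minimal Python model, not Hadoop code. It assumes equal-sized spills and follows the pass-factor rule of Hadoop's Merger class as we understand it (the first pass may take fewer segments so that every later pass takes exactly io.sort.factor). The helpers `pass_factor` and `simulate_merge` are hypothetical names used only for this sketch.

```python
import bisect
from typing import List, Tuple


def pass_factor(factor: int, pass_no: int, num_segments: int) -> int:
    """Number of segments one merge pass combines (approximate model).

    The first pass may take fewer than `factor` segments so that every
    later pass takes exactly `factor`.
    """
    if pass_no > 1 or num_segments <= factor or factor == 1:
        return factor
    mod = (num_segments - 1) % (factor - 1)
    return factor if mod == 0 else mod + 1


def simulate_merge(num_spills: int, factor: int) -> Tuple[int, int]:
    """Return (intermediate passes, spill-sized units rewritten to disk).

    Assumes every spill contributes one equal-sized segment to a partition.
    """
    if factor < 2:
        raise ValueError("a merge factor below 2 never reduces the segment count")
    segments: List[int] = [1] * num_spills  # sizes in spill units, kept sorted
    passes = 0
    rewritten = 0
    while len(segments) > factor:
        n = pass_factor(factor, passes + 1, len(segments))
        merged = sum(segments[:n])       # merge the n smallest segments
        del segments[:n]
        bisect.insort(segments, merged)  # the merged file rejoins the queue
        rewritten += merged
        passes += 1
    # At most `factor` segments remain; they feed the final merge directly.
    return passes, rewritten


for factor in (10, 100):
    passes, rewritten = simulate_merge(97, factor)
    print(f"io.sort.factor={factor}: {passes} intermediate passes, "
          f"{rewritten} spill units rewritten before the final merge")
```

Under this model, the default factor of 10 needs 10 intermediate passes per partition and rewrites the equivalent of all 97 spills once before the final merge, while a factor of 100 sends the spills straight into the final merge. The log excerpt is truncated, so the real spill count, and the extra I/O, may well be larger.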

If that does the trick, we can weave these settings into the code.

> RowSimilarityJob hangs during CooccurrencesMapper
> -------------------------------------------------
>
>                 Key: MAHOUT-577
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-577
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.4
>         Environment: Linux Debian 5.0.5, 12GB RAM, Hadoop 20.3 installation
>            Reporter: Maya Hristakeva
>             Fix For: 0.5
>
>
> Hello,
> When trying to run a RowSimilarityJob on a matrix (146682 x 138351), the job gets through
> the RowWeightMapper and WeightedOccurrencesPerColumnReducer but hangs during the CooccurrencesMapper,
> although it shows that the map tasks are 100% complete.
> The command I use to run the job is:
> hadoop jar mahout-core-0.4-job.jar org.apache.mahout.math.hadoop.similarity.RowSimilarityJob \
>   -Dmapred.input.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaCompressedDocumentsMatrix \
>   -Dmapred.output.dir=/user/maya.hristakeva/mahout/core4/tf/1/0.001/title/12_07_10/lda/5/lda-sim/ldaDocumentSimilarityMatrix \
>   -Dmapred.reduce.tasks=8 -Dmapred.map.tasks=200 -Dmapred.job.name=LDA_ROW_SIMILARITY_TEST \
>   --tempDir /user/maya.hristakeva/temp/lda/5 \
>   --numberOfColumns 138351 \
>   --similarityClassname org.apache.mahout.math.hadoop.similarity.vector.DistributedEuclideanDistanceVectorSimilarity \
>   --maxSimilaritiesPerRow 10
> And the output of the mappers that are 100% complete but hanging is:
> syslog logs
> 01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: bufstart = 29085149; bufend = 39038598; bufvoid = 99614720
> 2011-01-05 18:30:00,835 INFO org.apache.hadoop.mapred.MapTask: kvstart = 65461; kvend = 327605; length = 327680
> 2011-01-05 18:30:06,241 INFO org.apache.hadoop.mapred.MapTask: Finished spill 94
> 2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: bufstart = 39038598; bufend = 48983989; bufvoid = 99614720
> 2011-01-05 18:30:09,208 INFO org.apache.hadoop.mapred.MapTask: kvstart = 327605; kvend = 262068; length = 327680
> 2011-01-05 18:30:14,528 INFO org.apache.hadoop.mapred.MapTask: Finished spill 95
> 2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: record full = true
> 2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: bufstart = 48983989; bufend = 58929384; bufvoid = 99614720
> 2011-01-05 18:30:17,328 INFO org.apache.hadoop.mapred.MapTask: kvstart = 262068; kvend = 196531; length = 327680
> 2011-01-05 18:30:22,615 INFO org.apache.hadoop.mapred.MapTask: Finished spill 96
> .
> .
> .
> This problem does not occur when I use a toy matrix of 100 x 100, but once I give it
> the original matrix of ..... the problem is always reproducible.
> Any ideas on what could be causing this? 
> Thanks, 
> Maya Hristakeva
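The MapTask numbers in the quoted log can be cross-checked against the map-side sort-buffer arithmetic. The sketch below assumes Hadoop 0.20-era defaults (io.sort.mb=100, io.sort.record.percent=0.05, io.sort.spill.percent=0.80), since the job command does not override them. It also assumes our reading of how that release's MapTask splits io.sort.mb into a data buffer and an accounting area of 16 bytes per record. The constant names are illustrative, not Hadoop identifiers. Under those assumptions the sketch reproduces bufvoid = 99614720 and length = 327680 exactly.

```python
# Illustrative names; values are assumed Hadoop 0.20-era defaults.
IO_SORT_MB = 100
IO_SORT_RECORD_PERCENT = 0.05
IO_SORT_SPILL_PERCENT = 0.80
RECSIZE = 16  # accounting bytes per record: 1 offset int + 3 index ints

max_mem = IO_SORT_MB << 20
record_capacity = int(max_mem * IO_SORT_RECORD_PERCENT)
record_capacity -= record_capacity % RECSIZE
data_buffer = max_mem - record_capacity    # compare with bufvoid in the log
record_slots = record_capacity // RECSIZE  # compare with length in the log
soft_record_limit = int(record_slots * IO_SORT_SPILL_PERCENT)

# (bufstart, bufend, kvstart, kvend) logged before spills 94, 95 and 96
spills = [
    (29085149, 39038598, 65461, 327605),
    (39038598, 48983989, 327605, 262068),
    (48983989, 58929384, 262068, 196531),
]

print(f"bufvoid={data_buffer} length={record_slots} "
      f"soft record limit={soft_record_limit}")
results = []
for bufstart, bufend, kvstart, kvend in spills:
    # Both areas are ring buffers, so the end index can wrap past the start.
    records = (kvend - kvstart) % record_slots
    data_bytes = (bufend - bufstart) % data_buffer
    results.append((records, data_bytes))
    print(f"{records} records, {data_bytes} bytes "
          f"({data_bytes / data_buffer:.0%} of the data buffer, "
          f"~{data_bytes / records:.0f} bytes per record)")
```

Each of the three complete spills holds about 262,144 records (80% of the record slots) but fills only about 10% of the data buffer, roughly 38 bytes per record. That matches the "record full = true" messages: spills are triggered by record count rather than bytes, which fits the observation that this mapper emits a huge number of small records. Because the accounting area is a fixed fraction of io.sort.mb, a larger io.sort.mb would also mean fewer, larger spills.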

