mahout-dev mailing list archives

From "Suneel Marthi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAHOUT-1627) Problem with ALS Factorizer MapReduce version when working with oozie because of files in distributed cache. Error: Unable to read sequence file from cache.
Date Wed, 30 Mar 2016 05:41:25 GMT

     [ https://issues.apache.org/jira/browse/MAHOUT-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1627:
----------------------------------
    External issue URL: https://issues.apache.org/jira/MAHOUT-1634  (was: https://issues.apache.org/jira/MAHOUT-1624)

> Problem with ALS Factorizer MapReduce version when working with oozie because of files in distributed cache. Error: Unable to read sequence file from cache.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1627
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1627
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.10.2
>         Environment: Hadoop
>            Reporter: Srinivasarao Daruna
>            Assignee: Suneel Marthi
>              Labels: legacy
>             Fix For: 0.12.0
>
>
> There is a problem with the ALS Factorizer when running in a distributed environment under Oozie.
> Steps:
> 1) Built mahout 1.0 jars and picked mahout-mrlegacy jar.
> 2) Created a Java class that calls ParallelALSFactorizationJob with the respective inputs.
> 3) Submitted the job, which in turn submitted a series of MapReduce jobs to perform the factorization.
> 4) The job failed at MultithreadedSharingMapper with the error Unable to read sequence file "<ourprogram>.jar", pointing to the readMatrixByRowsFromDistributedCache method in org.apache.mahout.cf.taste.hadoop.als.ALS.
> Cause: The ALS class reads its input files, which are sequence files, from the distributed cache using the readMatrixByRowsFromDistributedCache method. However, in an Oozie environment the program jar is also copied to the distributed cache alongside the input files. Because the ALS class tries to read every file in the distributed cache, it fails when it encounters the jar.
> The remedy would be to add a condition that skips jar files.
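
The proposed remedy can be illustrated with a minimal sketch. It is not the actual Mahout fix and does not use the Hadoop API: the class CacheFileFilter and the method withoutJars are hypothetical names, and cache entries are modelled as plain path strings rather than Hadoop Path objects. The idea is only to drop entries ending in ".jar" before anything tries to read them as sequence files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Mahout: drops jar entries from a list of
// distributed-cache paths so that only data files are handed to a reader.
final class CacheFileFilter {

  private CacheFileFilter() {}

  // Returns the paths that do not end in ".jar" (case-insensitive),
  // preserving their original order.
  static List<String> withoutJars(List<String> cachePaths) {
    List<String> dataFiles = new ArrayList<>();
    for (String path : cachePaths) {
      if (!path.toLowerCase(Locale.ROOT).endsWith(".jar")) {
        dataFiles.add(path);
      }
    }
    return dataFiles;
  }
}
```

In a real job the same check would presumably be applied to the cached file paths before they are opened as sequence files. Filtering by extension is a simple heuristic and assumes that none of the input files themselves end in ".jar".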



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
