mahout-dev mailing list archives

From "Suneel Marthi (JIRA)" <>
Subject [jira] [Assigned] (MAHOUT-1634) ALS don't work when it adds new files in Distributed Cache
Date Wed, 30 Mar 2016 05:30:25 GMT


Suneel Marthi reassigned MAHOUT-1634:

    Assignee: Suneel Marthi  (was: Andrew Musselman)

> ALS don't work when it adds new files in Distributed Cache
> ----------------------------------------------------------
>                 Key: MAHOUT-1634
>                 URL:
>             Project: Mahout
>          Issue Type: Bug
>          Components: Collaborative Filtering
>    Affects Versions: 0.10.1
>         Environment: Cloudera 5.1 VM, eclipse, zookeeper
>            Reporter: Cristian Galán
>            Assignee: Suneel Marthi
>              Labels: ALS, legacy
>             Fix For: 0.12.0
>         Attachments: mahout.patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
> The ALS algorithm uses the distributed cache for its temporary files, but the distributed cache has other
uses too, especially adding dependencies,
so when we add a dependency library (or any other file) to a Hadoop job, ALS fails because it
reads ALL files in the distributed cache without distinction.
> This occurs in my company's project because we need to add Mahout's dependencies (mahout,
lucene, ...) to a Hadoop Configuration to run Mahout jobs; otherwise the Mahout job fails
because it cannot find the dependencies.
> I propose two options (I think both are valid):
> 1) Exclude all .jar files from the result of HadoopUtil.getCacheFiles
> 2) Exclude every Path object that does not match /part-*
> I prefer the first because it is less aggressive, and I think this solution will resolve
all the problems.
> P.S. Sorry if my English is wrong.
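
For illustration, the two filtering options proposed in the description could look roughly like the sketch below. It is not the attached mahout.patch and not Mahout code: the class CacheFileFilters and its methods are hypothetical, plain path strings stand in for Hadoop Path objects so that it runs without Hadoop on the classpath, and the example paths are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout. Plain path strings stand in for
// Hadoop Path objects so the sketch runs without Hadoop on the classpath.
final class CacheFileFilters {

  private CacheFileFilters() {}

  // Returns the last path segment, e.g. "part-r-00000" for "/tmp/als/part-r-00000".
  static String fileName(String path) {
    return path.substring(path.lastIndexOf('/') + 1);
  }

  // Option 1: drop every entry whose file name ends in ".jar".
  static List<String> withoutJars(List<String> cacheFiles) {
    List<String> kept = new ArrayList<>();
    for (String path : cacheFiles) {
      if (!fileName(path).endsWith(".jar")) {
        kept.add(path);
      }
    }
    return kept;
  }

  // Option 2: keep only entries whose file name starts with "part-".
  static List<String> onlyPartFiles(List<String> cacheFiles) {
    List<String> kept = new ArrayList<>();
    for (String path : cacheFiles) {
      if (fileName(path).startsWith("part-")) {
        kept.add(path);
      }
    }
    return kept;
  }
}
```

With the made-up cache contents /libs/mahout-core.jar, /tmp/als/part-r-00000 and /conf/stopwords.txt, withoutJars keeps the last two entries and onlyPartFiles keeps only the part file. In this sketch, option 1 still lets non-jar extras through, while option 2 assumes every file ALS needs is named part-*.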

This message was sent by Atlassian JIRA
