mahout-dev mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: [jira] Commented: (MAHOUT-593) Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure compatibility with current Mahout dependencies.
Date Wed, 02 Mar 2011 17:05:44 GMT
Sean, thanks for bearing with me. One last question; I promise I'll
drop it after this one.

> Yes, I don't think that in any Java API I can think of or HDFS you would have a successful
> write and successful close but fail to write.
I am not saying that. I was actually just affirming that this holds in the
case of HDFS (albeit not in general per the close() contract -- it is not
true for files, for example).
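
For what it's worth, here is a minimal sketch of the failure mode I have in
mind (my own illustration, not code from the patch; quietClose() below just
stands in for the helper in o.a.m.common):

    import java.io.Closeable;
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class QuietCloseHazard {

      // stand-in for the quietClose(...) pattern: any failure during close() is swallowed
      static void quietClose(Closeable c) {
        try {
          if (c != null) {
            c.close();
          }
        } catch (IOException ignored) {
          // silently dropped
        }
      }

      static void writeSideFile(FileSystem fs, Path path, byte[] payload) throws IOException {
        FSDataOutputStream out = fs.create(path);
        try {
          out.write(payload);   // with HDFS this may only fill the client-side buffer
        } finally {
          quietClose(out);      // close() is where the data actually reaches the datanodes;
                                // if it fails here, the caller still sees "success"
        }
      }
    }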


OK, what I am trying to argue is that using quietClose(...) in
o.a.m.common.ioutils is a bad practice for HDFS writes such as side
files and multiple outputs. I can't seem to get you to agree on this.

But perhaps I can hope to get you to agree that checking for close
errors at least doesn't hurt, so I can continue checking for them and
not using quietClose() during MR task commits?
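
Concretely, all I am asking to keep in the task commit path is something
along these lines (again, just a sketch under my assumptions, not the actual
patch code):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class CheckedCloseSketch {

      static void writeSideFile(FileSystem fs, Path path, byte[] payload) throws IOException {
        FSDataOutputStream out = fs.create(path);
        boolean written = false;
        try {
          out.write(payload);
          written = true;
        } finally {
          if (written) {
            // normal path: let a close() failure propagate and fail the task attempt,
            // so the framework re-runs it instead of committing incomplete output
            out.close();
          } else {
            // already failing: close quietly so we don't mask the original exception
            try {
              out.close();
            } catch (IOException ignored) {
            }
          }
        }
      }
    }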




-d

On Wed, Mar 2, 2011 at 8:20 AM, Sean Owen (JIRA) <jira@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/MAHOUT-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001470#comment-13001470 ]
>
> Sean Owen commented on MAHOUT-593:
> ----------------------------------
>
> I think we're just talking about semantics now. You are welcome to commit as you like.
>
> I do agree that a stream could buffer writes, and that it could commit as a way to clean
> up before close, and the commit could fail during close. If the caller is trying to commit
> by closing, that's really the problem. But, it's a real problem. Yes, I don't think that in
> any Java API I can think of or HDFS you would have a successful write and successful close
> but fail to write.
>
>> Backport of Stochastic SVD patch (Mahout-376) to hadoop 0.20 to ensure compatibility with current Mahout dependencies.
>> ----------------------------------------------------------------------------------------------------------------------
>>
>>                 Key: MAHOUT-593
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-593
>>             Project: Mahout
>>          Issue Type: New Feature
>>          Components: Math
>>    Affects Versions: 0.4
>>            Reporter: Dmitriy Lyubimov
>>             Fix For: 0.5
>>
>>         Attachments: MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, MAHOUT-593.patch.gz, SSVD-givens-CLI.pdf, ssvdclassdiag.png
>>
>>
>> The current Mahout-376 patch requires the 'new' hadoop API. Certain elements of that API
>> (namely, multiple outputs) are not available in the standard hadoop 0.20.2 release. As such, it
>> may work only with either CDH or 0.21 distributions.
>> In order to bring it into sync with current Mahout dependencies, a backport of
>> the patch to the 'old' API is needed.
>> Also, some work is needed to resolve math dependencies. The existing patch relies on
>> apache commons-math 2.1 for eigen decomposition of small matrices. This dependency is not
>> currently set up in mahout core, so certain snippets of code either need to move
>> to mahout-math or use the Colt eigen decomposition (last time I tried, my results with that one
>> were mixed: it seems to produce results inconsistent with those from the mahout-math eigensolver;
>> at the very least, it doesn't produce singular values in sorted order).
>> So this patch is mainly moving some Mahout-376 code around.
>
> --
> This message is automatically generated by JIRA.
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>
>
