mahout-dev mailing list archives

From "Dmitriy Lyubimov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1490) Data frame R-like bindings
Date Wed, 21 May 2014 00:36:44 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004160#comment-14004160 ]

Dmitriy Lyubimov commented on MAHOUT-1490:
------------------------------------------

Also, in a realistic case, we will be reading the frame blocks off media that does not internally
use that compression (most likely, the media would be row-wise). So the compression step will stream
in uncompressed data and will already hit the memory bottleneck. To justify compression
in these scenarios, we need to make sure that the compressed source will be iterated over more
than once. Again, this is all just a programming model.

For example, there might be an API that says "build fast iterative source" explicitly, rather
than always assuming it is a good thing. I suspect that's what the H2O "freeze" concept
encompasses.
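
To make the programming-model point concrete, here is a minimal sketch in Python of what an explicit "build fast iterative source" call might look like. Everything in it is hypothetical: the names RowSource, CompressedFrame and build_fast_iterative_source are made up for illustration, zlib over packed doubles stands in for whatever column compression would really be used, and none of it is Mahout or H2O API. It only shows that the compression pass has to read the uncompressed row-wise data once, and that the caller opts into it when repeated iteration is expected.

```python
import struct
import zlib
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[float, ...]


class CompressedFrame:
    """Hypothetical column-wise, compressed copy of one frame block."""

    def __init__(self, columns: List[bytes], n_rows: int) -> None:
        self._columns = columns
        self._n_rows = n_rows

    def rows(self) -> Iterator[Row]:
        # Decompress each column and zip the columns back into rows.
        # This never touches the original row-wise media again.
        fmt = f"<{self._n_rows}d"
        cols = [struct.unpack(fmt, zlib.decompress(c)) for c in self._columns]
        return iter(zip(*cols))


class RowSource:
    """Hypothetical row-wise source; open_rows re-reads the underlying media."""

    def __init__(self, open_rows: Callable[[], Iterable[Row]]) -> None:
        self._open_rows = open_rows

    def rows(self) -> Iterator[Row]:
        # Default path: stream rows straight from the media, no compression.
        return iter(self._open_rows())

    def build_fast_iterative_source(self) -> CompressedFrame:
        # Explicit opt-in: one full pass over the uncompressed rows,
        # then each column is packed as doubles and compressed.
        rows = list(self._open_rows())
        n_rows = len(rows)
        fmt = f"<{n_rows}d"
        columns = [zlib.compress(struct.pack(fmt, *col)) for col in zip(*rows)]
        return CompressedFrame(columns, n_rows)
```

A caller that scans the data once would just use rows(); an iterative algorithm would call build_fast_iterative_source() once up front and loop over the returned frame, paying the single uncompressed pass only when it can be amortized over several iterations.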

> Data frame R-like bindings
> --------------------------
>
>                 Key: MAHOUT-1490
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1490
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Saikat Kanjilal
>            Assignee: Dmitriy Lyubimov
>             Fix For: 1.0
>
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Create Data frame R-like bindings for spark



--
This message was sent by Atlassian JIRA
(v6.2#6252)
