crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-315) Empty collection
Date Sun, 29 Dec 2013 14:01:51 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858348#comment-13858348 ]

Chao Shi commented on CRUNCH-315:
---------------------------------

Thanks Josh. +1 for your patch (I'm not familiar with Spark, but your code seems very
straightforward).
I agree that we will have to serialize the data to disk, so the implementation will be
simple. The only question is whether we really need it. If we are not sure, we can add
only the "empty collection" here and wait for someone to propose a real use case for it.

> Empty collection
> ----------------
>
>                 Key: CRUNCH-315
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-315
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Chao Shi
>         Attachments: CRUNCH-315.patch
>
>
> As discussed on the mailing list [1] and [2], I'd like to add an empty collection feature.
> On the API side, I think we can add a new method to Pipeline that creates an empty collection.
> The collection should be a subclass of PCollection and behave like any other PCollection.
> There are also some optimization points that Josh mentioned in [2].
> I haven't thought it through clearly. I'm just filing a ticket here to see if anyone else
> has a better idea.
> [1] http://mail-archives.apache.org/mod_mbox/crunch-dev/201312.mbox/%3CBLU0-SMTP1337A04FAC6B5F497F7473EADC10%40phx.gbl%3E
> [2] http://mail-archives.apache.org/mod_mbox/crunch-dev/201312.mbox/%3CCAH29n6MbSK9gapoC2DgVnhofjAobyasCuZh_0475DuSajV%3DCPg%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
