crunch-dev mailing list archives

From Chao Shi <stepi...@live.com>
Subject Empty PCollection
Date Mon, 23 Dec 2013 10:19:35 GMT
Hi devs,

Do we have a way to represent an "empty" PCollection? I have run into
this need quite often recently:

1) I want to union a list of PCollections. If the input list is empty, I
would prefer returning a PCollection rather than null, as I don't want to
check for null everywhere.

2) Some of my input parameters (e.g. a path on HDFS) may be optional. The
path is read into a PCollection and left-joined to another data set. The
left join only adds some extra properties to that data set, so it is
fine if an empty set is joined.
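
To make the two cases concrete, here is a minimal plain-Java sketch of the
behavior I am after. It is only a model: List and Map stand in for
PCollection and PTable, and unionAll, readOptional, and leftJoin are
hypothetical helper names, not Crunch API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the two scenarios. List/Map stand in for
// PCollection/PTable; none of these names are Crunch API.
final class EmptyCollectionSketch {

    // Scenario 1: the union of zero inputs is an empty list, never null.
    static <T> List<T> unionAll(List<List<T>> inputs) {
        List<T> result = new ArrayList<>();
        for (List<T> input : inputs) {
            result.addAll(input);
        }
        return result;
    }

    // Scenario 2: an absent optional input is treated as an empty map.
    static Map<String, String> readOptional(Map<String, String> sourceOrNull) {
        return sourceOrNull == null
                ? Collections.<String, String>emptyMap()
                : sourceOrNull;
    }

    // Left join: every left key is kept; the value is the extra property
    // from the right side, or null when the right side has no match.
    static Map<String, String> leftJoin(List<String> leftKeys,
                                        Map<String, String> right) {
        Map<String, String> joined = new LinkedHashMap<>();
        for (String key : leftKeys) {
            joined.put(key, right.get(key));
        }
        return joined;
    }
}
```

With this shape, callers never check for null: unionAll on an empty list
gives an empty result, and leftJoin against readOptional(null) keeps every
left key with no extra properties attached.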

I think the scenarios above are likely to be relevant to others too. Any ideas?
