crunch-dev mailing list archives

From Mārtiņš Kalvāns <martins.kalv...@gmail.com>
Subject Re: Join.join(PTable<?,Void>, ?) return empty collection
Date Thu, 31 Jul 2014 08:08:34 GMT
It is almost always avoidable. The problem is that our company's Crunch user
base is growing, and many of those users are not technical enough to quickly
catch problems like this and find workarounds. :(


--
Mārtiņš


2014-07-30 18:45 GMT+02:00 Josh Wills <jwills@cloudera.com>:

> My hypothesis is that we re-use null in joins to indicate the absence of a
> value, so if the value of an entry is null, we assume it's non-existent.
> I'm assuming there isn't an easy way to switch the Void out for a non-null
> but ignored value?
>
> J
>
>
> On Wed, Jul 30, 2014 at 9:35 AM, Mārtiņš Kalvāns <
> martins.kalvans@gmail.com>
> wrote:
>
> > Hi.
> >
> > I stumbled on weird behaviour (a bug?) when joining a PTable<?, Void> on
> > the left side with any other PTable: the resulting collection is empty.
> > The attached example code demonstrates the unexpected behaviour.
> > The code in question is in org.apache.crunch.lib.join.InnerJoinFn, line 59,
> > where it checks for a null reference on the left dataset (the same applies
> > to the other join fn implementations).
> > Can anyone comment on this?
> >
> >
> > --
> > Mārtiņš Kalvāns
> >
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
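
The hypothesis in the thread above (null doubles as the "no value" marker, so an entry whose value is null looks absent) can be illustrated with a small standalone model. This is only a sketch under stated assumptions: it does not use Crunch, and NullSentinelJoin, innerJoin, withPlaceholder, voidTable and rightTable are hypothetical names invented for the illustration, not Crunch's InnerJoinFn or any Crunch API. It also shows the workaround Josh asks about, swapping the Void for a non-null but ignored value before joining.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of a join that uses null as its "absent" marker.
// It is NOT Crunch code; it only mimics the behaviour described in the thread.
final class NullSentinelJoin {

    // Simplified inner join: a null left value is treated as "no left record".
    static <K, U, V> List<String> innerJoin(List<Map.Entry<K, U>> left,
                                            List<Map.Entry<K, V>> right) {
        List<String> joined = new ArrayList<>();
        for (Map.Entry<K, U> l : left) {
            if (l.getValue() == null) {
                continue; // Void values are always null, so every row is skipped
            }
            for (Map.Entry<K, V> r : right) {
                if (l.getKey().equals(r.getKey())) {
                    joined.add(l.getKey() + ":" + l.getValue() + "," + r.getValue());
                }
            }
        }
        return joined;
    }

    // Workaround sketch: replace each Void value with a non-null placeholder.
    static <K> List<Map.Entry<K, Boolean>> withPlaceholder(List<Map.Entry<K, Void>> table) {
        List<Map.Entry<K, Boolean>> out = new ArrayList<>();
        for (Map.Entry<K, Void> e : table) {
            out.add(new SimpleEntry<>(e.getKey(), Boolean.TRUE));
        }
        return out;
    }

    // Sample left table whose values are Void (and therefore null).
    static List<Map.Entry<String, Void>> voidTable() {
        List<Map.Entry<String, Void>> t = new ArrayList<>();
        t.add(new SimpleEntry<String, Void>("a", null));
        t.add(new SimpleEntry<String, Void>("b", null));
        return t;
    }

    // Sample right table sharing only the key "a" with the left table.
    static List<Map.Entry<String, Integer>> rightTable() {
        List<Map.Entry<String, Integer>> t = new ArrayList<>();
        t.add(new SimpleEntry<String, Integer>("a", 1));
        t.add(new SimpleEntry<String, Integer>("c", 3));
        return t;
    }

    public static void main(String[] args) {
        // Joining the Void-valued table directly yields no rows in this model.
        System.out.println(innerJoin(voidTable(), rightTable()));
        // After swapping in the placeholder, the shared key "a" is joined.
        System.out.println(innerJoin(withPlaceholder(voidTable()), rightTable()));
    }
}
```

In this model the first join is empty even though both tables contain the key "a", while the second join returns the single row for "a". Whether the same placeholder mapping is practical in a real pipeline depends on details the thread does not cover.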
