crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Join.join(PTable<?,Void>, ?) return empty collection
Date Thu, 31 Jul 2014 15:09:40 GMT
Understood. Anything I can do to help? Docfix, at least?


On Thu, Jul 31, 2014 at 1:08 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
wrote:

> It is almost always avoidable. The problem is that the Crunch user base in our
> company is growing, and many of the users are not technical enough to quickly
> and effectively catch problems like this and find workarounds. :(
>
>
> --
> Mārtiņš
>
>
> 2014-07-30 18:45 GMT+02:00 Josh Wills <jwills@cloudera.com>:
>
> > My hypothesis is that we re-use null in joins to indicate the absence of a
> > value, so if the value of an entry is null, we assume it's non-existent.
> > I'm assuming there isn't an easy way to switch the Void out for a non-null
> > but ignored value?
> >
> > J
> >
> >
> > On Wed, Jul 30, 2014 at 9:35 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
> > wrote:
> >
> > > Hi.
> > >
> > > I stumbled on weird behaviour (bug?) when joining a PTable<?, Void> on the
> > > left side with any other PTable: the resulting collection is empty.
> > > The attached example code demonstrates the unexpected behaviour.
> > > The code in question is in org.apache.crunch.lib.join.InnerJoinFn, line 59,
> > > where it checks for a null reference on the left dataset (same for the other
> > > join fn implementations).
> > > Can anyone comment on this?
> > >
> > >
> > > --
> > > Mārtiņš Kalvāns
> > >
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>
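
For readers of the archive, the hypothesis quoted above (null doubling as "no value on this side") and the suggested workaround (swap Void for a non-null but ignored value) can be illustrated with a small sketch. This is plain Java over maps, not Crunch code: the class NullJoinSketch and the methods innerJoin and withPlaceholder are hypothetical names made up for illustration, and the null check only imitates the kind of check the thread attributes to InnerJoinFn.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullJoinSketch {

    // Hypothetical stand-in for an inner join; this is NOT Crunch's InnerJoinFn.
    // A null left value is read as "key has no left-side entry", so a key whose
    // left value is legitimately null (as with Void) is dropped from the output.
    public static <K, U, V> List<String> innerJoin(Map<K, U> left, Map<K, V> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<K, V> r : right.entrySet()) {
            U l = left.get(r.getKey());
            if (l != null) {
                out.add(r.getKey() + "=(" + l + "," + r.getValue() + ")");
            }
        }
        return out;
    }

    // Workaround sketch: replace every Void (always null) value with a non-null
    // placeholder that downstream code is expected to ignore.
    public static <K> Map<K, Boolean> withPlaceholder(Map<K, Void> keysOnly) {
        Map<K, Boolean> out = new LinkedHashMap<>();
        for (K key : keysOnly.keySet()) {
            out.put(key, Boolean.TRUE);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Void> left = new LinkedHashMap<>();
        left.put("a", null); // the only value a Void-typed entry can hold
        left.put("b", null);

        Map<String, Integer> right = new LinkedHashMap<>();
        right.put("a", 1);
        right.put("c", 3);

        System.out.println(innerJoin(left, right));                  // []
        System.out.println(innerJoin(withPlaceholder(left), right)); // [a=(true,1)]
    }
}
```

In the first call the key "a" exists on both sides, yet the result is empty because its left value is null. After the placeholder swap the same key joins as expected. In a real Crunch pipeline the equivalent step would be to map the Void values to some non-null type before the join, which is an assumption about usage rather than something confirmed in this thread.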



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
