crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Join.join(PTable<?,Void>, ?) return empty collection
Date Sun, 03 Aug 2014 22:19:05 GMT
Posted a doc fix for this in CRUNCH-453, along with a few other updates to
the user guide.


On Fri, Aug 1, 2014 at 4:32 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
wrote:

> Yes, I think at least documenting this known issue would help.
> Thanks!
>
>
> 2014-07-31 17:09 GMT+02:00 Josh Wills <jwills@cloudera.com>:
>
> > Understood. Anything I can do to help? Docfix, at least?
> >
> >
> > On Thu, Jul 31, 2014 at 1:08 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
> > wrote:
> >
> > > It is almost always avoidable. The problem is that the Crunch user base
> > > in our company is growing, and many of the users are not technical enough
> > > to quickly and effectively catch problems like this and find workarounds. :(
> > >
> > >
> > > --
> > > Mārtiņš
> > >
> > >
> > > 2014-07-30 18:45 GMT+02:00 Josh Wills <jwills@cloudera.com>:
> > >
> > > > My hypothesis is that we re-use null in joins to indicate the absence of a
> > > > value, so if the value of an entry is null, we assume it's non-existent.
> > > > I'm assuming there isn't an easy way to switch the Void out for a non-null
> > > > but ignored value?
> > > >
> > > > J
> > > >
> > > >
> > > > On Wed, Jul 30, 2014 at 9:35 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi.
> > > > >
> > > > > I stumbled on weird behaviour (bug?) when joining a PTable<?, Void> on the
> > > > > left side with any other PTable: the resulting collection is empty.
> > > > > The attached example code demonstrates the unexpected behaviour.
> > > > > The code in question is in org.apache.crunch.lib.join.InnerJoinFn, line 59,
> > > > > where it checks for a null reference on the left dataset (the same goes for
> > > > > the other join fn implementations).
> > > > > Can anyone comment on this?
> > > > >
> > > > >
> > > > > --
> > > > > Mārtiņš Kalvāns
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Director of Data Science
> > > > Cloudera <http://www.cloudera.com>
> > > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> > > >
> > >
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>
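
For readers who want to see the effect discussed in this thread in isolation,
here is a minimal plain-Java sketch. It is not the Crunch InnerJoinFn source
and uses no Crunch classes; the class NullAsAbsenceJoin, its methods, and the
PRESENT placeholder are all hypothetical names made up for illustration. It
assumes, as hypothesized above, that the join reads a null value as "no entry
on this side". Since null is the only value a Void can hold, every left-side
row is then dropped. It also shows the workaround raised in the thread:
swapping the Void for a non-null value that is ignored afterwards.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this is NOT the Crunch InnerJoinFn source.
class NullAsAbsenceJoin {

    // Hypothetical placeholder standing in for a "non-null but ignored" value.
    static final Boolean PRESENT = Boolean.TRUE;

    // Inner join in which a null value is read as "no entry on this side",
    // mirroring the hypothesis in the thread.
    static <K, U, V> List<String> innerJoin(Map<K, U> left, Map<K, V> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<K, U> e : left.entrySet()) {
            U l = e.getValue();
            V r = right.get(e.getKey());
            // A null on either side is indistinguishable from a missing row.
            if (l != null && r != null) {
                out.add(e.getKey() + "=(" + l + "," + r + ")");
            }
        }
        return out;
    }

    // Workaround: replace every (necessarily null) Void value with the
    // placeholder before joining.
    static <K> Map<K, Boolean> withPlaceholder(Map<K, Void> table) {
        Map<K, Boolean> out = new LinkedHashMap<>();
        for (K key : table.keySet()) {
            out.put(key, PRESENT);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Void> left = new LinkedHashMap<>();
        left.put("a", null);
        left.put("b", null);
        Map<String, Integer> right = new LinkedHashMap<>();
        right.put("a", 1);
        right.put("b", 2);

        // Void values are always null, so the join yields nothing.
        System.out.println(innerJoin(left, right));                  // prints []
        // With the placeholder, both keys survive the join.
        System.out.println(innerJoin(withPlaceholder(left), right)); // prints [a=(true,1), b=(true,2)]
    }
}
```

In an actual Crunch pipeline the equivalent step would be to map the Void
values to some non-null type before calling the join and to discard that value
afterwards; the exact calls depend on the Crunch version and are not shown
here.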
