crunch-dev mailing list archives

From: Josh Wills <jwi...@cloudera.com>
Subject: Re: Object size
Date: Mon, 24 Feb 2014 19:01:04 GMT
Ah, cool. The getSize() method (it returns a long) gives Crunch's estimate of
the size of the object in bytes. Keep in mind that it's a very rough
approximation. It's based on the size of the file on disk plus whatever info
we have about the behavior of the DoFns applied to the PTable when it is
processed, which each DoFn communicates through its scaleFactor() method.
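To make the "rough approximation" point concrete, here is a minimal, self-contained sketch of the kind of arithmetic described above. It assumes the estimate is simply the on-disk size multiplied by each applied DoFn's scale factor. The class and method names are hypothetical, and this is not Crunch's actual implementation. In real Crunch code you would call getSize() on the PTable. For a DoFn that discards about half its input, you would override scaleFactor() to return something like 0.5f.

```java
/**
 * Simplified model (an assumption, not Crunch's source) of a size estimate
 * like PCollection.getSize(): start from the input's size on disk and
 * multiply by the scaleFactor() of each DoFn applied to it.
 */
public final class SizeEstimateSketch {

    /**
     * @param bytesOnDisk  size of the input file(s) in bytes
     * @param scaleFactors hypothetical output/input size ratios, one per DoFn,
     *                     in the order the DoFns are applied
     * @return the estimated output size in bytes, rounded up
     */
    static long estimateBytes(long bytesOnDisk, float... scaleFactors) {
        double size = bytesOnDisk;
        for (float factor : scaleFactors) {
            size *= factor; // each DoFn shrinks or grows the estimate by its ratio
        }
        return (long) Math.ceil(size);
    }

    public static void main(String[] args) {
        long fileBytes = 4L * 1024 * 1024 * 1024; // a 4 GiB input file
        // A filter that keeps roughly half its input, followed by a 1:1 map.
        long estimate = estimateBytes(fileBytes, 0.5f, 1.0f);
        System.out.println("Estimated size in bytes: " + estimate);
    }
}
```

Every factor is a guess supplied by the DoFn author, so errors compound along a chain of DoFns. That is one reason to treat the result as a ballpark figure.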


On Mon, Feb 24, 2014 at 10:57 AM, Jinal Shah <jinalshah2007@gmail.com> wrote:

> By size I meant the memory size, sorry for the confusion: how much memory
> a PTable object will require. Basically, if the object is not that large
> and could fit in memory, I wanted to apply a map-side join to optimize the
> join, and depending on that I also wanted to determine which one is smaller
> so I can use a left join.
>
>
> On Mon, Feb 24, 2014 at 12:45 PM, Josh Wills <jwills@cloudera.com> wrote:
>
> > There is the length() method, which will return a PObject<Long> with the
> > number of elements in the PCollection. It requires running a MapReduce
> > job, though.
> >
> > J
> >
> >
> > On Mon, Feb 24, 2014 at 10:03 AM, Jinal Shah <jinalshah2007@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Is there a way in Crunch to find the size of a whole PCollection or
> > > PTable?
> > >
> > > Thanks
> > > Jinal
> > >
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>
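Jinal's use case can be sketched as a small planning rule. This is an illustration under stated assumptions, not Crunch API. The class, enum, and memory budget are hypothetical. The byte figures would come from getSize(), which estimates on-disk size rather than heap usage. As we understand it, Crunch's MapsideJoinStrategy (in org.apache.crunch.lib.join) loads the right-hand PTable into memory. The sketch therefore puts the smaller table on the right and the larger one on the left, which also suits a left join.

```java
/**
 * Hypothetical planning rule: pick a map-side join when the smaller table's
 * estimated size fits a memory budget, and put that table on the right.
 */
public final class JoinPlanSketch {

    /** Which kind of join to run (labels invented for this sketch). */
    enum Strategy { MAPSIDE, REDUCE_SIDE }

    /** The chosen strategy plus which table goes on each side. */
    static final class Plan {
        final Strategy strategy;
        final String left;
        final String right;

        Plan(Strategy strategy, String left, String right) {
            this.strategy = strategy;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return strategy + " join: left=" + left + ", right=" + right;
        }
    }

    static Plan choose(String nameA, long bytesA, String nameB, long bytesB,
                       long memoryBudgetBytes) {
        boolean aIsSmaller = bytesA <= bytesB;
        String small = aIsSmaller ? nameA : nameB;
        String large = aIsSmaller ? nameB : nameA;
        long smallBytes = Math.min(bytesA, bytesB);
        // The estimate is of bytes on disk; deserialized objects can take more
        // heap, so the budget should be set well below the available memory.
        Strategy strategy = smallBytes <= memoryBudgetBytes
                ? Strategy.MAPSIDE
                : Strategy.REDUCE_SIDE;
        // Smaller table on the right (the side held in memory), larger on the left.
        return new Plan(strategy, large, small);
    }

    public static void main(String[] args) {
        long budget = 200L * 1024 * 1024; // hypothetical 200 MiB budget
        Plan plan = choose("orders", 10_000_000_000L, "customers", 50_000_000L, budget);
        System.out.println(plan);
    }
}
```

Keeping the budget conservative leaves room for the gap between serialized size on disk and object size in memory. It also absorbs error in the scaleFactor()-based estimate.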



