hadoop-common-dev mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Limit of 64 slots when doing a map-side join
Date Wed, 25 Mar 2009 15:16:09 GMT
That code is highly optimized and quite difficult to follow. We have always
limited our joins to 31 members and ignored the problem.
But I think filing your JIRA and fixing it are the correct choices.

There is, in my opinion, a decent write-up on how to use map-side joins in
chapter 8 of my book, so I suspect more people will use this soon, as the
map-side join is an incredibly powerful tool.

In one of our production applications it took the run time from 5+ hours to
about 12 minutes.

On Wed, Mar 25, 2009 at 7:23 AM, Jingkei Ly <jingkei.ly@gmail.com> wrote:

> Am I right in thinking that the CompositeInputFormat is limited to joining
> 64 files?
> I believe this comes about because TupleWritable uses a single long-type
> instance field in order to maintain a bitset of tuple slots that have been
> written to - I'm guessing this is for performance reasons, but it also
> implies that the TupleWritable only has 64 bits to play with when joining.
> If my assumptions above are true, could replacing this long with a
> java.util.BitSet be appropriate in terms of making the map-side join package
> more scalable?
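
For readers following the thread, here is a minimal sketch of the concern
described above. It is not the actual TupleWritable code: the class names
LongSlots and BitSetSlots are hypothetical stand-ins, and the sketch only
illustrates the general technique of tracking written slots in a single long
versus a java.util.BitSet. In Java, the shift distance for a long is taken
modulo 64, so a 65th slot (index 64) would alias slot 0 in a long-based mask,
whereas a BitSet grows as needed.

```java
import java.util.BitSet;

public class SlotTracking {

    // Hypothetical stand-in: tracks written slots in a single long (at most 64 slots).
    static final class LongSlots {
        private long written;

        void setWritten(int i) {
            // For a long, Java uses only the low 6 bits of the shift distance,
            // so 1L << 64 is the same value as 1L << 0.
            written |= 1L << i;
        }

        boolean has(int i) {
            return (written & (1L << i)) != 0;
        }
    }

    // Hypothetical alternative: tracks written slots in a BitSet, which grows on demand.
    static final class BitSetSlots {
        private final BitSet written = new BitSet();

        void setWritten(int i) {
            written.set(i);
        }

        boolean has(int i) {
            return written.get(i);
        }
    }

    public static void main(String[] args) {
        LongSlots a = new LongSlots();
        a.setWritten(64);                 // index 64 is the 65th slot
        System.out.println(a.has(0));     // true: slot 64 aliased onto slot 0

        BitSetSlots b = new BitSetSlots();
        b.setWritten(64);
        System.out.println(b.has(0));     // false
        System.out.println(b.has(64));    // true
    }
}
```

The trade-off is the one guessed at in the question: a single long is cheap to
test and to serialize, while a BitSet lifts the 64-slot ceiling at the cost of
an extra object and a variable-length encoding.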

Alpha Chapters of my book on Hadoop are available
