hadoop-common-user mailing list archives

From "Chris Dyer" <redp...@umd.edu>
Subject Re: computing conditional probabilities with Hadoop?
Date Tue, 02 Oct 2007 01:05:41 GMT
Thanks for the helpful replies on this.  For some values of A, I may
not be able to (or may not want to) load the entire set of counts for
<A, *> into memory (the curse of Zipfian distributions), so the final
"join" step of the process is the tricky part.

Right now I'm still having trouble working out how to force the first
element that a single reducer iterates over to be the marginal, and
not some individual count.  Does anyone know whether Hadoop guarantees
(or can be made to guarantee) that the relative order of records whose
keys compare equal is left unchanged?  If so, this would be a fairly
easy solution.
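
One possible way to sidestep the stability question, sketched here in
plain Python rather than against the Hadoop API, is to put the ordering
into the key itself: emit the marginal under a composite key <A, marker>
whose marker sorts before every real B.  The marker value, the function
names, and the in-memory sort standing in for the shuffle are all
assumptions made for illustration; in a real job the records would also
need to be partitioned on A alone so that every <A, ...> key reaches the
same reducer.

```python
from itertools import groupby

# Hypothetical marker for the marginal <A, *>. Assumes no real B is the
# empty string, so the marker sorts before every real B for the same A.
MARGINAL = ""

def map_pairs(pairs):
    """Emit one record for count(A, *) and one for count(A, B) per pair."""
    for a, b in pairs:
        yield (a, MARGINAL), 1
        yield (a, b), 1

def shuffle(records):
    """Stand-in for the framework's sort: order by the full composite key."""
    return sorted(records, key=lambda kv: kv[0])

def reduce_stream(sorted_records):
    """Stream over the sorted records, keeping only one marginal in memory."""
    marginal = None
    for (a, b), group in groupby(sorted_records, key=lambda kv: kv[0]):
        total = sum(value for _, value in group)
        if b == MARGINAL:
            marginal = total              # seen before any real B for this A
        else:
            yield (a, b), total / marginal  # estimate of P(B | A)

pairs = [("the", "cat"), ("the", "dog"), ("the", "cat"), ("a", "dog")]
probs = dict(reduce_stream(shuffle(map_pairs(pairs))))
```

Because the marginal and the individual counts no longer share an equal
key, this sketch does not depend on whether equal keys keep their
relative order, and the reducer never holds more than one total for
<A, *> at a time.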

Thank you!
