Starting with your P.S.: It's not nutty; see MapWritable, for example, which can be used as a message type, or ArrayPrimitiveWritable. In this project, which I've found helpful for inspiration as I'm getting started, they use collections for messages in multiple places.
Going back to your main question: when you say many small vs. fewer large messages, I assume both would be sent in the same superstep? If so, I'd recommend just testing it, since it's hard to say in advance. My thought is that if you go with the large-message approach, you could wrap the set in a primitive collection like ArrayPrimitiveWritable, which might save a bit of memory on what you send compared with sending a bunch of small messages as LongWritables or whatever they might be. If I remember correctly, I tried both approaches in the project I'm working on, and the large-message approach was more effective.
There's also the option (if you run into memory problems, for example) of using large messages but splitting the one superstep into several, if that's feasible. In the end, I've found that performance is difficult to predict, and it never hurts to try both approaches and compare the results.
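To make the "large messages, but split across several supersteps" idea a bit more concrete, here is a minimal sketch in plain Java. MessageChunker, chunk, and maxPerMessage are hypothetical names I made up for illustration; they are not part of Giraph or Hadoop. The sketch only shows how to cut one big batch of long values into bounded pieces. Wrapping each piece (for example in an ArrayPrimitiveWritable) and sending one piece per superstep is left out, since that depends on your computation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Giraph or Hadoop class.
final class MessageChunker {
    private MessageChunker() {}

    /**
     * Splits values into consecutive chunks of at most maxPerMessage elements.
     * Each chunk could become the payload of one large message, with one
     * chunk sent per superstep (an assumption about how you would use it).
     */
    static List<long[]> chunk(long[] values, int maxPerMessage) {
        if (maxPerMessage <= 0) {
            throw new IllegalArgumentException("maxPerMessage must be positive");
        }
        List<long[]> chunks = new ArrayList<>();
        for (int start = 0; start < values.length; start += maxPerMessage) {
            // The last chunk may be shorter than maxPerMessage.
            int size = Math.min(maxPerMessage, values.length - start);
            chunks.add(Arrays.copyOfRange(values, start, start + size));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long[] ids = {11L, 12L, 13L, 14L, 15L};
        List<long[]> chunks = chunk(ids, 2);
        for (int i = 0; i < chunks.size(); i++) {
            // In a real job, chunk i would be sent in the i-th of the split supersteps.
            System.out.println("chunk " + i + ": " + Arrays.toString(chunks.get(i)));
        }
    }
}
```

The chunk size is the knob to experiment with: a larger maxPerMessage means fewer supersteps but more memory in flight per superstep, so it's worth measuring a few values rather than guessing.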
Everyone else, please correct me if I've said something incorrectly, as I'm still relatively new at this.