flink-dev mailing list archives

From Martin Neumann <mneum...@sics.se>
Subject broadcast set size
Date Thu, 09 Apr 2015 15:36:46 GMT

Up to what sizes are broadcast sets a good idea?

I have a large dataset (~5 GB) and I'm only interested in lines whose ID
appears in a list that I have in a file. The file has ~10k entries.
I could either join the dataset with the ID list, or I could broadcast the
ID list and do the filtering in a mapper.
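
To make the broadcast variant concrete, here is a rough plain-Java sketch of what each parallel task would do once it holds its own copy of the ID list. It uses no Flink API; the class name, the method names, and the assumption that each line starts with its ID followed by a tab are all made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class BroadcastFilterSketch {

    // Assumption: the ID is the text before the first tab of a line.
    public static String idOf(String line) {
        int tab = line.indexOf('\t');
        return tab < 0 ? line : line.substring(0, tab);
    }

    // Keep only the lines whose ID is in the (small) in-memory ID set.
    // With ~10k entries the set is tiny compared to the 5 GB input,
    // so every task can hold a full copy and do a hash lookup per line.
    public static List<String> filterByIds(List<String> lines, Set<String> ids) {
        return lines.stream()
                .filter(line -> ids.contains(idOf(line)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the ID file and the dataset.
        Set<String> ids = new HashSet<>(Arrays.asList("a17", "c03"));
        List<String> lines = Arrays.asList("a17\tfoo", "b22\tbar", "c03\tbaz");
        System.out.println(filterByIds(lines, ids));
    }
}
```

The join variant would produce the same lines; the difference is whether the large dataset has to be repartitioned to meet the small one.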

What would be the better solution given the data sizes described above?
Is there a good rule of thumb for when to switch from one solution to the other?

cheers Martin
