flink-user mailing list archives

From "LINZ, Arnaud" <AL...@bouyguestelecom.fr>
Subject How to force the parallelism on small streams?
Date Wed, 02 Sep 2015 15:41:24 GMT
Hi,

I have a source that emits only a few items, since it just hands file names to the mappers. Each mapper
opens its file and processes the records. As the files are huge, one input line (a file name) gives
a substantial amount of work to the next stage.
My topology looks like this:
addSource(myFileSource).rebalance().setParallelism(100).map(myFileMapper)
When 100 mappers are created, about 85 of them finish immediately and only a few process the files (for
hours). I suspect some optimization whereby a node that does not receive a minimum number of lines is
“shut down”; but in my case I do want the lines to be evenly distributed across all the mappers.
How can I enforce that?
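A quick way to reason about the symptom is to count how many mappers can receive anything at all. The sketch below is a plain-Java simulation, not Flink code, and does not claim to reproduce Flink's actual partitioner: it assumes a rebalance-style, strictly round-robin hand-out of items to downstream channels. The class name `RoundRobinSketch`, its methods, and the figure of 15 files are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation (not Flink code): hands items out round-robin
 * to a fixed number of downstream channels and counts how many channels
 * end up with at least one item.
 */
public class RoundRobinSketch {

    /** Returns, for each channel, the list of items it receives. */
    static List<List<String>> distribute(List<String> items, int channels) {
        List<List<String>> perChannel = new ArrayList<>();
        for (int c = 0; c < channels; c++) {
            perChannel.add(new ArrayList<>());
        }
        int next = 0; // assumed starting channel; strict round-robin after that
        for (String item : items) {
            perChannel.get(next).add(item);
            next = (next + 1) % channels;
        }
        return perChannel;
    }

    /** Counts the channels that received at least one item. */
    static int busyChannels(List<List<String>> perChannel) {
        int busy = 0;
        for (List<String> items : perChannel) {
            if (!items.isEmpty()) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        // Hypothetical input: 15 file names spread over 100 mapper instances.
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 15; i++) {
            files.add("file-" + i + ".dat");
        }
        List<List<String>> perChannel = distribute(files, 100);
        // Prints "15 of 100 mappers get a file"
        System.out.println(busyChannels(perChannel) + " of 100 mappers get a file");
    }
}
```

Under this assumption, a perfectly even distribution still leaves every mapper beyond the number of file names without input, and such a mapper has nothing to do once the source finishes. With 250 files instead, every one of the 100 channels gets 2 or 3 files.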

Greetings,
Arnaud
