So I followed up on why TextIO shuffles and dug into the code some. It is using the shards and getting all the values into a keyed group to write to a single file.
However... I wonder if there is way to just take the records that are on a worker and write them out. Thus not needing a shard number and doing this. Closer to how hadoop handle's writes.
Maybe just a regular pardo and on bundleSetup it creates a writer and processElement reuses that writter to write to the same file for all elements within a bundle?
I feel like this goes beyond scope of simple user mailing list so I'm expanding it to dev as well.
Finding a solution that prevents quadrupling shuffle costs when simply writing out a file is a necessity for large scale jobs that work with 100+ TB of data. If anyone has any ideas I'd love to hear them.