Going back to the point of double split encoding, it would make sense to try a variant where we combine the sign and the mantissa. That should remove the sign stream at the relatively small cost of making the mantissa stream signed.
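To make the sign + mantissa idea concrete, here is a rough Python sketch, not the actual writer code. The names split_double and join_double are made up for illustration, and folding the sign in with a bitwise complement (rather than plain negation) is my assumption: plain negation would lose the sign whenever the mantissa bits are zero, as in -1.0 or -0.0.

```python
import struct

MANTISSA_BITS = 52
MANTISSA_MASK = (1 << MANTISSA_BITS) - 1


def split_double(value):
    """Split an IEEE 754 double into (exponent, signed_mantissa).

    Hypothetical helper: the sign bit is folded into the mantissa by
    complementing it, so no separate sign stream is needed.
    """
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    sign = bits >> 63
    exponent = (bits >> MANTISSA_BITS) & 0x7FF
    mantissa = bits & MANTISSA_MASK
    # ~m == -m - 1, so negative doubles always map to a negative integer,
    # even when the mantissa bits are all zero.
    signed_mantissa = ~mantissa if sign else mantissa
    return exponent, signed_mantissa


def join_double(exponent, signed_mantissa):
    """Inverse of split_double."""
    sign = 1 if signed_mantissa < 0 else 0
    mantissa = ~signed_mantissa if sign else signed_mantissa
    bits = (sign << 63) | (exponent << MANTISSA_BITS) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

The signed mantissa needs 53 bits instead of 52, which is the cost mentioned above.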
Thinking more about the layout options...
Another consideration is that we'd be better off not splitting the compression chunks between ranges, yet I'm worried about the overhead of closing all of the compression chunks and RLE runs early.
So we could modify my #2 proposal to be sensitive to RLE runs and compression chunks: at the end of each row group, we wait until the current RLE run and compression chunk close, and then interleave the streams. That means that for a column with three streams and two row groups, we could get something like:
stream1.1, stream2.1, stream3.1, stream1.2, stream2.2, stream3.2
Stream x.y contains a whole number of compression chunks, and the majority of the data for row group y is in the streams *.y. This significantly improves on the current state of affairs, because now we know that once we have read the streams *.1, we'll have the entire first row group and can start decompressing and processing it while we read the other "stripelets".
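A rough Python sketch of how the cut points could be picked, purely to illustrate the layout. The function names and the input shape are hypothetical: for each stream I assume we know the sorted offsets where its compression chunks end and the offsets where each row group's data ends in that stream, and that the last chunk ends at or after the last row group.

```python
from bisect import bisect_left


def segment_ends(chunk_ends, row_group_ends):
    """For each row group, cut at the first chunk boundary at or after its end."""
    ends = []
    for rg_end in row_group_ends:
        i = bisect_left(chunk_ends, rg_end)
        ends.append(chunk_ends[i])
    return ends


def interleave(streams):
    """Return the file layout as (label, start, end) tuples.

    streams maps a stream name to (chunk_ends, row_group_ends).
    """
    cuts = {name: segment_ends(c, r) for name, (c, r) in streams.items()}
    n_groups = len(next(iter(cuts.values())))
    starts = {name: 0 for name in streams}
    layout = []
    for group in range(n_groups):
        for name in streams:
            end = cuts[name][group]
            layout.append((f"{name}.{group + 1}", starts[name], end))
            starts[name] = end
    return layout


streams = {
    "stream1": ([100, 200, 300], [150, 300]),
    "stream2": ([80, 160], [70, 160]),
    "stream3": ([50, 100, 150], [60, 150]),
}
layout = interleave(streams)
```

Here stream1.1 runs to offset 200 even though row group 1 ends at 150, so it carries a little of row group 2, which is why only the majority of a later row group lands in its own segment.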
By not forcing the closure of the RLE runs and compression chunks, we have preserved the compression and yet gained the ability to do async IO in the reader.