I think there is probably room for improvement in the entire pipeline. Doing some more in-depth profiling might inform which areas to target for optimization and/or parallelization.  But I don't have any particular user-configurable options to suggest.  For the schema in question, some of the comments about future improvements for def/rep level generation [1] might apply.

-Micah

[1] https://github.com/apache/arrow/blob/master/cpp/src/parquet/arrow/path_internal.cc#L20

On Fri, Mar 26, 2021 at 9:47 PM Weston Pace <weston.pace@gmail.com> wrote:
I'm fairly certain there is room for improvement in the C++
implementation for writing single files to ADLFS.  Others can correct
me if I'm wrong, but we don't do any kind of pipelined writes.  I'd
guess this is partly because there isn't much benefit when writing to
local disk (writes are typically synchronous) but also because it's
much easier to write multiple files.
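
Purely to illustrate what I mean by pipelined writes (nothing like this
exists in Arrow today, and the sketch below is untested), a wrapper around
the ADLFS output stream could hand encoded bytes to a background uploader,
so that encoding the next row group overlaps with uploading the previous
one:

#include <condition_variable>
#include <cstring>
#include <deque>
#include <mutex>
#include <thread>

#include <arrow/buffer.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/util/macros.h>

// Sketch only: forwards Write() calls to a background thread so the Parquet
// writer can keep encoding while earlier bytes are still being uploaded.
class PipelinedOutputStream : public arrow::io::OutputStream {
 public:
  explicit PipelinedOutputStream(std::shared_ptr<arrow::io::OutputStream> sink)
      : sink_(std::move(sink)), worker_([this] { Drain(); }) {}

  arrow::Status Write(const void* data, int64_t nbytes) override {
    // Copy the encoded bytes and queue them for the uploader thread.
    ARROW_ASSIGN_OR_RAISE(auto buf, arrow::AllocateBuffer(nbytes));
    std::memcpy(buf->mutable_data(), data, static_cast<size_t>(nbytes));
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(std::move(buf));
      position_ += nbytes;
    }
    cv_.notify_one();
    return arrow::Status::OK();
  }

  arrow::Status Close() override {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // drains anything still queued
    return sink_->Close();
  }

  arrow::Result<int64_t> Tell() const override { return position_; }
  bool closed() const override { return sink_->closed(); }

 private:
  void Drain() {
    for (;;) {
      std::unique_ptr<arrow::Buffer> buf;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // closed and nothing left to upload
        buf = std::move(queue_.front());
        queue_.pop_front();
      }
      // A real implementation would capture upload errors and surface them
      // from the next Write()/Close() call.
      ARROW_UNUSED(sink_->Write(buf->data(), buf->size()));
    }
  }

  std::shared_ptr<arrow::io::OutputStream> sink_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<std::unique_ptr<arrow::Buffer>> queue_;
  bool done_ = false;
  int64_t position_ = 0;
  std::thread worker_;  // declared last so the other members exist before it starts
};

A real version would at least bound the queue so memory can't balloon, but
that's the gist of what pipelining would buy.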

Is writing multiple files an option for you?  I would guess using a
dataset write with multiple files would be significantly more
efficient than a single large file write on ADLFS.
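
Roughly like this (untested, recent Arrow C++, and assuming your ADLFS
wrapper can be exposed as an arrow::fs::FileSystem):

#include <memory>
#include <string>

#include <arrow/api.h>
#include <arrow/dataset/api.h>
#include <arrow/dataset/file_parquet.h>
#include <arrow/filesystem/api.h>

namespace ds = arrow::dataset;

// Writes `table` as a directory of Parquet files instead of one big file.
arrow::Status WriteAsDataset(const std::shared_ptr<arrow::Table>& table,
                             const std::shared_ptr<arrow::fs::FileSystem>& fs,
                             const std::string& base_dir) {
  auto dataset = std::make_shared<ds::InMemoryDataset>(table);
  ARROW_ASSIGN_OR_RAISE(auto scanner_builder, dataset->NewScan());
  ARROW_ASSIGN_OR_RAISE(auto scanner, scanner_builder->Finish());

  auto format = std::make_shared<ds::ParquetFileFormat>();
  ds::FileSystemDatasetWriteOptions write_options;
  write_options.file_write_options = format->DefaultWriteOptions();
  write_options.filesystem = fs;
  write_options.base_dir = base_dir;
  write_options.partitioning = ds::Partitioning::Default();
  write_options.basename_template = "part-{i}.parquet";
  return ds::FileSystemDataset::Write(write_options, scanner);
}

Each output file gets its own stream, so the uploads aren't all funneled
through one writer.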

-Weston

On Fri, Mar 26, 2021 at 6:28 PM Yeshwanth Sriram <yeshsriram@icloud.com> wrote:
>
> Hello,
>
> Thank you again for the earlier help on improving overall ADLFS read latency using multiple threads, which has worked out really well.
>
> I’ve incorporated buffering in the adls/writer implementation (up to 64 MB). What I’m noticing is that the parquet_writer->WriteTable(table) latency dominates everything else in the output phase of the job (~65 sec vs ~1.2 min). I could use multiple threads (like io/s3fs), but I’m not sure that would have any effect on the parquet WriteTable operation.
>
> Question: Is there anything else I can leverage inside the parquet/writer subsystem to improve the core parquet/write/table latency?
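
(A simplified sketch of the write path described above, showing the main
user-facing Parquet writer knobs; the stream name and the property values
are illustrative only, not necessarily what is actually run.)

#include <memory>

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/writer.h>
#include <parquet/properties.h>

// `remote_stream` stands in for the ADLFS output stream.
arrow::Status WriteOutput(const std::shared_ptr<arrow::Table>& table,
                          const std::shared_ptr<arrow::io::OutputStream>& remote_stream) {
  // ~64 MB of buffering in front of the ADLS append/flush calls.
  ARROW_ASSIGN_OR_RAISE(
      auto sink, arrow::io::BufferedOutputStream::Create(
                     64 * 1024 * 1024, arrow::default_memory_pool(), remote_stream));

  // The main tunables exposed by the Parquet writer.
  auto props = parquet::WriterProperties::Builder()
                   .compression(parquet::Compression::SNAPPY)
                   ->data_pagesize(2 * 1024 * 1024)  // default is 1 MB
                   ->write_batch_size(64 * 1024)     // default is 1024 values
                   ->build();

  // chunk_size caps rows per row group (smaller => more, smaller row groups).
  ARROW_RETURN_NOT_OK(parquet::arrow::WriteTable(*table, arrow::default_memory_pool(),
                                                 sink, /*chunk_size=*/8 * 1000 * 1000,
                                                 props));
  return sink->Close();  // flush remaining buffered bytes to the remote stream
}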
>
>
> schema:
>   map<key,array<struct<…>>>
>   struct<...>
>   map<key,map<key,map<key, struct<…>>>>
>   struct<…>
>   binary
> num_row_groups: 6
> num_rows_per_row_group: ~8 million
> write buffer size: 64 * 1024 * 1024 (~64 MB)
> write compression: snappy
> total write latency per row group: ~1.2 min
>   adls append/flush latency (minor factor)
> Azure: ESv3 / RAM: 256 GB / Cores: 8
>
> Yesh