drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Capturing in-flight batches
Date Thu, 26 Oct 2017 17:14:13 GMT
Hi All,

Yesterday, in a conversation, Salim mentioned it would be handy to be able to capture and
replay in-flight batches in a Drill query in order to diagnose problems. As it turns out,
we have most of the pieces readily available; we just need someone to assemble them.

First, we have the IteratorValidatorBatchIterator class which sits on top of each operator
and validates that operator’s state. We extended it a while back to validate vector internals
to catch a few cases of offset vector corruption. This class could be extended to capture
in-flight batches for selected operators.

Second, we have the VectorAccessibleSerializable class (and the recently added VectorSerializer
wrapper class) that writes batches to, and reads batches from, disk. This class is the foundation
of our spilling support.

Third, we have the EasyFormatPlugin class that lets us easily create a new disk-based reader.

Combine them and we can use the validator to write batches using the vector serializer. Then,
we create a new easy format plugin to read these files again using the vector serializer.
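
To make the shape of the idea concrete, here is a minimal, self-contained Java sketch of the
capture-and-replay pattern. It does not use any Drill classes: BatchSource, CapturingSource,
ReplaySource and the int[] "batch" are all hypothetical stand-ins invented for illustration.
CapturingSource plays the part the validator would play (pass each batch through, copy it to
disk), and ReplaySource plays the part of the new reader. A real version would presumably
delegate the on-disk format to the vector serializer rather than the toy format used here.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Illustration only: none of these types exist in Drill.
final class BatchCaptureSketch {

    // Hypothetical stand-in for an operator: one int[] per batch, null when exhausted.
    interface BatchSource {
        int[] next() throws IOException;
    }

    // Wraps an upstream source and copies every batch to a file as it passes through.
    static final class CapturingSource implements BatchSource {
        private final BatchSource upstream;
        private final DataOutputStream out;

        CapturingSource(BatchSource upstream, Path file) throws IOException {
            this.upstream = upstream;
            this.out = new DataOutputStream(Files.newOutputStream(file));
        }

        @Override
        public int[] next() throws IOException {
            int[] batch = upstream.next();
            if (batch == null) {
                out.close(); // end of data: flush and close the capture file
                return null;
            }
            // Toy format: value count, then the values.
            out.writeInt(batch.length);
            for (int v : batch) {
                out.writeInt(v);
            }
            return batch; // downstream sees the batch unchanged
        }
    }

    // Reads the capture file back, yielding the same batches in the same order.
    static final class ReplaySource implements BatchSource {
        private final DataInputStream in;

        ReplaySource(Path file) throws IOException {
            this.in = new DataInputStream(Files.newInputStream(file));
        }

        @Override
        public int[] next() throws IOException {
            int length;
            try {
                length = in.readInt();
            } catch (EOFException e) {
                in.close(); // no more batches in the file
                return null;
            }
            int[] batch = new int[length];
            for (int i = 0; i < length; i++) {
                batch[i] = in.readInt();
            }
            return batch;
        }
    }

    // Builds a source from in-memory batches, standing in for a running operator.
    static BatchSource fromList(List<int[]> batches) {
        Iterator<int[]> it = batches.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    // Pulls every batch from a source until it reports end of data.
    static List<int[]> drain(BatchSource source) throws IOException {
        List<int[]> result = new ArrayList<>();
        for (int[] b = source.next(); b != null; b = source.next()) {
            result.add(b);
        }
        return result;
    }

    // Runs the batches through the capturing wrapper, then replays them from disk.
    static List<int[]> captureAndReplay(List<int[]> batches) {
        try {
            Path file = Files.createTempFile("captured-batches", ".bin");
            try {
                drain(new CapturingSource(fromList(batches), file));
                return drain(new ReplaySource(file));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<int[]> replayed =
            captureAndReplay(Arrays.asList(new int[] {1, 2, 3}, new int[] {4}));
        for (int[] batch : replayed) {
            System.out.println(Arrays.toString(batch));
        }
    }
}
```

The point of the wrapper approach is that the operator under test does not change: capture is
bolted on around it, and replay only needs to understand the file format.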

The good news is that most of these classes have been around since the early days, so any
technique built using them should work for any older version of Drill we need to debug. (Though,
of course, we’d have to rebuild that old version to include the batch intercept code…)

Thanks,

- Paul
