On Sun, Jul 26, 2020 at 5:52 AM Chris Nuernberger <chris@techascent.com> wrote:
Hmm, sounds reasonable enough.  I may be mistaken, but it appears to me that because the current code mutates the vector schema root in place, it precludes concurrent or parallelized access to multiple record batches.  Potentially a map-batch method that returns a new vector-schema-root each time would work.

Yeah, you could do something like that. The issue you may see, depending on your vector/batch sizes, is increased heap usage. The stream-based design of the current classes was built to minimize heap churn when working with large pipelines.
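
To make the trade-off concrete, here is a rough sketch in plain Java, with int arrays standing in for vectors. None of these names are Arrow's API: ReusingReader, collectShared and mapBatch are made up for illustration, and the copy in mapBatch is only one assumed way to get an independent batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class BatchReuseSketch {
    /** Hypothetical stand-in for a stream reader that refills one shared buffer per batch. */
    static final class ReusingReader {
        private final int batchCount;
        private final int[] root; // plays the role of the single, mutable root
        private int next = 0;

        ReusingReader(int batchCount, int batchSize) {
            this.batchCount = batchCount;
            this.root = new int[batchSize];
        }

        /** Overwrites the shared buffer in place; returns false once the stream is exhausted. */
        boolean loadNextBatch() {
            if (next >= batchCount) {
                return false;
            }
            for (int i = 0; i < root.length; i++) {
                root[i] = next * root.length + i;
            }
            next++;
            return true;
        }

        int[] getRoot() {
            return root;
        }
    }

    /** Streaming style: no per-batch allocation, but every list entry is the same array. */
    static List<int[]> collectShared(int batchCount, int batchSize) {
        ReusingReader reader = new ReusingReader(batchCount, batchSize);
        List<int[]> out = new ArrayList<>();
        while (reader.loadNextBatch()) {
            out.add(reader.getRoot());
        }
        return out;
    }

    /** map-batch style: each batch is copied into a fresh array before fn sees it. */
    static <T> List<T> mapBatch(int batchCount, int batchSize, Function<int[], T> fn) {
        ReusingReader reader = new ReusingReader(batchCount, batchSize);
        List<T> out = new ArrayList<>();
        while (reader.loadNextBatch()) {
            out.add(fn.apply(reader.getRoot().clone())); // one allocation per batch
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> shared = collectShared(3, 2);
        System.out.println(shared.get(0) == shared.get(2));    // true: same buffer
        System.out.println(Arrays.toString(shared.get(0)));    // [4, 5]: only the last batch survives

        List<int[]> fresh = mapBatch(3, 2, batch -> batch);
        System.out.println(Arrays.toString(fresh.get(0)));     // [0, 1]: first batch is still intact
    }
}
```

In the shared version, batches cannot be handed to other threads or kept around, since the next load overwrites them. The map-batch version makes that safe, at the cost of one allocation per batch, which is where the extra heap usage comes from.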