drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: Format Plugin Question
Date Mon, 27 Mar 2017 04:33:52 GMT
Hi Charles,

You asked three questions.

* How do we write arrays?
* How do we write maps?
* What tools are available in the code to help?

Let’s start with maps, because I happen to be mucking with those at the moment. A map in
Drill is really just a nested record; it is not a map like you’d find in Java or Python.
[1] is a conceptual write-up of how maps work in Drill.

To write to a map, you first create a map vector per record batch. The map is a container
of vectors, one for each member. The trick here is to realize that a Drill map is not an
independent collection of name/value pairs per record. It is instead a single collection of
vectors shared by ALL records in the batch. That is, in Drill a map is a nested record (tuple),
not a map in the classic sense. Once you have created the vector for a map member, you can use
it just like a top-level vector.
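To make the "shared vectors" point concrete, here is a minimal plain-Java sketch. It is a hypothetical model: MapColumnModel and its methods are made up for illustration and are not Drill's MapVector or writer API. A map column is modeled as a named group of member columns, and reading record i means reading position i of each member column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT Drill's MapVector API: a "map" column is just a
// named group of member columns, and each member column is shared by all
// records in the batch.
class MapColumnModel {
    // One column (list of values) per map member, kept in insertion order.
    private final Map<String, List<Object>> members = new LinkedHashMap<>();
    private int recordCount = 0;

    // Returns the single batch-wide column for a member, creating it on first use.
    List<Object> member(String name) {
        return members.computeIfAbsent(name, k -> new ArrayList<>());
    }

    // Appends one record: each member value goes into that member's shared column.
    void addRecord(Map<String, Object> record) {
        for (Map.Entry<String, Object> e : record.entrySet()) {
            List<Object> col = member(e.getKey());
            // Back-fill nulls if this member first appears after earlier records.
            while (col.size() < recordCount) col.add(null);
            col.add(e.getValue());
        }
        recordCount++;
        // Pad members this record did not mention so all columns stay aligned.
        for (List<Object> col : members.values()) {
            while (col.size() < recordCount) col.add(null);
        }
    }

    // Reading member "name" of record i is just position i of that column.
    Object get(int record, String name) {
        List<Object> col = members.get(name);
        return col == null ? null : col.get(record);
    }

    int columnCount() { return members.size(); }

    int recordCount() { return recordCount; }
}
```

Note that a record that omits a member still occupies a (null) slot in that member's column, because the columns belong to the whole batch rather than to individual records.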

Array vectors are just like other vectors: there is one vector for the entire record batch.
Arrays have an extra twist: an indirection (offset) vector that points to the first entry for
each record. All values from your field2 go into that single array, and the indirection vector
has an entry per record that points to the start of that record’s values. (The number of values
in record i is the difference between the entry for record i+1 and the entry for record i.)
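The offset arithmetic can be sketched the same way. Again this is a hypothetical plain-Java model (RepeatedColumnModel is an illustrative name, not Drill's repeated vector class). It assumes the indirection vector holds one more entry than there are records, so that the count for the last record can also be computed as a difference.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an array ("repeated") column, NOT Drill's API:
// all values for the whole batch live in one flat data list, and an offsets
// list with (recordCount + 1) entries marks where each record's values begin.
class RepeatedColumnModel {
    private final List<String> data = new ArrayList<>();
    private final List<Integer> offsets = new ArrayList<>();

    RepeatedColumnModel() {
        offsets.add(0); // record 0 starts at position 0
    }

    // Appends one record's array to the shared data list, then records the
    // end position, which is also where the next record will start.
    void addRecord(List<String> values) {
        data.addAll(values);
        offsets.add(data.size());
    }

    int recordCount() { return offsets.size() - 1; }

    // Number of values in record i = offsets[i + 1] - offsets[i].
    int valueCount(int record) {
        return offsets.get(record + 1) - offsets.get(record);
    }

    // Returns the slice of the flat data list that belongs to record i.
    List<String> get(int record) {
        return data.subList(offsets.get(record), offsets.get(record + 1));
    }
}
```

Writing the arrays [a, b, c], [] and [d] yields data = [a, b, c, d] and offsets = [0, 3, 3, 4], so record 1 has 3 - 3 = 0 values and record 2 has 4 - 3 = 1.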

The code does provide vector readers and writers, but I’m not very familiar with them.

The best place to see this in action is the JSON record reader, specifically the JsonReader class.
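A writer mostly exists to hide that offset bookkeeping from the reader code. The sketch below assumes a record shaped like Charles's example (field1: String, field2: Array of Strings). SketchBatchWriter and its methods are hypothetical names chosen for illustration, not Drill's writer API; JsonReader remains the place to look for the real calls. The point is only that opening and closing a list is what gives a writer the chance to record offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer for records shaped like
//   field1: String, field2: Array<String>
// NOT Drill's writer API; it only shows how start/end calls on a list let
// the writer maintain the offsets for the repeated column.
class SketchBatchWriter {
    final List<String> field1 = new ArrayList<>();      // one entry per record
    final List<String> field2Data = new ArrayList<>();  // all array values, flat
    final List<Integer> field2Offsets = new ArrayList<>(List.of(0)); // record 0 starts at 0
    private boolean inList = false;

    void writeField1(String value) { field1.add(value); }

    // Opens the array for the current record.
    void startList() {
        if (inList) throw new IllegalStateException("list already open");
        inList = true;
    }

    // Appends one array element to the shared, batch-wide value list.
    void writeListValue(String value) {
        if (!inList) throw new IllegalStateException("call startList() first");
        field2Data.add(value);
    }

    // Closing the list records where the next record's values will start.
    void endList() {
        if (!inList) throw new IllegalStateException("no open list");
        field2Offsets.add(field2Data.size());
        inList = false;
    }

    int recordCount() { return field1.size(); }
}
```

Usage: for each input record, call writeField1(...), then startList(), writeListValue(...) once per element, and endList(), even when the array is empty, so that every record gets an offset entry.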

Perhaps others can provide better, more concrete suggestions.


- Paul

[1] https://github.com/paul-rogers/drill/wiki/Drill-Maps

On Mar 26, 2017, at 1:18 PM, Charles Givre <cgivre@gmail.com> wrote:

Hello all,
I’m working on a format plugin for a file type that will have a mix of Strings and nested
fields. Basically, something like this:

field1:  String
field2:  Array
My preference is to keep the nested data in the nested format rather than de-nest it, but
I suppose that is always an option.

I’ve gotten the format plugin to write Strings to the Drill buffer, but I’m not quite
sure how to get it to write an Array or a Map. I’ve found the Map and List writer objects,
but I’m not sure how to use them in this context. Are there any examples that someone
could point me to, or could someone explain how this can be done?
— C
