drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Barclay <dbarc...@maprtech.com>
Subject Re: ensureAtLeastOneField: obsolete? needed?
Date Tue, 06 Oct 2015 01:09:02 GMT
Jacques,

 > ... it ... should probably check if a column is already setup in the output mutation
and, if so, don't re-add the ensure at least one column. Thus probably requires a new method
in output mutator (isEmpty or size)

Some questions on the various moving pieces:

  Does ensureAtLeastOneField() need to check only for the fieldWriter.integer(...) call, or
does it also need to check for the earlier fieldWriter.map(...) call? (I'm not clear on what
cases that loop can encounter.)

Would the method go (be declared) on MapWriter (re ensureAtLeastOneField()'s fieldWriter variable)
or on ComplexWriter (re its writer parameter) (or something else)?

Which data should the method implementation be consulting?  For example, I see that SingleMapWriter
has both container and fields members, and traversing down container (at least in the current
case I'm looking at) leads to fields primary, secondary, and delegate (in MapWithOrdinal).
Which is the primary/best source for whether there's any column (or subcolumn?) yet?)

Thanks,
Daniel



Jacques Nadeau wrote:
>
> It's purpose is to make sure that a projection that finds no columns still produces output.
 Think record {a:4,b:6}
>
> If I say 'select c from t' ,  the reader (without this functionality) return zero columns
(not supported).   As such, ensure at least one adds a column.  I believe it is still needed
but should probably check if a column is already setup in the output mutation and, if so,
don't re-add the ensure at least one column. Thus probably requires a new method in output
mutator (isEmpty or size)
>
> On Oct 3, 2015 7:49 PM, "Daniel Barclay" <dbarclay@maprtech.com <mailto:dbarclay@maprtech.com>>
wrote:
>
>     What exactly is the purpose of ensureAtLeastOneField() org.apache.drill.exec.vector.complex.fn.JsonReader
(drill/JsonReader.java at fdb6b4fecee30282d8f490e78b7f2dc3a2e27347 ยท apache/drill <https://github.com/apache/drill/blob/fdb6b4fecee30282d8f490e78b7f2dc3a2e27347/exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java#L92>)?
>
>     Is it still needed?
>
>
>     Currently, it makes the first field of a map be a nullable-integer field if the JSON
reader reads no nows. However, it does that /regardless/ of whether the first field already
exists from earlier readers, causing a later reader and ScanBatch to signal a schema change
when there wasn't really a schema change.  This is currently causing breaking in the attempt
to fix the ScanBatch/IterOutcome problems underlying DRILL-2288.
>
>     Example case:
>     - There are three JSON files.  The first and last to be read have the same schema.
 The middle file is empty.  They are all read by the same ScanBatch.
>     - The first JSON reader sets map fields.
>     - The second JSON reader sees no rows, so its atLeastOneWrite flag isn't set, so
its ensureAtLeastOneField() thinks it needs to add a field, but forcibly sets the first field
to be a nullable-int field--/regardless /of whether a first field exists, so it changes what
the first reader set it to.
>     - Then, somewhere in and/or downstream of the third reader (with in-progress ScanBatch
fixes in place), Drill gets incompatible-vector errors (mentioning the second reader's NullableIntVector
vs. the original reader's type for the first field) and/or schema-change-not-supported errors
because ScanBatch reported OK_NEW_SCHEMA (instead of OK) when the schema didn't really change
(between the first and third JSON files).
>
>
>     Disabling ensureAtLeastOneField() eliminated the wrong-vector-type or unsupported-schema-change
errors, and did not cause any new errors in the java-exec unit tests. (I haven't checked other
tests yet.)
>
>     Also, ensureAtLeastOneField() (or next()) has a comment about making sure there's
a field in order to return a record count, but next() returns the record count.
>
>     Those two things make me wonder if ensureAtLeastOneField()is now obsolete.
>
>     Can it be deleted now?
>
>     Or is it the case that it is still needed, but it needs to check whether there are
already any fields before (currently blindly) creating one?
>
>
>     Thanks,
>     Daniel
>
>     -- 
>     Daniel Barclay
>     MapR Technologies
>


-- 
Daniel Barclay
MapR Technologies


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message