drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5953) JSON reader unable to recover from most parse errors
Date Sat, 11 Nov 2017 20:47:00 GMT
Paul Rogers created DRILL-5953:

             Summary: JSON reader unable to recover from most parse errors
                 Key: DRILL-5953
                 URL: https://issues.apache.org/jira/browse/DRILL-5953
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.11.0
            Reporter: Paul Rogers

DRILL-4653 attempted to allow the JSON parser to recover from malformed JSON records. As it
turns out, the fix works only for very specific errors, such as the one in the only test
case for that bug:

{"balance": 1000.0,"num": 100,"is_vip": true,
  "name": "foo3",
  "curr": {"denom":"pound","test":{"value :false}}}

Notice the missing quote after "value" above.

The JSON parser steadfastly refuses to recover from other problems such as:

{a: }

After such an error, the Jackson JSON parser enters a state in which it will not read further
tokens; instead, it always returns null.

A more general solution must be implemented at the level of the input stream:

* When the Jackson parser reports a syntax error...
* Discard the current parser
* Read directly from the input stream to find the }\s*{ pattern.
* Push the { back onto the input stream.
* Create a new parser that will start reading at the current input position.
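
The scanning and push-back steps above can be sketched roughly as follows. This is only an illustration using java.io.PushbackInputStream, not Drill code: the class JsonResync and both method names are hypothetical, and the scan is deliberately naive (it would be fooled by a "}" followed by a "{" inside a string literal).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: resynchronize a stream at the next record boundary.
final class JsonResync {

    // Consumes bytes until it has seen '}' + optional JSON whitespace + '{',
    // then pushes the '{' back so a new parser can start on that record.
    // Returns false if the input ends before a boundary is found.
    static boolean skipToNextRecord(PushbackInputStream in) throws IOException {
        boolean sawClose = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '}') {
                sawClose = true;
            } else if (c == '{' && sawClose) {
                in.unread(c);
                return true;
            } else if (c != ' ' && c != '\t' && c != '\n' && c != '\r') {
                // Any other non-whitespace byte breaks the pattern.
                sawClose = false;
            }
        }
        return false;
    }

    // Demo only: returns the text left after resynchronizing, or null if
    // no record boundary exists in the input.
    static String remainderAfterResync(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(bytes), 1)) {
            if (!skipToNextRecord(in)) {
                return null;
            }
            ByteArrayOutputStream rest = new ByteArrayOutputStream();
            int c;
            while ((c = in.read()) != -1) {
                rest.write(c);
            }
            return new String(rest.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real reader, the new parser would be created on the same PushbackInputStream once skipToNextRecord returns true. Scanning bytes rather than characters is safe for UTF-8 input because the braces and whitespace are single-byte ASCII.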

This work is further complicated by the fact that the JSON parser buffers its input: it reads
more than one character at a time and stores them in {{_inputBuffer}}. Fortunately, this field
is protected, so it is possible to subclass the parser, gain access to the buffer, and "push"
the unused characters back onto a buffering input stream.
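
The push-back of buffered but unconsumed input might look something like the sketch below. It does not touch Jackson at all: the buffer, ptr, and end arguments merely stand in for what a parser subclass would expose (an assumption here is that {{_inputBuffer}} is paired with read-position and end-of-data indexes), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: return read-ahead bytes to the underlying stream.
final class BufferPushback {

    // Pushes the unconsumed region buffer[ptr, end) back onto the stream.
    // The stream's push-back capacity must be at least (end - ptr) bytes.
    static void pushBackUnread(PushbackInputStream in, byte[] buffer, int ptr, int end)
            throws IOException {
        if (end > ptr) {
            in.unread(buffer, ptr, end - ptr);
        }
    }

    // Demo only: simulates a parser that reads ahead bufferSize bytes but
    // consumes just `consumed` of them, then returns what a fresh reader sees.
    static String demo(String text, int bufferSize, int consumed) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (PushbackInputStream in =
                 new PushbackInputStream(new ByteArrayInputStream(bytes), bufferSize)) {
            byte[] buffer = new byte[bufferSize];
            int end = Math.max(0, in.read(buffer, 0, bufferSize)); // read-ahead
            int ptr = Math.min(consumed, end);                     // bytes actually used
            pushBackUnread(in, buffer, ptr, end);

            ByteArrayOutputStream rest = new ByteArrayOutputStream();
            int c;
            while ((c = in.read()) != -1) {
                rest.write(c);
            }
            return new String(rest.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is only that nothing read ahead by the discarded parser is lost: after the push-back, the stream resumes exactly where the parser stopped consuming.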

For now, however, it is not really accurate to say that Drill can recover from JSON read errors.
Instead, it can only recover from those errors that the JSON parser already handles, such
as the one example case handled in DRILL-4653.
