drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4653) Malformed JSON should not stop the entire query from progressing
Date Tue, 13 Sep 2016 06:52:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15486473#comment-15486473 ]

ASF GitHub Bot commented on DRILL-4653:

Github user ssriniva123 commented on the issue:

    Thanks for taking the time to write such a detailed email. Here are some of my thoughts.
    - Drill uses com.fasterxml.jackson.core.json.UTF8StreamJsonParser to parse JSON records.
    This parser does not rely on line delimiters as record separators; instead it uses
    the JSON structure itself to signal end of record (EOR). The parser has internal methods
    that check for line feeds, but they are not exposed to callers.
    - The CountingJsonReader uses the parser.skipChildren() method to skip the rest of the
    children of the current record, so it is not possible to accurately count and match the
    number of braces in order to cleanly skip a bad record.
    - One thought is to tap the inputsource of the parser on an exception condition, but is
    My thought process was exactly along the lines you have been thinking. In an exception
    scenario the code attempts to locate a closing bracket (}) followed by an opening bracket ({).
    This is what the BaseJsonProcessor.processJSONException method does. Please
    note that it works in all cases except when there are no proper brackets to signify the end
    of a JSON record.
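
    As a rough illustration of the resynchronization idea described above, here is a minimal
    sketch in plain Java. It is not the actual BaseJsonProcessor.processJSONException code: the
    class name RecordResync, the method nextRecordStart, and the choice to scan an in-memory
    String (rather than the parser's input stream) are all assumptions made for the example.

```java
public class RecordResync {

    /**
     * Hypothetical helper: starting at fromIndex, looks for a closing bracket '}'
     * that is followed (ignoring whitespace) by an opening bracket '{', and returns
     * the index of that '{'. Returns -1 when no such pair exists.
     */
    static int nextRecordStart(String input, int fromIndex) {
        for (int i = Math.max(fromIndex, 0); i < input.length(); i++) {
            if (input.charAt(i) != '}') {
                continue;
            }
            // Skip any whitespace (including line feeds) after the closing bracket.
            int j = i + 1;
            while (j < input.length() && Character.isWhitespace(input.charAt(j))) {
                j++;
            }
            if (j < input.length() && input.charAt(j) == '{') {
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The second record is malformed: its value is an unquoted token.
        String data = "{\"a\": 1}\n{\"a\": oops}\n{\"a\": 3}";
        int errorOffset = data.indexOf("oops"); // assumed position where parsing failed
        int resume = nextRecordStart(data, errorOffset);
        // Prints the text from which parsing could resume: {"a": 3}
        System.out.println(data.substring(resume));
    }
}
```

    The sketch also shows the limitation mentioned above: if the bad record has no closing
    bracket, the scan either finds nothing or resumes only after the end of a later, valid record.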
    Hope this explanation helps clarify.

> Malformed JSON should not stop the entire query from progressing
> ----------------------------------------------------------------
>                 Key: DRILL-4653
>                 URL: https://issues.apache.org/jira/browse/DRILL-4653
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - JSON
>    Affects Versions: 1.6.0
>            Reporter: subbu srinivasan
>             Fix For: Future
> Currently a Drill query terminates on first encountering an invalid JSON line.
> Drill should continue processing after ignoring the bad records. Something
> similar to a setting of (ignore.malformed.json) would help.

This message was sent by Atlassian JIRA
