nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-3055) StandardRecordWriter can throw UTFDataFormatException
Date Thu, 02 Feb 2017 19:17:51 GMT

    [ https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850351#comment-15850351 ]

ASF GitHub Bot commented on NIFI-3055:
--------------------------------------

GitHub user jskora opened a pull request:

    https://github.com/apache/nifi/pull/1470

    NIFI-3055 StandardRecordWriter Can Throw UTFDataFormatException (0.x)

    * Updated StandardRecordWriter to account for the encoding behavior of java.io.DataOutputStream.writeUTF() and truncate string values so that their encoded UTF representation does not exceed DataOutputStream's 64K UTF format limit.
    * Add test to confirm handling of large UTF strings.
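
The truncation described in the bullets above could look roughly like the following. This is a minimal sketch, not the code from the pull request: the class name `UtfTruncation` and the method `truncateForWriteUTF` are hypothetical. It assumes only the documented behavior of `DataOutputStream.writeUTF()`, which encodes each char in modified UTF-8 as 1, 2, or 3 bytes and rejects values whose encoded length exceeds 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper, for illustration only; not the class changed in the PR.
final class UtfTruncation {
    // Largest encoded length (in bytes) that writeUTF() accepts.
    static final int MAX_UTF_BYTES = 65535;

    // Number of bytes writeUTF() uses for one char in modified UTF-8.
    static int encodedLength(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // also covers '\u0000', which modified UTF-8 writes as 2 bytes
        }
        return 3;     // everything else, including each half of a surrogate pair
    }

    // Returns the longest prefix of value whose encoded form fits the limit.
    static String truncateForWriteUTF(String value) {
        int bytes = 0;
        for (int i = 0; i < value.length(); i++) {
            bytes += encodedLength(value.charAt(i));
            if (bytes > MAX_UTF_BYTES) {
                int end = i;
                // Do not leave a dangling high surrogate at the end of the prefix.
                if (end > 0 && Character.isHighSurrogate(value.charAt(end - 1))) {
                    end--;
                }
                return value.substring(0, end);
            }
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder details = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            details.append('x');
        }
        String safe = truncateForWriteUTF(details.toString());
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(safe); // no longer throws UTFDataFormatException
        }
        System.out.println(safe.length());
    }
}
```

Counting encoded bytes rather than chars matters because a string shorter than 65535 chars can still exceed the limit once non-ASCII characters are encoded.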
    
    Thank you for submitting a contribution to Apache NiFi.
    
    In order to streamline the review of the contribution we ask you
    to ensure the following steps have been taken:
    
    ### For all changes:
    - [X] Is there a JIRA ticket associated with this PR? Is it referenced 
         in the commit message?
    
    - [X] Does your PR title start with NIFI-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
    
    - [X] Has your PR been rebased against the latest commit within the target branch (typically master)?
    
    - [X] Is your initial contribution a single, squashed commit?
    
    ### For code changes:
    - [X] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder?
    - [X] Have you written or updated unit tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly?
    - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly?
    - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties?
    
    ### For documentation related changes:
    - [ ] Have you ensured that format looks appropriate for the output in which it is rendered?
    
    ### Note:
    Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jskora/nifi NIFI-3055-0.x-v3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1470.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1470
    
----
commit dab33e1e47214fa8797801126ca8e46b23693cd0
Author: Joe Skora <jskora@apache.org>
Date:   2017-02-02T19:11:05Z

    NIFI-3055 StandardRecordWriter Can Throw UTFDataFormatException (0.x)
    * Updated StandardRecordWriter to account for the encoding behavior of java.io.DataOutputStream.writeUTF() and truncate string values so that their encoded UTF representation does not exceed DataOutputStream's 64K UTF format limit.
    * Add test to confirm handling of large UTF strings.

----


> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
>                 Key: NIFI-3055
>                 URL: https://issues.apache.org/jira/browse/NIFI-3055
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.1
>            Reporter: Brandon DeVries
>            Assignee: Joe Skora
>
> StandardRecordWriter.writeRecord()\[1] uses DataOutputStream.writeUTF()\[2] without checking
the encoded length of the value to be written.  If the encoded length is greater than 65535 bytes (2^16 - 1),
you get a UTFDataFormatException "encoded string too long..."\[3].  Ultimately, this can result
in an IllegalStateException\[4], -bringing a halt to the data flow- causing PersistentProvenanceRepository
"Unable to merge <prov_journal> with other Journal Files due to..." WARNings.
> Several of the field values being written in this way are pre-defined, and thus not likely
an issue.  However, the "details" field can be populated by a processor, and can be of an
arbitrary length.  -Additionally, if the details field is indexed (which I didn't investigate,
but I'm sure is easy enough to determine), then the length might be subject to the Lucene
limit discussed in NIFI-2787-.
> \[1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> \[2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> \[3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> \[4] https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
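
The limit described in the issue can be reproduced outside NiFi with a few lines of plain Java. This is a standalone sketch; the class `WriteUtfLimitDemo` and its helpers are hypothetical names and are not part of the NiFi code base. It relies only on the documented behavior of `DataOutputStream.writeUTF()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class WriteUtfLimitDemo {
    // Returns true if writeUTF() rejects the value as too long.
    static boolean rejects(String value) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return false;
        } catch (UTFDataFormatException e) {
            return true; // "encoded string too long"
        }
    }

    // Builds a string of count copies of c.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(rejects(repeat('a', 65535)));      // false: exactly at the limit
        System.out.println(rejects(repeat('a', 65536)));      // true: one byte over
        System.out.println(rejects(repeat('\u00E9', 32768))); // true: 2 bytes per char, 65536 bytes
    }
}
```

The last case shows why the check has to be on encoded bytes: the string has only 32768 chars but still exceeds the limit.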



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
