nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-3055) StandardRecordWriter can throw UTFDataFormatException
Date Thu, 02 Feb 2017 23:15:51 GMT

    [ https://issues.apache.org/jira/browse/NIFI-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850708#comment-15850708 ]

ASF GitHub Bot commented on NIFI-3055:
--------------------------------------

Github user mosermw commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1469#discussion_r99244109
  
    --- Diff: nifi-commons/nifi-schema-utils/src/main/java/org/apache/nifi/repository/schema/SchemaRecordWriter.java ---
    @@ -136,4 +144,44 @@ private void writeFieldValue(final RecordField field, final Object value, final
                     break;
             }
         }
    +
    +    private void writeUTFLimited(final DataOutputStream out, final String utfString) throws IOException {
    +        try {
    +            out.writeUTF(utfString);
    +        } catch (UTFDataFormatException e) {
    +            final String truncated = utfString.substring(0, getCharsInUTFLength(utfString, MAX_ALLOWED_UTF_LENGTH));
    +            logger.warn("Truncating UTF value!  Attempted to write string with char length {} and UTF length greater than "
    +                            + "supported maximum allowed ({}), truncating to char length {}.",
    +                    utfString.length(), MAX_ALLOWED_UTF_LENGTH, truncated.length());
    --- End diff --
    
    Can we mention provenance in this message, such as "Truncating provenance record value"?
    Does this message potentially mix char length and byte length, such as "Attempted to write
    string with char length 40000 and UTF length greater than supported maximum allowed (65535),
    truncating to char length 39000."?  Perhaps a simpler message would be clearer, such as
    "Attempted to store string with length 40000, truncating to 39000."
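
    The char-length versus byte-length distinction raised in this comment can be illustrated with a small sketch. It is not the pull request's getCharsInUTFLength implementation, which the diff excerpt does not show. The class UtfLengthSketch and the methods encodedSize, utfLength and charsThatFit are hypothetical names; the per-char sizes follow the modified UTF-8 rules documented for DataOutputStream.writeUTF.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only; not the code from the pull request.
final class UtfLengthSketch {
    // writeUTF stores the encoded length in two bytes, so 65535 is the ceiling.
    static final int MAX_UTF_BYTES = 65535;

    // Bytes used by one char in the modified UTF-8 encoding used by writeUTF.
    static int encodedSize(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        if (c <= 0x07FF) {
            return 2; // also covers \u0000, which modified UTF-8 writes as two bytes
        }
        return 3; // includes each half of a surrogate pair
    }

    // Total encoded length in bytes; this is what writeUTF compares against 65535.
    static long utfLength(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            bytes += encodedSize(s.charAt(i));
        }
        return bytes;
    }

    // Number of leading chars whose encoding fits within maxBytes.
    static int charsThatFit(String s, int maxBytes) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int size = encodedSize(s.charAt(i));
            if (bytes + size > maxBytes) {
                // Step back one char rather than cut a surrogate pair in half.
                if (i > 0 && Character.isHighSurrogate(s.charAt(i - 1))
                        && Character.isLowSurrogate(s.charAt(i))) {
                    return i - 1;
                }
                return i;
            }
            bytes += size;
        }
        return s.length();
    }

    public static void main(String[] args) {
        char[] chars = new char[40000];
        Arrays.fill(chars, '\u00e9'); // a two-byte char in modified UTF-8
        String value = new String(chars);
        System.out.println("char length: " + value.length());
        System.out.println("UTF length:  " + utfLength(value));
        System.out.println("chars kept:  " + charsThatFit(value, MAX_UTF_BYTES));
    }
}
```

    With a string like the one in main, the char length stays below 65535 while the encoded length exceeds it, which is why a log message that reports only char lengths next to the 65535 limit can look inconsistent.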


> StandardRecordWriter can throw UTFDataFormatException
> -----------------------------------------------------
>
>                 Key: NIFI-3055
>                 URL: https://issues.apache.org/jira/browse/NIFI-3055
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 1.0.0, 0.7.1
>            Reporter: Brandon DeVries
>            Assignee: Joe Skora
>
> StandardRecordWriter.writeRecord()[1] uses DataOutputStream.writeUTF()[2] without checking
> the length of the value to be written.  If this length is greater than 65535 (2^16 - 1),
> you get a UTFDataFormatException "encoded string too long..."[3].  Ultimately, this can result
> in an IllegalStateException[4], -bringing a halt to the data flow- causing PersistentProvenanceRepository
> "Unable to merge <prov_journal> with other Journal Files due to..." WARNings.
> Several of the field values being written in this way are pre-defined, and thus not likely
> an issue.  However, the "details" field can be populated by a processor, and can be of an
> arbitrary length.  -Additionally, if the detail field is indexed (which I didn't investigate,
> but I'm sure is easy enough to determine), then the length might be subject to the Lucene
> limit discussed in NIFI-2787-.
> [1] https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/StandardRecordWriter.java#L163-L173
> [2] http://docs.oracle.com/javase/7/docs/api/java/io/DataOutputStream.html#writeUTF%28java.lang.String%29
> [3] http://stackoverflow.com/questions/22741556/dataoutputstream-purpose-of-the-encoded-string-too-long-restriction
> [4] https://github.com/apache/nifi/blob/5fd4a55791da27fdba577636ac985a294618328a/nifi-nar-bundles/nifi-provenance-repository-bundle/nifi-persistent-provenance-repository/src/main/java/org/apache/nifi/provenance/PersistentProvenanceRepository.java#L754-L755
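
The limit described in the issue can be reproduced outside NiFi with a minimal sketch. The class WriteUtfLimitDemo and its methods are hypothetical names used only for illustration; the only real API involved is java.io.DataOutputStream.writeUTF, and the in-memory stream is an assumption standing in for the provenance journal.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class; it does not use any NiFi code.
final class WriteUtfLimitDemo {
    // Returns true if writeUTF accepts the value, false if it rejects it as too long.
    static boolean writeUtfSucceeds(String value) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(value);
            return true;
        } catch (UTFDataFormatException e) {
            // Thrown when the encoded form needs more than 65535 bytes.
            return false;
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new UncheckedIOException(e);
        }
    }

    // Builds a string of n ASCII chars, each one byte in modified UTF-8.
    static String ascii(int n) {
        char[] chars = new char[n];
        Arrays.fill(chars, 'a');
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("65535 bytes accepted: " + writeUtfSucceeds(ascii(65535)));
        System.out.println("65536 bytes accepted: " + writeUtfSucceeds(ascii(65536)));
    }
}
```

A caller that cannot guarantee short values, such as one writing a processor-supplied "details" field, would need to check or truncate the value before calling writeUTF, or catch the exception as the pull request under review does.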



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
