nifi-issues mailing list archives

From "ASF subversion and git services (Jira)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-7740) Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
Date Wed, 02 Sep 2020 18:00:01 GMT

    [ https://issues.apache.org/jira/browse/NIFI-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17189599#comment-17189599 ]

ASF subversion and git services commented on NIFI-7740:
-------------------------------------------------------

Commit c10bd4990bfcc5f5fd17c3eefdb03801e7a036a9 in nifi's branch refs/heads/support/nifi-1.12.x
from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=c10bd49 ]

NIFI-7740: Add Records Per Transaction and Transactions Per Batch properties to PutHive3Streaming

NIFI-7740: Incorporated review comments

NIFI-7740: Restore RecordsEOFException superclass to SerializationError

This closes #4489.

Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>


> Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-7740
>                 URL: https://issues.apache.org/jira/browse/NIFI-7740
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>             Fix For: 1.13.0, 1.12.1
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The original PutHiveStreaming (for Hive 1.2.x) exposed properties to the user for tuning
the number of records in an individual Hive Streaming transaction, as well as the number of
transactions to be batched together (for performance).
> These properties should be exposed in the PutHive3Streaming processor in order to tune
its performance. The default values should preserve the current behavior: a setting of
zero for Records Per Transaction will put all records into a single transaction, and a setting
of 1 for Transactions Per Batch will result in a single transaction in each batch.
> For large files, Records Per Transaction should be set to something more manageable,
such as 100K, and Transactions Per Batch to something such as 10. As a rule, the product
of the two numbers should be larger than the largest expected number of records in the flow
file(s); this ensures that any failed transaction batch causes a full rollback. The documentation
for these properties should include this prescription.
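
The sizing rule in the description can be checked with simple arithmetic. The following is a minimal sketch of that check only; the class and method names (BatchSizing, batchCapacity, coversFlowFile) are hypothetical and are not part of NiFi or the PutHive3Streaming processor. It assumes, per the description, that zero for Records Per Transaction means "all records in one transaction".

```java
// Hypothetical helper illustrating the sizing rule from the issue description.
// Not part of NiFi; names and signatures are assumptions made for this sketch.
final class BatchSizing {
    private BatchSizing() {}

    /**
     * Number of records one transaction batch can hold.
     * A recordsPerTransaction of 0 is treated as unbounded (all records
     * go into a single transaction), matching the described default.
     */
    static long batchCapacity(long recordsPerTransaction, long transactionsPerBatch) {
        if (recordsPerTransaction < 0 || transactionsPerBatch < 1) {
            throw new IllegalArgumentException(
                "recordsPerTransaction must be >= 0 and transactionsPerBatch must be >= 1");
        }
        if (recordsPerTransaction == 0) {
            return Long.MAX_VALUE; // unbounded transaction
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping
        return Math.multiplyExact(recordsPerTransaction, transactionsPerBatch);
    }

    /**
     * True when the product of the two settings is larger than the largest
     * expected number of records in a flow file, as the description recommends.
     */
    static boolean coversFlowFile(long recordsPerTransaction,
                                  long transactionsPerBatch,
                                  long maxExpectedRecords) {
        return batchCapacity(recordsPerTransaction, transactionsPerBatch) > maxExpectedRecords;
    }
}
```

With the example values from the description (100K records per transaction, 10 transactions per batch), the capacity is 1,000,000 records, so a flow file of 900,000 records satisfies the rule while one of 1,000,000 or more does not.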



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
