nifi-issues mailing list archives

From "ASF subversion and git services (Jira)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-7740) Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
Date Tue, 01 Sep 2020 15:30:01 GMT

    [ https://issues.apache.org/jira/browse/NIFI-7740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188559#comment-17188559 ]

ASF subversion and git services commented on NIFI-7740:
-------------------------------------------------------

Commit 45470b0984ab83750155e9c7a540c79bfe862817 in nifi's branch refs/heads/main from Matt Burgess
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=45470b0 ]

NIFI-7740: Add Records Per Transaction and Transactions Per Batch properties to PutHive3Streaming

NIFI-7740: Incorporated review comments

NIFI-7740: Restore RecordsEOFException superclass to SerializationError

This closes #4489.

Signed-off-by: Peter Turcsanyi <turcsanyi@apache.org>


> Add Records Per Transaction and Transactions Per Batch to PutHive3Streaming
> ---------------------------------------------------------------------------
>
>                 Key: NIFI-7740
>                 URL: https://issues.apache.org/jira/browse/NIFI-7740
>             Project: Apache NiFi
>          Issue Type: Improvement
>            Reporter: Matt Burgess
>            Assignee: Matt Burgess
>            Priority: Major
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The original PutHiveStreaming processor (for Hive 1.2.x) exposed properties for tuning the number of records in an individual Hive Streaming transaction, as well as the number of transactions to be batched together (for performance).
> These properties should be exposed in the PutHive3Streaming processor so that its performance can be tuned. The default values should preserve the current behavior: a setting of zero for Records Per Transaction puts all records into a single transaction, and a setting of 1 for Transactions Per Batch results in a single transaction in each batch.
> For large files, Records Per Transaction should be set to something more manageable, such as 100K, and Transactions Per Batch to something such as 10. As a rule, the product of the two numbers should be larger than the largest expected number of records in the flow file(s); this ensures that any failed transaction batch causes a full rollback. The documentation for these properties should include this prescription.
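
The sizing rule for the two properties can be checked with a little arithmetic. The sketch below is only an illustration of how a record count would split into transactions and batches under the semantics described in the issue; it is not the PutHive3Streaming implementation, and the function name `plan_batches` and its parameters are hypothetical.

```python
def plan_batches(total_records, records_per_txn=0, txns_per_batch=1):
    """Return a list of batches, each a list of per-transaction record counts.

    Hypothetical helper for illustration only. Assumes, as the issue
    describes, that records_per_txn == 0 means "all records in one
    transaction" and that txns_per_batch must be at least 1.
    """
    if total_records < 0 or records_per_txn < 0 or txns_per_batch < 1:
        raise ValueError("invalid settings")

    if records_per_txn == 0:
        # Default behavior: a single transaction holds every record.
        txn_sizes = [total_records] if total_records > 0 else []
    else:
        # Fill transactions to the limit; the last one takes the remainder.
        txn_sizes = [
            min(records_per_txn, total_records - start)
            for start in range(0, total_records, records_per_txn)
        ]

    # Group consecutive transactions into batches of at most txns_per_batch.
    return [
        txn_sizes[i:i + txns_per_batch]
        for i in range(0, len(txn_sizes), txns_per_batch)
    ]


# Defaults: one transaction in one batch.
print(plan_batches(250))                      # [[250]]
# Suggested settings: 250K records fit in one batch of three transactions.
print(plan_batches(250_000, 100_000, 10))     # [[100000, 100000, 50000]]
# Product too small (100 * 2 < 250): the records spill into a second batch.
print(plan_batches(250, 100, 2))              # [[100, 100], [50]]
```

When the result contains a single batch, a failure anywhere in that batch covers every record in the flow file, which is the full-rollback condition the description asks the documentation to state.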



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
