nifi-commits mailing list archives

From "Daniel Cave (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-1251) Allow ExecuteSQL to send out large result sets in chunks
Date Fri, 15 Jan 2016 15:27:39 GMT

    [ https://issues.apache.org/jira/browse/NIFI-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101915#comment-15101915
] 

Daniel Cave commented on NIFI-1251:
-----------------------------------

The issue you're still going to run into is that unless you multithread the conversion to
Avro, the aggregate output time will still be poor (even if you get pieces of the result
incrementally).  I think you'll find that the performance problem isn't the database pull
but the very slow conversion to Avro.

Wouldn't a better solution be to multithread the conversion inside ExecuteSQL, rather than
trying to guess at a subdivision of the query (which may or may not be possible and is
likely to greatly increase the overhead on the database)?
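
To make the suggestion concrete, here is a rough sketch of what multithreading the conversion
could look like. It is not ExecuteSQL's actual code: the class name, the method name, and the
`converter` function (a stand-in for the row-to-Avro step) are all hypothetical. It also assumes
that a single thread has already read the rows from the ResultSet and copied them into batches,
since a ResultSet should not be shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: convert already-fetched row batches on a thread pool.
final class ParallelConvertSketch {

    // Applies `converter` (stand-in for the Avro conversion) to each batch in
    // parallel and returns the results in the same order as the input batches.
    static <R, O> List<O> convertInParallel(List<List<R>> batches,
                                            Function<List<R>, O> converter,
                                            int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every batch first so the conversions can overlap.
            List<Future<O>> pending = new ArrayList<>();
            for (List<R> batch : batches) {
                Callable<O> task = () -> converter.apply(batch);
                pending.add(pool.submit(task));
            }
            // Collect in submission order, which preserves row order.
            List<O> results = new ArrayList<>();
            for (Future<O> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while converting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("conversion failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the futures are collected in submission order, the output keeps the original row order
even when batches finish out of order. Whether this actually helps depends on the conversion
being the bottleneck, which is the claim made above and not something the sketch measures.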

> Allow ExecuteSQL to send out large result sets in chunks
> --------------------------------------------------------
>
>                 Key: NIFI-1251
>                 URL: https://issues.apache.org/jira/browse/NIFI-1251
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 0.5.0
>
>
> Currently, when using ExecuteSQL, if a result set is very large, it can take quite a
> long time to pull back all of the results. It would be nice to have the ability to specify
> the maximum number of records to put into a FlowFile, so that if we pull back say 1 million
> records we can configure it to create 1,000 FlowFiles, each with 1,000 records. This way, we
> can begin processing the first 1,000 records while the next 1,000 are being pulled from the
> remote database.
> This suggestion comes from Vinay via the dev@ mailing list:
> Is there a way to have a streaming feature when a large result set is fetched from the
> database, basically to read data from the database in chunks of records
> instead of loading the full result set into memory?
> As part of ExecuteSQL, can a property called "FetchSize" be specified which
> indicates how many rows should be fetched from the resultSet?
> Since I am a bit new to using NiFi, can anyone guide me on the above?
> Thanks in advance
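
For reference, the chunking described in the quoted issue can be sketched roughly as follows.
This is only an illustration, not the NiFi implementation: the class name, the `emitInChunks`
method, and the `sink` callback (standing in for "write one FlowFile") are hypothetical, and the
rows are modelled as a plain Iterator rather than a JDBC ResultSet. On the JDBC side, the
standard `Statement.setFetchSize(int)` call gives the driver a hint about how many rows to fetch
per round trip, which is a separate setting from the number of records per FlowFile.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: split a stream of rows into chunks of at most maxRecords.
final class ChunkingSketch {

    // Reads rows one at a time and hands each full chunk to `sink` as soon as
    // it is complete, so downstream work can start before all rows are read.
    // Returns the number of chunks emitted.
    static <T> int emitInChunks(Iterator<T> rows, int maxRecords, Consumer<List<T>> sink) {
        if (maxRecords <= 0) {
            throw new IllegalArgumentException("maxRecords must be positive");
        }
        int chunks = 0;
        List<T> current = new ArrayList<>();
        while (rows.hasNext()) {
            current.add(rows.next());
            if (current.size() == maxRecords) {
                sink.accept(current);
                chunks++;
                current = new ArrayList<>();
            }
        }
        // Emit the final, possibly smaller, chunk.
        if (!current.isEmpty()) {
            sink.accept(current);
            chunks++;
        }
        return chunks;
    }
}
```

With 1 million rows and maxRecords set to 1,000, this emits 1,000 chunks, matching the example
in the issue description.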



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
