camel-users mailing list archives

From Claus Ibsen <claus.ib...@gmail.com>
Subject Re: Processing VERY large result sets
Date Sat, 12 Nov 2016 08:52:21 GMT
Hi

I think there have been some threads in the past discussing how to speed
up this kind of use case. Not sure how easy it is to search for those;
for example, use Nabble or MarkMail to search the archives.

On Fri, Nov 11, 2016 at 8:20 AM, Zoran Regvart <zoran@regvart.com> wrote:
> Hi Christian,
> I was solving the exact same problem a few years back, and here is what I
> did: I created a custom @Handler that performs the JDBC query, the
> purpose of which was to return an Iterator over the records. The
> implementation of the handler used springjdbc-iterable[1] to stream
> the rows as they were consumed by another @Handler that took the
> Iterator from the body and wrote it out item by item using BeanIO.
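A rough sketch of that kind of handler, assuming plain JDBC with a
forward-only ResultSet instead of springjdbc-iterable (class name, query and
column mapping are made up); the point is that it returns a lazy Iterator,
so only one row at a time lives on the heap:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.sql.DataSource;
import org.apache.camel.Handler;

public class StreamingQueryHandler {

    private final DataSource dataSource;

    public StreamingQueryHandler(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Handler
    public Iterator<String> query() throws SQLException {
        Connection con = dataSource.getConnection();
        Statement stmt = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(1000);                 // ask the driver to fetch in chunks
        final ResultSet rs = stmt.executeQuery("select ...");

        // Lazy iterator: a row is read from the ResultSet only when the
        // consumer asks for it, so the full result set is never materialized.
        return new Iterator<String>() {
            private Boolean lookahead;

            public boolean hasNext() {
                try {
                    if (lookahead == null) {
                        lookahead = rs.next();
                    }
                    return lookahead;
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }

            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                lookahead = null;                // advance on the next hasNext()
                try {
                    return rs.getString(1);      // map the row however the csv writer expects
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            }
        };
    }
}

The downstream @Handler (or a splitter with streaming()) can then walk the
Iterator and write the csv in batches; closing the Statement and Connection
properly is left out of the sketch.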
>
> On a more recent project I had PostgreSQL as the database and could
> use the CopyManager[2], which proved to be very performant; perhaps your
> database offers similar functionality you can use.
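For reference, a minimal CopyManager example (connection URL, query and file
name are made up) that streams a query result straight into a csv file,
bypassing per-row JDBC handling entirely:

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyOutExample {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "secret");
             OutputStream out = new FileOutputStream("destination/export.csv")) {

            // COPY ... TO STDOUT renders the rows server-side and streams them
            // to the client as one continuous csv stream.
            CopyManager copy = con.unwrap(PGConnection.class).getCopyAPI();
            long rows = copy.copyOut(
                    "COPY (SELECT * FROM my_table) TO STDOUT WITH CSV HEADER", out);
            System.out.println("wrote " + rows + " rows");
        }
    }
}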
>
> So basically custom coded the solution.
>
> zoran
>
> [1] https://github.com/apache/cxf
> [2] https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
>
> On Thu, Nov 10, 2016 at 10:01 PM, Christian Jacob <cjacobme@aol.com> wrote:
>> Hi there,
>>
>> my task is to execute a JDBC query against a Hive database and produce
>> rows in csv files. The catch is that, depending on the query criteria,
>> the number of rows ranges from a few dozen to several million. My first
>> solution was something like this:
>> from ("...").to ("sql:...") // produces a List<Map&lt;String,
>> Object>>.split(body()).process(myProcessor) // produces a single row for the
>> csv file.to("file:destination?fileExists=Append");
>> This was awfully slow because the file producer opens the file, appends one
>> single row, and closes it again. I found some posts on how to use an Aggregator
>> before sending the content to the file producer. This really was the desired
>> solution, and the performance was satisfying. In this solution, the
>> aggregator holds the total content of the csv file to be produced.
>> Unfortunately, the files can be so large that I get stuck in "java gc
>> overhead limit exceeded" exceptions. No matter how high I set the heap
>> space, I have no chance to avoid this. Now I'm looking for a way out
>> of this, and I don't know how. My ideas are:
>> - Use a splitter that produces sublists - I don't know how I could do that.
>> - Use an aggregator that does not produce the total content of the file to
>>   be created, but only, say, 1000 lines at a time before collecting the
>>   next block - I don't know how to do that either.
>> - Or maybe someone has a better idea...
>>
>> Kind regards,
>> Christian
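The second idea maps fairly directly onto the aggregator's completionSize:
split the list with streaming() so it is walked row by row, and aggregate the
csv lines into blocks of, say, 1000 before appending to the file. A rough
sketch, where the sql query, the appendLines strategy and myProcessor are
placeholders:

import org.apache.camel.Exchange;
import org.apache.camel.Processor;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.processor.aggregate.AggregationStrategy;

public class ChunkedCsvRoute extends RouteBuilder {

    // Placeholder: turn one row (a Map) into one csv line.
    private final Processor myProcessor = exchange ->
            exchange.getIn().setBody(String.valueOf(exchange.getIn().getBody()));

    // Joins the csv lines of one block into a single body.
    private final AggregationStrategy appendLines = new AggregationStrategy() {
        public Exchange aggregate(Exchange oldExchange, Exchange newExchange) {
            if (oldExchange == null) {
                return newExchange;
            }
            String body = oldExchange.getIn().getBody(String.class)
                    + "\n" + newExchange.getIn().getBody(String.class);
            oldExchange.getIn().setBody(body);
            return oldExchange;
        }
    };

    @Override
    public void configure() {
        from("direct:export")
            .to("sql:select ...")                  // still returns a List<Map<String, Object>>
            .split(body()).streaming()             // iterate the list row by row
                .process(myProcessor)              // one row -> one csv line
                .to("direct:collect")
            .end();

        from("direct:collect")
            .aggregate(constant(true), appendLines)
                .completionSize(1000)              // write a block of 1000 lines at a time
                .completionTimeout(2000)           // flushes the last partial block
            .to("file:destination?fileExists=Append");
    }
}

If I remember correctly, the sql component in Camel 2.18 also has an
outputType=StreamList option, which should avoid building the full List in
memory in the first place.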
>>
>>
>>
>
>
>
> --
> Zoran Regvart



-- 
Claus Ibsen
-----------------
http://davsclaus.com @davsclaus
Camel in Action 2: https://www.manning.com/ibsen2
