camel-users mailing list archives

From Claus Ibsen <>
Subject Re: Processing VERY large result sets
Date Sat, 12 Nov 2016 08:52:21 GMT

I think there have been some threads in the past discussing how to
speed up this kind of use case. I'm not sure how easy they are to find,
but you can search the archives with Nabble or MarkMail.

On Fri, Nov 11, 2016 at 8:20 AM, Zoran Regvart <> wrote:
> Hi Christian,
> I was solving the exact same problem a few years back. Here is what I
> did: I created a custom @Handler that performs the JDBC query and
> returns an Iterator over the records. The implementation of the
> handler used springjdbc-iterable[1] to stream the rows as they were
> consumed by another @Handler, which took the Iterator from the body
> and wrote the output item by item using BeanIO.
> On a more recent project I had PostgreSQL as the database and could
> use the CopyManager[2], which proved to be very performant. Perhaps your
> database has similar functionality that you can use.
> So basically I custom coded the solution.
> zoran
> [1]
> [2]
> On Thu, Nov 10, 2016 at 10:01 PM, Christian Jacob <> wrote:
>> Hi there,
>> my task is to execute a JDBC query against a Hive database and
>> produce rows in csv files. The catch is that, depending on the query
>> criteria, the number of rows ranges from some dozens to some millions. My
>> first solution was something like this:
>> from("...")
>>     .to("sql:...") // produces a List<Map<String, Object>>
>>     .split(body())
>>     .process(myProcessor) // produces a single row for the csv
>>     .to("file:destination?fileExists=Append");
>> This was awfully slow because the file producer opens the file, appends one
>> single row, and closes it again.
>> I found some posts on how to use an Aggregator before sending the content
>> to the file producer. This really was the desired solution, and the
>> performance was satisfying. In this solution, the aggregator holds the
>> total content of the csv file to be produced. Unfortunately, the files can
>> be so large that I get stuck in "java gc overhead limit exceeded"
>> exceptions. No matter how high I set the heap space, I have no chance to
>> avoid this.
>> Now I'm looking for a way out of this, and I don't know how. My ideas are:
>> - Use a splitter that produces a sublist - I don't know how I could do it
>> - Use an aggregator that does not produce the total content for the files
>>   to be created, but only, for example, 1000 lines, and then collects the
>>   next block - I don't know how to do that either
>> Or maybe someone has a better idea...
>> Kind regards,
>> Christian
> --
> Zoran Regvart

Claus Ibsen
----------------- @davsclaus
Camel in Action 2:
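
As an illustration of the "blocks of 1000 lines" idea raised in the quoted question, here is a minimal sketch in plain Java. It is not Camel API and not the code from any of the posters: the class name ChunkedCsvWriter, the method writeInChunks and the sample rows are all made up for this example. It assumes the rows arrive as an Iterator (as in the streaming approach described above), buffers at most chunkSize rows in memory, and opens the file once per chunk instead of once per row.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical helper, named for this sketch only.
final class ChunkedCsvWriter {

    /**
     * Drains the iterator and appends its rows to the target file,
     * holding at most chunkSize rows in memory at any time.
     * Returns the number of chunks written.
     */
    static int writeInChunks(Iterator<String> rows, Path target, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int chunks = 0;
        List<String> chunk = new ArrayList<>(chunkSize);
        while (rows.hasNext()) {
            chunk.add(rows.next());
            if (chunk.size() == chunkSize) {
                appendChunk(target, chunk);
                chunk.clear();
                chunks++;
            }
        }
        // Write the last, possibly smaller, chunk.
        if (!chunk.isEmpty()) {
            appendChunk(target, chunk);
            chunks++;
        }
        return chunks;
    }

    /** Opens the file once, appends every line of the chunk, and closes it again. */
    private static void appendChunk(Path target, List<String> chunk) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : chunk) {
                out.write(line);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("rows", ".csv");
        // Stand-in for a streamed result set: rows are generated lazily.
        Iterator<String> rows = IntStream.range(0, 2500)
                .mapToObj(i -> i + ",value" + i)
                .iterator();
        int chunks = writeInChunks(rows, target, 1000);
        System.out.println(chunks + " chunks written to " + target);
    }
}
```

The chunk size trades memory for file open/close overhead: a larger chunk means fewer appends but more rows held on the heap. This only helps if the source of the Iterator really streams; if the query result is first loaded into a full List, the memory problem stays.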
