From: Claus Ibsen
To: users@camel.apache.org
Date: Sat, 12 Nov 2016 09:52:21 +0100
Subject: Re: Processing VERY large result sets
Hi

I think in the past there have been some threads about how to speed up
this use-case. Not sure how easy it is to search for those; you could,
for example, use Nabble or Markmail to search the archives.

On Fri, Nov 11, 2016 at 8:20 AM, Zoran Regvart wrote:
> Hi Christian,
> I was solving the exact same problem a few years back; here is what I
> did: I created a custom @Handler that performs the JDBC query, the
> purpose of which was to return an Iterator over the records. The
> implementation of the handler used springjdbc-iterable[1] to stream
> the rows as they were consumed by another @Handler that took the
> Iterator from the body and wrote it out item by item using BeanIO.
>
> On a more recent project I had PostgreSQL as the database and could
> use the CopyManager[2], which proved to be very performant; perhaps
> your database offers the same functionality you can use.
>
> So basically I custom-coded the solution.
>
> zoran
>
> [1] https://github.com/apache/cxf
> [2] https://jdbc.postgresql.org/documentation/publicapi/org/postgresql/copy/CopyManager.html
>
> On Thu, Nov 10, 2016 at 10:01 PM, Christian Jacob wrote:
>> Hi there,
>> my task is to execute a JDBC query against a Hive database and
>> produce rows in csv files. The catch is that, depending on the query
>> criteria, the number of rows can range from some dozens to some
>> millions. My first solution was something like this:
>>
>> from("...")
>>     .to("sql:...") // produces a List<Map<String, Object>>
>>     .split(body()).process(myProcessor) // produces a single csv row
>>     .to("file:destination?fileExists=Append");
>>
>> This was awfully slow because the file producer opens the file,
>> appends one single row, and closes it again. I found some posts on
>> how to use an Aggregator before sending the content to the file
>> producer. This really was the desired solution, and the performance
>> was satisfying. In this solution, the aggregator holds the total
>> content of the csv file to be produced. Unfortunately, the files can
>> be so large that I get stuck in "java gc overhead limit exceeded"
>> exceptions. No matter how high I set the heap space, I cannot avoid
>> this. Now I'm looking for a way out, and I don't know how. My ideas
>> are:
>> - Use a splitter that produces sublists - I don't know how I could
>>   do that
>> - Use an aggregator that does not produce the total content of the
>>   files to be created, but only, for example, 1000 lines at a time,
>>   and then collects the next block - I don't know how to do that
>>   either
>> Or maybe someone has a better idea... Kind regards,
>> Christian
>>
>> --
>> View this message in context: http://camel.465427.n5.nabble.com/Processing-VERY-large-result-sets-tp5790018.html
>> Sent from the Camel - Users mailing list archive at Nabble.com.
>
> --
> Zoran Regvart

--
Claus Ibsen
-----------------
http://davsclaus.com
@davsclaus
Camel in Action 2: https://www.manning.com/ibsen2
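[Editor's note] Christian's second idea — flushing in fixed-size blocks instead of aggregating the whole file in memory — can be sketched outside Camel in plain Java. This is a minimal illustration, not anyone's actual implementation from the thread: the class name, the fake row iterator, and the batch size of 1000 are all assumptions made for the example. The point it demonstrates is that only one batch ever lives on the heap, regardless of how many rows the result set yields.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BatchedCsvWriter {

    /**
     * Streams rows from an iterator to a writer, flushing every
     * batchSize rows so at most one batch is held in memory.
     * Returns the number of rows written.
     */
    static long writeInBatches(Iterator<String> rows, Writer out, int batchSize)
            throws IOException {
        long written = 0;
        StringBuilder batch = new StringBuilder();
        int inBatch = 0;
        while (rows.hasNext()) {
            batch.append(rows.next()).append('\n');
            inBatch++;
            written++;
            if (inBatch == batchSize) {
                out.write(batch.toString()); // one write per batch, not per row
                batch.setLength(0);          // drop the batch from the heap
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            out.write(batch.toString());     // flush the final partial batch
        }
        out.flush();
        return written;
    }

    public static void main(String[] args) throws IOException {
        // 2500 fake csv rows stand in for the streamed JDBC result set
        List<String> rows = IntStream.range(0, 2500)
                .mapToObj(i -> i + ",row" + i)
                .collect(Collectors.toList());
        StringWriter out = new StringWriter();
        long n = writeInBatches(rows.iterator(), out, 1000);
        System.out.println(n); // prints 2500
    }
}
```

In Camel terms this corresponds roughly to combining a streaming splitter with an aggregator whose completion condition is a fixed size (the aggregate EIP's completionSize option), so each aggregated block is written and then released instead of growing until the whole file is in memory.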