camel-users mailing list archives

From Claus Ibsen <claus.ib...@gmail.com>
Subject Re: Large file processing with Apache Camel
Date Fri, 22 Feb 2013 16:36:52 GMT
On Fri, Feb 22, 2013 at 5:35 PM, Claus Ibsen <claus.ibsen@gmail.com> wrote:
> Hi
>
> Have you seen the section on splitting with N lines grouped together at
> http://camel.apache.org/splitter.html
>

Ah yes, you are using an older Camel release. You can implement a custom
expression that does what this functionality offers in Camel 2.10. You
can peek at the Camel source code to see how to do that.

Basically, just create a class with a method that returns a
java.util.Iterator, and have that iterator return the data in batches of
500 lines. The Camel splitter in streaming mode will then use it to walk
the file.
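
As a rough illustration of the iterator idea above, here is a minimal
sketch in plain Java. The class name GroupedLineIterator and its
constructor parameters are hypothetical, not taken from the Camel source.
It assumes the input is available as a java.io.Reader and that the lines
of a group are joined with "\n".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of Camel): returns the content of a Reader
// in groups of up to groupSize lines, reading only one group ahead.
class GroupedLineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private final int groupSize;
    private String nextGroup; // look-ahead buffer; null once the input is exhausted

    GroupedLineIterator(Reader source, int groupSize) {
        if (groupSize < 1) {
            throw new IllegalArgumentException("groupSize must be >= 1");
        }
        this.reader = new BufferedReader(source);
        this.groupSize = groupSize;
        this.nextGroup = readGroup();
    }

    // Reads up to groupSize lines and joins them with '\n'.
    // Returns null (and closes the reader) when no lines are left.
    private String readGroup() {
        StringBuilder sb = new StringBuilder();
        int count = 0;
        try {
            String line;
            while (count < groupSize && (line = reader.readLine()) != null) {
                if (count > 0) {
                    sb.append('\n');
                }
                sb.append(line);
                count++;
            }
            if (count == 0) {
                reader.close();
                return null;
            }
        } catch (IOException e) {
            throw new IllegalStateException("Failed to read from source", e);
        }
        return sb.toString();
    }

    public boolean hasNext() {
        return nextGroup != null;
    }

    public String next() {
        if (nextGroup == null) {
            throw new NoSuchElementException();
        }
        String current = nextGroup;
        nextGroup = readGroup();
        return current;
    }

    public void remove() {
        throw new UnsupportedOperationException("read-only iterator");
    }
}
```

A bean method of your own could open the file and return something like
new GroupedLineIterator(new FileReader(file), 500). Because only one
group is buffered at a time, memory use stays bounded by the group size
rather than by the file size. How the bean is wired into the splitter
depends on your Camel release, so check that against the version you run.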




>
> On Thu, Feb 21, 2013 at 10:10 PM, cristisor <cristisor_ac@yahoo.com> wrote:
>> Hello everybody,
>>
>> I'm using Apache Fuse ESB with Apache Camel 2.4.0 (I think) to process some
>> large files. Until now a service unit deployed in servicemix would read the
>> file line by line, create and send an exchange containing that line to
>> another service unit that would analyze the line and transform it into an
>> xml according to some parameters, then send the new exchange to a new
>> service unit that would map that xml to another xml format and send the new
>> exchange containing the new xml to a final service unit that unmarshals the
>> xml and inserts the object into a database. I arrived on the project, the
>> architecture and the design are not mine, and I have to fix some serious
>> performance problems. I suspect that reading the files line by line is
>> slowing the processing very much, so I inserted an AggregationStrategy to
>> aggregate 100 - 200 lines at once. Here I get into trouble:
>> - if I send an exchange with more than 1 line I have to make a lot of
>> changes on the xml to xml mappers, choice processors, etc
>> - even if I solve the first problem, if I read 500 lines at once and
>> create a big xml from the data I get an OutOfMemoryError, so I would have
>> to read at most 50 lines to make sure that no errors arise
>>
>> What I'm looking for is a way to read 500 - 1000 lines at once but send each
>> one in a different exchange to the service unit that creates the initial
>> xml. My route looks similar to this one now:
>>
>> from("file://myfile.txt")
>>         .marshal().string("UTF-8")
>>         .split(body().tokenize("\n")).streaming()
>>                 .setHeader("foo", constant("foo"))
>>                 .aggregate(header("foo"), new StringBodyAggregator()).completionSize(50)
>>                 .process(processor)
>>                 .to("activemq queue");
>>
>> I read something about a template producer but I'm not sure if it can help
>> me. Basically I want to insert a mechanism to send more than one exchange,
>> one for each read line, to the processor and then to the endpoint. This way
>> I read from the file in batches of hundreds or thousands and I keep using
>> the old mechanism for mapping, one line at a time.
>>
>> Thank you.
>>
>>
>>
>> --
>> View this message in context: http://camel.465427.n5.nabble.com/Large-file-processing-with-Apache-Camel-tp5727977.html
>> Sent from the Camel - Users mailing list archive at Nabble.com.
>
>
>
> --
> Claus Ibsen
> -----------------
> Red Hat, Inc.
> FuseSource is now part of Red Hat
> Email: cibsen@redhat.com
> Web: http://fusesource.com
> Twitter: davsclaus
> Blog: http://davsclaus.com
> Author of Camel in Action: http://www.manning.com/ibsen



