camel-users mailing list archives

From Willem Jiang <willem.ji...@gmail.com>
Subject Re: Large file processing with Apache Camel
Date Fri, 22 Feb 2013 02:36:54 GMT
I just want to ask some questions about your performance enhancement.

First, what made you think that reading multiple lines of XML would improve performance?
Looking at the route you showed, you send an exchange to the queue after reading each
line of the file, so I don't think reading multiple lines will improve the performance.
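
For example, a route like this (just a rough sketch; I kept your file URI, and the
queue name is a placeholder) already sends one exchange per line without any
aggregation step, because the streaming splitter reads the file lazily:

from("file://myfile.txt")
    .split(body().tokenize("\n")).streaming()
    .process(processor)
    .to("activemq:queue:lines"); // placeholder queue name

Batching the reads on top of the streaming splitter does not save any I/O; it only
adds the aggregation overhead.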

Second, you are sending the XML across the different service units. Can you check whether
you are using StAX streaming to process the XML?
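
If not, a StAX reader only keeps the current element in memory instead of building the
whole document. A rough sketch (the file name and the "record" element are placeholders
for your payload):

import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxReaderSketch {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        InputStream in = new FileInputStream("myfile.xml"); // placeholder file
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) { // placeholder element
                    // handle one record at a time; the rest of the document
                    // is never held in memory
                }
            }
        } finally {
            reader.close();
            in.close();
        }
    }
}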

BTW, it would be easier if you took small steps and wrote some simple unit tests to
verify each performance enhancement you make.
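
Something as small as this is usually enough (a sketch, assuming camel-test is on your
classpath; the endpoint names are placeholders). It proves that the streaming splitter
turns each line into its own exchange, and you can time larger inputs the same way:

import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.test.junit4.CamelTestSupport;
import org.junit.Test;

public class SplitLinesTest extends CamelTestSupport {

    @Test
    public void testOneExchangePerLine() throws Exception {
        // three lines in, three exchanges out
        getMockEndpoint("mock:out").expectedMessageCount(3);

        template.sendBody("direct:start", "a\nb\nc");

        assertMockEndpointsSatisfied();
    }

    @Override
    protected RouteBuilder createRouteBuilder() {
        return new RouteBuilder() {
            public void configure() {
                from("direct:start")
                    .split(body().tokenize("\n")).streaming()
                    .to("mock:out");
            }
        };
    }
}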

--  
Willem Jiang

Red Hat, Inc.
FuseSource is now part of Red Hat
Web: http://www.fusesource.com | http://www.redhat.com
Blog: http://willemjiang.blogspot.com (English)
          http://jnn.iteye.com (Chinese)
Twitter: willemjiang  
Weibo: 姜宁willem

On Friday, February 22, 2013 at 5:10 AM, cristisor wrote:

> Hello everybody,
>  
> I'm using Fuse ESB with Apache Camel 2.4.0 (I think) to process some
> large files. Until now, a service unit deployed in ServiceMix would read the
> file line by line, then create and send an exchange containing that line to
> another service unit, which analyzes the line and transforms it into XML
> according to some parameters. The new exchange goes to a service unit that
> maps that XML to another XML format, and the resulting exchange, containing
> the new XML, goes to a final service unit that unmarshals the XML and
> inserts the object into a database. I arrived on the project recently; the
> architecture and the design are not mine, and I have to fix some serious
> performance problems. I suspect that reading the files line by line is
> slowing the processing down considerably, so I inserted an AggregationStrategy
> to aggregate 100 - 200 lines at once. Here I get into trouble:
> - if I send an exchange with more than one line, I have to make a lot of
> changes to the XML-to-XML mappers, the choice processors, etc.
> - even if I solve the first problem, reading 500 lines at once and creating
> one big XML from the data causes an OutOfMemoryError, so I would have to
> read at most about 50 lines to make sure that no exceptions arise
>  
> What I'm looking for is a way to read 500 - 1000 lines at once but send each
> one in a separate exchange to the service unit that creates the initial
> XML. My route currently looks similar to this:
>  
> from("file://myfile.txt")
>     .marshal().string("UTF-8")
>     .split(body().tokenize("\n")).streaming()
>     .setHeader("foo", constant("foo"))
>     .aggregate(header("foo"), new StringBodyAggregator())
>         .completionSize(50)
>     .process(processor)
>     .to("activemq queue");
>  
> I read something about a ProducerTemplate, but I'm not sure whether it can
> help me. Basically, I want to insert a mechanism that sends more than one
> exchange, one for each line read, to the processor and then to the endpoint.
> This way I read from the file in batches of hundreds or thousands of lines
> and keep using the old mechanism for mapping, one line at a time.
>  
> Thank you.
>  
>  
>  
> --
> View this message in context: http://camel.465427.n5.nabble.com/Large-file-processing-with-Apache-Camel-tp5727977.html
> Sent from the Camel - Users mailing list archive at Nabble.com.
