camel-users mailing list archives

From Claus Ibsen <claus.ib...@gmail.com>
Subject Re: Camel performance tuning
Date Sat, 10 Nov 2012 08:18:50 GMT
On Fri, Nov 9, 2012 at 7:09 PM, Christian Müller
<christian.mueller@gmail.com> wrote:
> Using Hypersonic, Hadoop or Mongo for such a use case is "over-engineering"
> the requirement and will end up in a much more complicated solution - IMO.
>

Yeah, it sure is. After all, we are talking about appending data to a file.

You can just use the Java API, as I have shown in the links to the 2
test cases.
The "fast" one is much faster because it reuses the same stream for the
entire processing. That is also how you would do it in plain Java code:
iterate over the data and write each piece to the same open file.
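
To illustrate the difference, here is a minimal sketch in plain Java. It is
not the code from the linked test cases; the class and method names
(AppendDemo, appendPerLine, appendReusingStream) are made up for this
example, and the "slow" variant only assumes that each split message ends
up opening and closing the target file on its own.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class AppendDemo {

    // "Slow": opens, appends to, and closes the file once per line.
    // Assumption: this approximates writing every split message separately.
    static void appendPerLine(Path out, List<String> lines) throws IOException {
        for (String line : lines) {
            try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    // "Fast": opens the file once and reuses the same buffered stream
    // for every line, so data reaches the disk in large chunks.
    static void appendReusingStream(Path out, List<String> lines) throws IOException {
        try (BufferedWriter w = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Same number of lines as the file in the original question.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 48198; i++) {
            lines.add("record " + i);
        }

        Path slow = Files.createTempFile("slow", ".txt");
        Path fast = Files.createTempFile("fast", ".txt");

        long t0 = System.nanoTime();
        appendPerLine(slow, lines);
        long t1 = System.nanoTime();
        appendReusingStream(fast, lines);
        long t2 = System.nanoTime();

        System.out.println("per-line open/close: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("single reused stream: " + (t2 - t1) / 1_000_000 + " ms");

        Files.delete(slow);
        Files.delete(fast);
    }
}
```

Both methods produce the same file content and keep the records in order;
only the number of open/close cycles differs. Actual timings depend on the
disk and OS, so measure on your own machine.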

If you want this without writing any Java code in the Camel DSL, the
file component would need to be enhanced so that it can store a file
stream on the exchange and pass it on to the next split message for
re-use.
It's doable, but a bit "hard" to implement for this use case.

> Best,
> Christian
>
> On Fri, Nov 9, 2012 at 6:57 PM, <Ramkumar.Iyer@cognizant.com> wrote:
>
>>  You may also want to check out Hadoop and map reduce
>>
>>
>>
>> http://camel.apache.org/hdfs.html
>>
>>
>>
>> with respect to point a and b.
>>
>>
>>
>> You can have an index on the record and the “reduce” job can serialize on
>> the index.
>>
>>
>>
>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>> *Sent:* Friday, November 09, 2012 10:16 PM
>> *To:* users@camel.apache.org
>> *Subject:* Re: Camel performance tuning
>>
>>
>>
>> Thanks for your answer, my comments:
>>
>>
>>
>> a) a 5M file could be loaded into memory, but I have streaming enabled as
>> file size could be in the range of GB. Notwithstanding, I'll check what
>> Hypersonic & Mongo are, as I'm not aware of them.
>>
>> b) Parallel processing is set to false, because records must preserve
>> order on the output file
>>
>> c) Don't see the point here
>>
>> d) See a)
>>
>> e) what about async processing? There's no "long running process" here
>>
>>
>>
>> Thanks again.-
>>
>>
>>
>> *Gonzalo Vásquez Sáez*
>>
>> *Gerente Investigación y Desarrollo (R&D)*
>> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
>> Av. Nueva Tajamar 555 Of. 802, Las Condes
>> (56-2) 335 2461
>> *gvasquez@altiuz.cl*
>>
>> *http://www.altiuz.cl*
>>
>>
>>
>>
>>
>>
>>
>> On 09-11-2012, at 13:12, <Ramkumar.Iyer@cognizant.com> wrote:
>>
>>
>>
>>   I am really new to Camel but here are some options you can try
>>
>>
>>
>> a)      Can you load the 5 MB file to memory before splitting it ? That
>> way IO will not be a problem. Probably put it in something like Hypersonic
>> or Mongo
>>
>> b)      Why is parallel  processing false ? Are the records related to
>> each other ? If true you can take advantage of multicore
>>
>> c)       Is it possible to first split the files into chunks and then
>> process the chunks independently ?
>>
>> d)      Can you write into memory and flush at once ?
>>
>> e)      Sync/Asynch : http://camel.apache.org/async.html
>>
>>
>>
>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>> *Sent:* Friday, November 09, 2012 8:32 PM
>> *To:* users@camel.apache.org
>> *Subject:* Camel performance tuning
>>
>>
>>
>> I'm running a route that basically adds a character per line to a plain
>> text file, but it's taking too long, and it seems that it's due to some kind
>> of buffering issue when reading/writing from disk.
>>
>>
>>
>> I'm processing a 5MB file (attached as DC_FACCL132_0000
>> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
>> template (also attached).
>>
>>
>>
>> It's taking forever to process such a file. I understand I'm tokenizing
>> on line breaks, which could be the source of the problem as there are many
>> lines in the file (48198 exactly), but when running jvisualvm (see attached
>> images/snapshot) I can see the write operation is invoked 20386 times, which
>> seems unrelated to the line count. Is there an output buffer size that I can
>> configure? Or something like that?
>>
>>
>>
>> This is the route:
>>
>> <camel:route id="pager" autoStartup="true">
>>   <camel:from
>>     uri="file:///tmp/in?charset=Windows-1252&amp;move=${file:parent}/../paged/${file:name.noext}.paged.ack&amp;preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>>   <camel:split streaming="true" parallelProcessing="false">
>>     <camel:tokenize token="\n" />
>>     <camel:to uri="bean:pager" />
>>     <camel:to
>>       uri="file:///tmp/paged?charset=utf8&amp;fileName=${file:name.noext}.paged&amp;fileExist=Append" />
>>   </camel:split>
>> </camel:route>
>>
>>
>>
>> This is the referenced bean:
>>
>>
>>
>> <bean id="pager" class="cl.altiuz.reports.etl.TextProcessor">
>>   <property name="xsltPath"
>>     value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>>   <property name="param" value="C.*PAG.* 1" />
>> </bean>
>>
>>
>>
>> The Camel version is 2.10.1, and this happens on both OSX & MS Windows, so I
>> think it isn't a platform-dependent problem, but a configuration one.
>>
>>
>>
>> Any ideas? Anything else that I should send?
>>
>>
>>
>> Thanks!
>>
>>
>>
>> *Gonzalo Vásquez Sáez*
>>
>> *Gerente Investigación y Desarrollo (R&D)*
>> *Altiuz* Soluciones Tecnológicas de Negocios Ltda.
>> Av. Nueva Tajamar 555 Of. 802, Las Condes
>> (56-2) 335 2461
>> *gvasquez@altiuz.cl*
>>
>> *http://www.altiuz.cl*
>>
>>
>>
>>
>>
>>        This e-mail and any files transmitted with it are for the sole use
>> of the intended recipient(s) and may contain confidential and privileged
>> information. If you are not the intended recipient(s), please reply to the
>> sender and destroy all copies of the original message. Any unauthorized
>> review, use, disclosure, dissemination, forwarding, printing or copying of
>> this email, and/or any action taken in reliance on the contents of this
>> e-mail is strictly prohibited and may be unlawful.
>>
>>
>
>
>
> --



-- 
Claus Ibsen
-----------------
Red Hat, Inc.
FuseSource is now part of Red Hat
Email: cibsen@redhat.com
Web: http://fusesource.com
Twitter: davsclaus
Blog: http://davsclaus.com
Author of Camel in Action: http://www.manning.com/ibsen
