Message-ID: <50A0F6A0.5090801@catify.com>
Date: Mon, 12 Nov 2012 14:16:16 +0100
From: Claus Straube
To: users@camel.apache.org
Subject: Re: Camel performance tuning

Have you tried a higher completion size? For us, 750 worked best.

On 09.11.2012 19:59, Gonzalo Vasquez wrote:
> Ok, I've included an aggregator in the splitter, as follows:
>
> uri="file:///tmp/in?charset=Windows-1252&move=${file:parent}/../paged/${file:name.noext}.paged.ack&preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
> ${date:now:mm}:${date:now:ss}.${date:now:SSS}
> ${file:name}
> 250
> uri="file:///tmp/paged?charset=utf8&fileName=${file:name.noext}.paged&fileExist=Append" />
> message="Elapsed: ${header.start} - ${date:now:mm}:${date:now:ss}.${date:now:SSS}" />
>
> And the AggregationStrategy:
>
> I've also added some headers and logging to calculate the elapsed time.
>
> Before adding the aggregator, the elapsed time was about 30 seconds (for the 5 MB test file); now it is about half that (15 seconds). I can clearly see the improvement, but it is not as large as I expected.
>
> Any extra tips? I've included the custom AggregationStrategy I had to create, as all I needed was to append (concatenate) the body contents.
>
> Gonzalo Vásquez Sáez
> Research and Development (R&D) Manager
> Altiuz Soluciones Tecnológicas de Negocios Ltda.
> Av. Nueva Tajamar 555 Of. 802, Las Condes
> (56-2) 335 2461
> gvasquez@altiuz.cl
> http://www.altiuz.cl
>
> On 09-11-2012, at 15:09, Christian Müller wrote:
>
>> Using Hypersonic, Hadoop or Mongo for such a use case is "over engineering"
>> the requirement and will end up in a much more complicated solution - IMO.
>>
>> Best,
>> Christian
>>
>> On Fri, Nov 9, 2012 at 6:57 PM, wrote:
>>
>>> You may also want to check out Hadoop and map reduce
>>>
>>> http://camel.apache.org/hdfs.html
>>>
>>> with respect to points a and b.
>>>
>>> You can have an index on the record and the "reduce" job can serialize on
>>> the index.
>>>
>>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>>> *Sent:* Friday, November 09, 2012 10:16 PM
>>> *To:* users@camel.apache.org
>>> *Subject:* Re: Camel performance tuning
>>>
>>> Thanks for your answer, my comments:
>>>
>>> a) A 5 MB file could be loaded into memory, but I have streaming enabled
>>> because the file size could be in the range of GB. Notwithstanding, I'll
>>> check what Hypersonic and Mongo are, as I'm not aware of them.
>>>
>>> b) Parallel processing is set to false because records must preserve
>>> their order in the output file.
>>>
>>> c) I don't see the point here.
>>>
>>> d) See a)
>>>
>>> e) What about async processing? There's no "long running process" here.
>>>
>>> Thanks again.
>>>
>>> On 09-11-2012, at 13:12, wrote:
>>>
>>> I am really new to Camel but here are some options you can try:
>>>
>>> a) Can you load the 5 MB file into memory before splitting it? That
>>> way IO will not be a problem. Probably put it in something like Hypersonic
>>> or Mongo.
>>>
>>> b) Why is parallel processing false? Are the records related to
>>> each other? If it were true, you could take advantage of multiple cores.
>>>
>>> c) Is it possible to first split the files into chunks and then
>>> process the chunks independently?
>>>
>>> d) Can you write into memory and flush at once?
>>>
>>> e) Sync/Async: http://camel.apache.org/async.html
>>>
>>> *From:* Gonzalo Vasquez [mailto:gvasquez@altiuz.cl]
>>> *Sent:* Friday, November 09, 2012 8:32 PM
>>> *To:* users@camel.apache.org
>>> *Subject:* Camel performance tuning
>>>
>>> I'm running a route that basically adds a character per line to a plain
>>> text file, but it's taking too long, and it seems to be due to some kind
>>> of buffering issue when reading from or writing to disk.
>>>
>>> I'm processing a 5MB file (attached as DC_FACCL132_0000
>>> MORA_1075_16-10-2012_19-09-47_15.txt.zip), with the corresponding XSL
>>> template (also attached).
>>>
>>> It's taking forever to process such a file. I understand I'm tokenizing
>>> on line breaks, which could be the source of the problem as there are many
>>> lines in the file (48198 exactly), but when running jvisualvm (see attached
>>> images/snapshot) I can see the write operation is invoked 20386 times, which
>>> seems unrelated to the line count. Is there an output buffer size that I can
>>> configure? Or something like that?
>>>
>>> This is the route:
>>>
>>> uri="file:///tmp/in?charset=Windows-1252&move=${file:parent}/../paged/${file:name.noext}.paged.ack&preMove=${file:name.noext}-${date:now:yyyyMMddHHmmssSSS}.${file:ext}" />
>>> uri="file:///tmp/paged?charset=utf8&fileName=${file:name.noext}.paged&fileExist=Append" />
>>>
>>> This is the referenced bean:
>>>
>>> value="/Users/gonzalovasquez/Documents/workspace/altiuz-reports/reports-etl/xsl/pager.xsl" />
>>>
>>> The Camel version is 2.10.1, and this happens on both OS X and MS Windows,
>>> so I think it isn't a platform-dependent problem but a configuration one.
>>>
>>> Any ideas? Anything else that I should send?
>>>
>>> Thanks!
>>>
>>> This e-mail and any files transmitted with it are for the sole use of
>>> the intended recipient(s) and may contain confidential and privileged
>>> information. If you are not the intended recipient(s), please reply to the
>>> sender and destroy all copies of the original message. Any unauthorized
>>> review, use, disclosure, dissemination, forwarding, printing or copying of
>>> this email, and/or any action taken in reliance on the contents of this
>>> e-mail is strictly prohibited and may be unlawful.
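
The effect of the completion size discussed at the top of this thread can be illustrated with a small sketch. This is plain Java, not the Camel API: the class `BatchedAppender` and its method `writesFor` are hypothetical names made up for this example. It only mimics the principle behind an aggregator with a fixed completion size, assuming one write to the output per completed batch. How many writes Camel's file producer really issues per exchange is not something this sketch claims to reproduce.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Illustration only (hypothetical class, not part of Camel): collect
// `completionSize` lines, then hand them to the output in a single write.
class BatchedAppender {
    private final Writer out;
    private final int completionSize;          // lines per batch (250 and 750 in the thread)
    private final StringBuilder batch = new StringBuilder();
    private int pending = 0;                   // lines currently buffered
    private int writeCalls = 0;                // how often the underlying writer was called

    BatchedAppender(Writer out, int completionSize) {
        if (completionSize < 1) {
            throw new IllegalArgumentException("completionSize must be >= 1");
        }
        this.out = out;
        this.completionSize = completionSize;
    }

    // Buffer one line and flush as soon as the batch is full.
    void append(String line) {
        batch.append(line).append('\n');
        pending++;
        if (pending >= completionSize) {
            flush();
        }
    }

    // Write whatever is buffered; also needed for the final, partial batch.
    void flush() {
        if (pending == 0) {
            return;
        }
        try {
            out.write(batch.toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        writeCalls++;
        batch.setLength(0);
        pending = 0;
    }

    int writeCalls() {
        return writeCalls;
    }

    // Number of write calls needed to append `lines` lines at a given batch size.
    static int writesFor(int lines, int completionSize) {
        BatchedAppender appender = new BatchedAppender(new StringWriter(), completionSize);
        for (int i = 0; i < lines; i++) {
            appender.append("line " + i);
        }
        appender.flush();
        return appender.writeCalls();
    }

    public static void main(String[] args) {
        int lines = 48198; // line count reported in the original question
        for (int size : new int[] {1, 250, 750}) {
            System.out.println("completionSize=" + size + " -> "
                    + writesFor(lines, size) + " writes");
        }
    }
}
```

Under these assumptions, the 48198-line file needs 48198 writes at a batch size of 1, 193 at 250, and 65 at 750. The step from 250 to 750 saves far fewer writes than the step from 1 to 250, which fits the observation that batching helps clearly but with diminishing returns.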