From: "Claus Ibsen"
To: camel-user@activemq.apache.org
Subject: RE: [SPAM] RE: Splitter for big files
Date: Wed, 3 Sep 2008 16:04:47 +0200
Message-ID: <4C1FB9C00D24A140906239533638C4D20536FE49@EXVS04.exserver.dk>
In-Reply-To: <19289425.post@talk.nabble.com>
Mailing-List: camel-user@activemq.apache.org (run by ezmlm)
Hi

With or without these improvements, the transaction issue is still the same. The patches just improve the memory usage so the entire file is not loaded into memory before splitting.

The transactional issue should be handled by external transaction managers such as Spring, JTA in a J2EE container, or others. Notice this usually only works with JMS and JDBC.

So if you, for instance, want to read a big file, split it into lines, process each line, and store each line in a database, then you could put the exchanges on a JMS queue before they are stored in the database, to ensure a safe point. JMS can then redeliver until the database is updated.

from(file).split().to(jms);
from(jms).process().to(jdbc);

Med venlig hilsen

Claus Ibsen
......................................
Silverbullet
Skovsgårdsvænget 21
8362 Hørning
Tlf. +45 2962 7576
Web: www.silverbullet.dk

-----Original Message-----
From: cmoulliard [mailto:cmoulliard@gmail.com]
Sent: 3. september 2008 15:41
To: camel-user@activemq.apache.org
Subject: [SPAM] RE: Splitter for big files

If we implement what the different stakeholders propose, can we guarantee that, in case a problem occurs during the parsing of the file, a rollback of the messages created (by the batching or the tokenisation) will be done?

Kind regards,

Claus Ibsen wrote:
>
> Hi
>
> I have created 2 tickets to track this:
> CAMEL-875, CAMEL-876
>
> Med venlig hilsen
>
> Claus Ibsen
> ......................................
> Silverbullet
> Skovsgårdsvænget 21
> 8362 Hørning
> Tlf. +45 2962 7576
> Web: www.silverbullet.dk
>
> -----Original Message-----
> From: Claus Ibsen [mailto:ci@silverbullet.dk]
> Sent: 2. september 2008 21:44
> To: camel-user@activemq.apache.org
> Subject: RE: Splitter for big files
>
> Ah of course, well spotted. The tokenize is the memory hog. Good idea with
> the java.util.Scanner.
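The file → JMS → JDBC safe-point pattern in the routes above can be illustrated outside Camel. Below is a minimal plain-Java sketch of the idea (all class and method names here are made up for illustration; a real setup would rely on a JMS broker's redelivery and a transaction manager such as Spring or JTA, as discussed in the thread):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Plain-Java sketch of the "safe point" idea: lines go onto a queue
// first, and the consumer only removes a line from the queue after the
// store step succeeds, so a failure triggers redelivery instead of loss.
// None of these names are Camel API; this is an analogy only.
public class SafePointSketch {

    /** Stand-in for the JDBC store step; may fail and be retried. */
    public interface Store {
        void store(String line) throws Exception;
    }

    public static List<String> consumeWithRedelivery(Queue<String> queue, Store store, int maxAttempts) {
        List<String> stored = new ArrayList<>();
        while (!queue.isEmpty()) {
            String line = queue.peek();            // leave on queue until stored
            for (int attempt = 1; ; attempt++) {
                try {
                    store.store(line);
                    queue.poll();                  // "ack": remove only after success
                    stored.add(line);
                    break;
                } catch (Exception e) {
                    if (attempt >= maxAttempts) {
                        throw new RuntimeException("gave up on: " + line, e);
                    }
                    // otherwise: redeliver, i.e. loop and try the same line again
                }
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(List.of("row1", "row2"));
        int[] calls = {0};
        // a store that fails on the very first call, then succeeds
        Store flaky = line -> { if (calls[0]++ == 0) throw new Exception("db down"); };
        List<String> stored = consumeWithRedelivery(queue, flaky, 5);
        System.out.println(stored); // [row1, row2]
    }
}
```

The point of the sketch is the ordering: the line is removed from the queue only after the store succeeds, which is exactly what putting a JMS queue between the splitter and the database buys you.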
>
> So combined with the batch stuff we should be able to operate on really
> big files without consuming too much memory ;)
>
> Med venlig hilsen
>
> Claus Ibsen
> ......................................
> Silverbullet
> Skovsgårdsvænget 21
> 8362 Hørning
> Tlf. +45 2962 7576
> Web: www.silverbullet.dk
>
> -----Original Message-----
> From: Gert Vanthienen [mailto:gert.vanthienen@skynet.be]
> Sent: 2. september 2008 21:28
> To: camel-user@activemq.apache.org
> Subject: Re: Splitter for big files
>
> L.S.,
>
> Just added my pair of eyes ;). One part of the problem is indeed the
> list of exchanges that is returned by the expression, but I think you're
> also reading the entire file into memory a first time for tokenizing
> it. ExpressionBuilder.tokenizeExpression() converts the type to String
> and then uses a StringTokenizer on that. I think we could add support
> there for tokenizing File, InputStreams and Readers directly using a
> Scanner.
>
> Regards,
>
> Gert
>
> Claus Ibsen wrote:
>> Hi
>>
>> Looking into the source code of the splitter, it looks like it creates the
>> list of split exchanges before they are processed. That is why
>> it consumes so much memory for big files.
>>
>> Maybe some kind of batch size option is needed, so you can for instance
>> set 20 as the batch size:
>>
>> .splitter(body(InputStream.class).tokenize("\r\n").batchSize(20))
>>
>> Could you create a JIRA ticket for this improvement?
>> Btw, how big are the files you use?
>>
>> The file component uses a File as the object.
>> So when you split using the input stream, Camel should use the type
>> converter from File -> InputStream, which doesn't read the entire content
>> into memory. That happens in the splitter, where it creates the entire
>> list of new exchanges to fire.
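Gert's suggestion above — tokenizing an InputStream directly with java.util.Scanner instead of converting the whole payload to a String first — can be sketched as a standalone JDK example. This is not the actual ExpressionBuilder patch (CAMEL-875/876 track that); it only shows the Scanner mechanics:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Tokenize an InputStream with java.util.Scanner, pulling one token at a
// time from the stream instead of first materializing the whole content
// as a String (which is what a StringTokenizer-based approach forces).
public class ScannerTokenize {

    public static List<String> tokenize(InputStream in, String delimiter) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter(delimiter)) {
            while (scanner.hasNext()) {
                // In a real splitter you would hand each token off for
                // processing here rather than collect them into a list,
                // so memory use stays constant regardless of file size.
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("row1\r\nrow2\r\nrow3".getBytes());
        System.out.println(tokenize(in, "\r\n")); // [row1, row2, row3]
    }
}
```

Collecting into a List here is only for demonstration; the streaming benefit comes from processing each `scanner.next()` result immediately.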
>>
>> At least that is what I can read from the source code after a long day's
>> work, so please read the code too, as 4 eyes are better than 2 ;)
>>
>>
>>
>> Med venlig hilsen
>>
>> Claus Ibsen
>> ......................................
>> Silverbullet
>> Skovsgårdsvænget 21
>> 8362 Hørning
>> Tlf. +45 2962 7576
>> Web: www.silverbullet.dk
>>
>> -----Original Message-----
>> From: Bart Frackiewicz [mailto:bart@open-medium.com]
>> Sent: 2. september 2008 17:40
>> To: camel-user@activemq.apache.org
>> Subject: Splitter for big files
>>
>> Hi,
>>
>> I am using this route for a couple of CSV file routes:
>>
>> from("file:/tmp/input/?delete=true")
>>     .splitter(body(InputStream.class).tokenize("\r\n"))
>>     .beanRef("myBean", "process")
>>     .to("file:/tmp/output/?append=true")
>>
>> This works fine for small CSV files, but for big files I noticed
>> that Camel uses a lot of memory; it seems that Camel is reading
>> the whole file into memory. What is the configuration to use a stream
>> in the splitter?
>>
>> I noticed the same behaviour in the XPath splitter:
>>
>> from("file:/tmp/input/?delete=true")
>>     .splitter(ns.xpath("//member"))
>>     ...
>>
>> BTW, I found a posting from March, where James suggests the following
>> implementation for a custom splitter:
>>
>> -- quote --
>>
>> from("file:///c:/temp?noop=true").
>>     splitter().method("myBean", "split").
>>     to("activemq:someQueue")
>>
>> Then register "myBean" with a split method...
>>
>> class SomeBean {
>>     public Iterator split(File file) {
>>         // figure out how to split this file into rows...
>>     }
>> }
>> -- quote --
>>
>> But this won't work for me (Camel 1.4).
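The SomeBean sketch quoted above leaves the split method unimplemented. One way to fill it in lazily, using only the JDK, is a BufferedReader-backed iterator that reads one row at a time (this body is an illustrative guess; whether Camel 1.4 accepts such a bean splitter is exactly Bart's open question):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// A lazy, line-at-a-time Iterator over a file, so the splitter never
// holds more than one row in memory. The class and method names follow
// the quoted "SomeBean" sketch; the implementation is hypothetical.
public class SomeBean {

    public Iterator<String> split(File file) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader(file));
        return new Iterator<String>() {
            String next = reader.readLine(); // read-ahead of one line

            public boolean hasNext() {
                return next != null;
            }

            public String next() {
                if (next == null) throw new NoSuchElementException();
                String current = next;
                try {
                    next = reader.readLine();
                    if (next == null) reader.close(); // done: release the file handle
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return current;
            }
        };
    }
}
```

The one-line read-ahead lets hasNext() answer without consuming input, and the reader is closed as soon as the last line has been handed out.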
>>
>> Bart
>
>
-----
Enterprise Architect

Xpectis
12, route d'Esch
L-1470 Luxembourg

Phone: +352 25 10 70 470
Mobile: +352 621 45 36 22
e-mail: cmoulliard@xpectis.com
web site: www.xpectis.com
My Blog: http://cmoulliard.blogspot.com/

--
View this message in context: http://www.nabble.com/Splitter-for-big-files-tp19272583s22882p19289425.html
Sent from the Camel - Users mailing list archive at Nabble.com.
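The batchSize option proposed in the thread (tracked as CAMEL-875/876) amounts to grouping a lazy stream of rows into fixed-size chunks, so that only one batch of exchanges is materialized at a time. A plain-JDK sketch of that grouping (a hypothetical illustration, not the Camel implementation):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Wraps a lazy row iterator so callers receive fixed-size batches.
// Only one batch exists in memory at a time, which is the point of the
// proposed batchSize option: never build the full list of exchanges.
public class BatchingIterator<T> implements Iterator<List<T>> {

    private final Iterator<T> source;
    private final int batchSize;

    public BatchingIterator(Iterator<T> source, int batchSize) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.source = source;
        this.batchSize = batchSize;
    }

    public boolean hasNext() {
        return source.hasNext();
    }

    public List<T> next() {
        if (!source.hasNext()) throw new NoSuchElementException();
        List<T> batch = new ArrayList<>(batchSize);
        while (source.hasNext() && batch.size() < batchSize) {
            batch.add(source.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<List<Integer>> batches =
                new BatchingIterator<>(List.of(1, 2, 3, 4, 5).iterator(), 2);
        while (batches.hasNext()) {
            System.out.println(batches.next()); // prints [1, 2] then [3, 4] then [5]
        }
    }
}
```

Combined with a lazy per-line iterator (such as a Scanner over the file's InputStream), this keeps peak memory proportional to the batch size rather than the file size.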