camel-users mailing list archives

From cmoulliard <cmoulli...@gmail.com>
Subject RE: Splitter for big files
Date Wed, 03 Sep 2008 13:41:21 GMT

If we implement what the different stakeholders have proposed, can we guarantee
that, if a problem occurs while the file is being parsed, the messages already
created (by the batching or the tokenization) will be rolled back?

Kind regards,

 

Claus Ibsen wrote:
> 
> Hi
> 
> I have created 2 tickets to track this:
> CAMEL-875, CAMEL-876
> 
> Kind regards
>  
> Claus Ibsen
> ......................................
> Silverbullet
> Skovsgårdsvænget 21
> 8362 Hørning
> Tel. +45 2962 7576
> Web: www.silverbullet.dk
> 
> -----Original Message-----
> From: Claus Ibsen [mailto:ci@silverbullet.dk] 
> Sent: 2. september 2008 21:44
> To: camel-user@activemq.apache.org
> Subject: RE: Splitter for big files
> 
> Ah, of course, well spotted. The tokenizer is the memory hog. Good idea to
> use java.util.Scanner.
> 
> So, combined with the batch-size option, we should be able to process really
> big files without consuming too much memory ;)
> 
> 
> Kind regards
>  
> Claus Ibsen
> -----Original Message-----
> From: Gert Vanthienen [mailto:gert.vanthienen@skynet.be] 
> Sent: 2. september 2008 21:28
> To: camel-user@activemq.apache.org
> Subject: Re: Splitter for big files
> 
> L.S.,
> 
> Just added my pair of eyes ;).  One part of the problem is indeed the 
> list of exchanges that is returned by the expression, but I think you're 
> also reading the entire file into memory up front in order to tokenize 
> it.  ExpressionBuilder.tokenizeExpression() converts the value to a String 
> and then runs a StringTokenizer over it.  I think we could add support 
> there for tokenizing Files, InputStreams and Readers directly using a 
> Scanner.
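A minimal sketch of the Scanner idea in plain Java, not Camel code: the class name LazyTokenizer is hypothetical, and the sketch assumes the body is already available as a java.io.Reader (for example an InputStreamReader over the file). java.util.Scanner fills its internal buffer from the Reader only as tokens are requested, so the file is never turned into one big String the way a StringTokenizer-based approach requires.

```java
import java.io.Reader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.regex.Pattern;

// Sketch only: a lazy tokenizer over a Reader, built on java.util.Scanner.
// LazyTokenizer is a hypothetical helper, not part of Camel.
final class LazyTokenizer implements Iterator<String>, AutoCloseable {
    private final Scanner scanner;

    LazyTokenizer(Reader reader, String separator) {
        // Pattern.quote makes the separator literal (e.g. "\r\n" or ",")
        this.scanner = new Scanner(reader).useDelimiter(Pattern.quote(separator));
    }

    @Override
    public boolean hasNext() {
        // Reads more input from the Reader only if the buffer lacks a full token
        return scanner.hasNext();
    }

    @Override
    public String next() {
        if (!scanner.hasNext()) {
            throw new NoSuchElementException();
        }
        return scanner.next();
    }

    @Override
    public void close() {
        scanner.close(); // also closes the underlying Reader
    }
}
```

Each next() call returns one token, so a consumer can process line by line while only a small buffer of the file is held in memory.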
> 
> Regards,
> 
> Gert
> 
> Claus Ibsen wrote:
>> Hi
>>
>> Looking into the source code of the splitter, it seems to create the
>> whole list of split exchanges before any of them is processed. That is
>> why it consumes so much memory for big files.
>>
>> Maybe some kind of batch-size option is needed, so you can set a
>> number, say 20, as the batch size.
>>
>>    .splitter(body(InputStream.class).tokenize("\r\n").batchSize(20))
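One reading of the batch-size proposal, sketched in plain Java: pull at most batchSize parts from a lazy source, so only one batch is in memory at a time instead of the complete list of split parts. BatchingIterator is a hypothetical name, and the batchSize(20) call in the route above is only a proposal at this point in the thread; how a batch would map onto exchanges is left open here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch only: groups the items of a source Iterator into lists of at most
// batchSize elements. BatchingIterator is a hypothetical helper, not a Camel API.
final class BatchingIterator<T> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final int batchSize;

    BatchingIterator(Iterator<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    @Override
    public List<T> next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        // Pull items from the source only until this batch is full
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```

The last batch may be smaller than batchSize; combined with a lazy tokenizer as the source, memory use is bounded by one batch rather than by the file size.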
>>
>> Could you create a JIRA ticket for this improvement?
>> Btw, how big are the files you use?
>>
>> The file component uses a java.io.File as the message body.
>> So when you split using the input stream, Camel should use the File ->
>> InputStream type converter, which doesn't read the entire content into
>> memory. The memory is consumed in the splitter, where it creates the
>> entire list of new exchanges to fire.
>>
>> At least that is what I can read from the source code after a long day's
>> work, so please read the code as well, as 4 eyes are better than 2 ;)
>>
>>
>>
>> Kind regards
>>  
>> Claus Ibsen
>>
>> -----Original Message-----
>> From: Bart Frackiewicz [mailto:bart@open-medium.com] 
>> Sent: 2. september 2008 17:40
>> To: camel-user@activemq.apache.org
>> Subject: Splitter for big files
>>
>> Hi,
>>
>> I am using this route pattern for a couple of CSV files:
>>
>>    from("file:/tmp/input/?delete=true")
>>    .splitter(body(InputStream.class).tokenize("\r\n"))
>>    .beanRef("myBean", "process")
>>    .to("file:/tmp/output/?append=true")
>>
>> This works fine for small CSV files, but for big files I noticed
>> that Camel uses a lot of memory; it seems that Camel reads the whole
>> file into memory. How do I configure the splitter to use a stream?
>>
>> I noticed the same behaviour with the XPath splitter:
>>
>>    from("file:/tmp/input/?delete=true")
>>    .splitter(ns.xpath("//member"))
>>    ...
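XPath evaluation generally needs the whole document parsed into a tree first, which would explain the same memory growth. One alternative, not something proposed in this thread, is a StAX pull parser (javax.xml.stream, part of the JDK since Java 6). This sketch streams the input and hands the text of each member element to a callback; MemberStreamer is a hypothetical name, and it assumes every member element contains text only.

```java
import java.io.Reader;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Sketch only: streams XML with StAX instead of building a DOM for XPath.
// Assumes each <member> is text-only (getElementText() fails otherwise).
final class MemberStreamer {
    // Returns the number of <member> elements passed to the callback.
    static int stream(Reader input, Consumer<String> onMember) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
            try {
                int count = 0;
                while (reader.hasNext()) {
                    // next() pulls one parse event at a time from the input
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "member".equals(reader.getLocalName())) {
                        // Reads up to the matching end tag of this element
                        onMember.accept(reader.getElementText());
                        count++;
                    }
                }
                return count;
            } finally {
                reader.close(); // frees parser resources; the Reader itself stays open
            }
        } catch (XMLStreamException e) {
            // Sketch-level handling: report parse errors as unchecked exceptions
            throw new IllegalStateException("Could not parse XML input", e);
        }
    }
}
```

Only the current parse event and the text of one member are held at a time, so memory use does not grow with the number of members in the file.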
>>
>> BTW, I found a posting from March where James suggests the following
>> implementation for a custom splitter:
>>
>> -- quote --
>>
>>    from("file:///c:/temp?noop=true").
>>      splitter().method("myBean", "split").
>>      to("activemq:someQueue")
>>
>> Then register "myBean" with a split method...
>>
>> class SomeBean {
>>    public Iterator split(File file) {
>>       /// figure out how to split this file into rows...
>>    }
>> }
>> -- quote --
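One possible way to fill in the body of split(File) from the quoted post, as a plain-Java sketch assuming Java 8 or later and UTF-8 input. It returns rows lazily from a BufferedReader and closes the file once the last line has been read. It only covers the bean itself; it does not explain why the route fails on Camel 1.4.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch only: a lazy, line-by-line split(File); UTF-8 encoding is an assumption.
class SomeBean {
    public Iterator<String> split(File file) {
        final BufferedReader reader;
        try {
            reader = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            // The line next() will return; null once the file is exhausted
            private String nextLine = readNext();

            private String readNext() {
                try {
                    String line = reader.readLine(); // handles \r\n, \n and \r
                    if (line == null) {
                        reader.close(); // end of file: release the file handle
                    }
                    return line;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public String next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                String current = nextLine;
                nextLine = readNext();
                return current;
            }
        };
    }
}
```

If a caller stops iterating early, the file stays open; a production version would need an explicit close hook.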
>>
>> But this won't work for me (Camel 1.4).
>>
>> Bart
>>
>>   
> 
> 
> 


-----
Enterprise Architect

Xpectis
12, route d'Esch
L-1470 Luxembourg

Phone +352 25 10 70 470
Mobile +352 621 45 36 22

e-mail : cmoulliard@xpectis.com
web site : www.xpectis.com
My Blog : http://cmoulliard.blogspot.com/
-- 
View this message in context: http://www.nabble.com/Splitter-for-big-files-tp19272583s22882p19289425.html
Sent from the Camel - Users mailing list archive at Nabble.com.

