camel-users mailing list archives

From "Claus Ibsen" ...@silverbullet.dk>
Subject RE: Splitter for big files
Date Wed, 03 Sep 2008 07:43:04 GMT
Hi

I have created 2 tickets to track this:
CAMEL-875, CAMEL-876

Kind regards
 
Claus Ibsen
......................................
Silverbullet
Skovsgårdsvænget 21
8362 Hørning
Tel. +45 2962 7576
Web: www.silverbullet.dk

-----Original Message-----
From: Claus Ibsen [mailto:ci@silverbullet.dk] 
Sent: 2 September 2008 21:44
To: camel-user@activemq.apache.org
Subject: RE: Splitter for big files

Ah, of course, well spotted. The tokenize expression is the memory hog. Good idea with java.util.Scanner.

So, combined with the batching support, we should be able to operate on really big files
without consuming too much memory ;)


Kind regards
 
Claus Ibsen
......................................
Silverbullet
Skovsgårdsvænget 21
8362 Hørning
Tel. +45 2962 7576
Web: www.silverbullet.dk
-----Original Message-----
From: Gert Vanthienen [mailto:gert.vanthienen@skynet.be] 
Sent: 2 September 2008 21:28
To: camel-user@activemq.apache.org
Subject: Re: Splitter for big files

L.S.,

Just added my pair of eyes ;).  One part of the problem is indeed the
list of exchanges that is returned by the expression, but I think you're
also reading the entire file into memory a first time just to tokenize
it.  ExpressionBuilder.tokenizeExpression() converts the payload to a String
and then uses a StringTokenizer on that.  I think we could add support
there for tokenizing Files, InputStreams and Readers directly using a
Scanner.
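
For illustration, tokenizing a stream with java.util.Scanner could look roughly
like the sketch below (this is only a sketch of the idea, not the actual
ExpressionBuilder change; the class and method names are made up for the example):

    import java.io.InputStream;
    import java.util.Scanner;

    // Sketch: walk the tokens of a stream lazily with java.util.Scanner
    // instead of converting the whole payload to one big String first.
    public class ScannerTokenizeSketch {
        public static void tokenize(InputStream in, String token) {
            Scanner scanner = new Scanner(in).useDelimiter(token);
            while (scanner.hasNext()) {
                String next = scanner.next(); // one token at a time, read on demand
                // hand "next" over to the splitter / processor here
            }
            scanner.close();
        }
    }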

Regards,

Gert

Claus Ibsen wrote:
> Hi
>
> Looking into the source code of the splitter, it looks like it creates the list of split
> exchanges before they are processed. That is why it consumes memory for big files.
>
> Maybe some kind of batch size option is needed, so you could set a number, say 20, as the
> batch size.
>
>    .splitter(body(InputStream.class).tokenize("\r\n").batchSize(20))
>
> Could you create a JIRA ticket for this improvement?
> Btw, how big are the files you use?
>
> The file component uses a File as the object. 
> So when you split using the input stream, Camel should use the type converter from
> File -> InputStream, which doesn't read the entire content into memory. The memory consumption
> happens in the splitter, where it creates the entire list of new exchanges to fire.
>
> At least that is what I can read from the source code after a long day's work, so please
> read the code too, as 4 eyes are better than 2 ;)
>
>
>
> Kind regards
>  
> Claus Ibsen
> ......................................
> Silverbullet
> Skovsgårdsvænget 21
> 8362 Hørning
> Tel. +45 2962 7576
> Web: www.silverbullet.dk
>
> -----Original Message-----
> From: Bart Frackiewicz [mailto:bart@open-medium.com] 
> Sent: 2 September 2008 17:40
> To: camel-user@activemq.apache.org
> Subject: Splitter for big files
>
> Hi,
>
> I am using this route for a couple of CSV files:
>
>    from("file:/tmp/input/?delete=true")
>    .splitter(body(InputStream.class).tokenize("\r\n"))
>    .beanRef("myBean", "process")
>    .to("file:/tmp/output/?append=true")
>
> This works fine for small CSV files, but for big files I noticed
> that Camel uses a lot of memory; it seems that Camel is reading
> the whole file into memory. What is the configuration to use a stream
> in the splitter?
>
> I noticed the same behaviour with the XPath splitter:
>
>    from("file:/tmp/input/?delete=true")
>    .splitter(ns.xpath("//member"))
>    ...
>
> BTW, I found a posting from March where James suggests the following
> implementation for a custom splitter:
>
> -- quote --
>
>    from("file:///c:/temp?noop=true)").
>      splitter().method("myBean", "split").
>      to("activemq:someQueue")
>
> Then register "myBean" with a split method...
>
> class SomeBean {
>    public Iterator split(File file) {
>       /// figure out how to split this file into rows...
>    }
> }
> -- quote --
>
> But this won't work for me (Camel 1.4).
>
> Bart
>
>   

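A rough sketch of how the "myBean" split method from the quoted posting could be
filled in, using java.util.Scanner to read the file row by row (the class name and
the "\r\n" delimiter are assumptions, and whether the splitter consumes the returned
Iterator lazily depends on the Camel version):

    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Iterator;
    import java.util.Scanner;

    // Sketch of a split bean that returns an Iterator over the rows of the
    // file instead of building the whole list of rows up front.
    public class RowSplitterBean {
        // Note: a real bean would also need to arrange for the Scanner to be
        // closed once the iteration is finished.
        public Iterator<String> split(File file) throws FileNotFoundException {
            final Scanner scanner = new Scanner(file).useDelimiter("\r\n");
            return new Iterator<String>() {
                public boolean hasNext() { return scanner.hasNext(); }
                public String next() { return scanner.next(); }
                public void remove() { throw new UnsupportedOperationException(); }
            };
        }
    }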
