camel-users mailing list archives

From Josef Ludvíček <>
Subject Re: HDFS2 Component and NORMAL_FILE type
Date Tue, 24 Mar 2015 20:37:33 GMT

Related to hdfs2 and NORMAL_FILE: you might find that Camel sends a
message per data chunk, NOT a message per file (which I would expect).

They probably don't intend to change it.

It was reported as a bug (won't fix) and as a doc enhancement.
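
For anyone who actually needs whole files, here is a minimal workaround
sketch (untested; the NameNode address, paths and chunk size are
hypothetical placeholders). Each exchange from the hdfs2 consumer
carries one chunk, so appending every chunk to the same target file
rebuilds the original content, assuming a single consumer receiving
the chunks in order:

    import org.apache.camel.builder.RouteBuilder;

    public class HdfsChunksToFileRoute extends RouteBuilder {
        @Override
        public void configure() throws Exception {
            // The hdfs2 consumer emits one exchange per chunk of
            // 'chunkSize' bytes, not one exchange per file, so each
            // exchange body is just a slice of the file.
            from("hdfs2://namenode:8020/data/big.txt?chunkSize=4096")
                // Appending every chunk to the same local file
                // reassembles the original content (single ordered
                // consumer assumed).
                .to("file:target/out?fileName=big.txt&fileExist=Append");
        }
    }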

Btw nice catch with that tmp file :)


On 03/24/2015 09:19 PM, Sergey Zhemzhitsky wrote:
> Hello,
> Really interesting question.
> The answer is in this JIRA issue:
> and this diff:
> It would be really great if
> 1. the component would make this feature optional, so that multi-gigabyte data could be
> streamed from HDFS directly, on a file-by-file basis;
> 2. the component would merge the files on the fly, without any intermediate storage.
> Just raised the JIRA:
> Regards,
> Sergey
>> Hi, all!
>> I'm looking at ways to use the hdfs2 component to read files stored in a Hadoop
>> directory. As a fairly new Hadoop user, I assume the simplest way is when the
>> data is stored in the normal file format.
>> I was looking at the code in the
>> 'org.apache.camel.component.hdfs2.HdfsFileType#NORMAL_FILE' class, which is
>> responsible for creating the input stream, and noticed that it copies the
>> whole file to the local file system (into a temp file) before opening the
>> input stream (in the case of an 'hdfs://' URI).
>> I wonder what the reason behind this is. Isn't it possible that the file could
>> be very large, making this operation quite costly? Maybe I'm missing some
>> basic restriction on using normal files in Hadoop?
>> Thanks in advance
>> Alexey
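
To make the temp-file behaviour Alexey spotted concrete, here is an
illustrative sketch (not the component's actual code; host, port and
paths are made up) of the two approaches using the plain Hadoop
FileSystem API: copying to a local temp file first versus opening the
HDFS file and streaming it directly, which is essentially what
Sergey's JIRA asks for:

    import java.io.File;
    import java.io.InputStream;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadStyles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:8020"), conf);
            Path src = new Path("/data/big.txt");

            // 1) Roughly what the NORMAL_FILE handling does today: pull
            //    the whole file down to local disk before opening an
            //    input stream on it. For a multi-gigabyte file this
            //    doubles the I/O and needs matching local disk space.
            File tmp = File.createTempFile("camel-hdfs", ".tmp");
            fs.copyToLocalFile(src, new Path(tmp.getAbsolutePath()));

            // 2) Direct streaming, no intermediate copy: open() returns
            //    a seekable FSDataInputStream backed by the HDFS blocks.
            try (InputStream in = fs.open(src)) {
                byte[] buf = new byte[4096];
                for (int n; (n = in.read(buf)) != -1; ) {
                    // process each chunk incrementally...
                }
            }
        }
    }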
