camel-users mailing list archives

From Sergey Zhemzhitsky <>
Subject Re: HDFS2 Component and NORMAL_FILE type
Date Tue, 24 Mar 2015 20:19:50 GMT

Really interesting question.
The answer is in this JIRA issue:
and in this diff:

It would be really great if
1. the component made this feature optional, so that multi-gigabyte data could be
streamed directly from HDFS on a file-by-file basis, and
2. the component merged the files on the fly without any intermediate storage
(a rough sketch of both is below).

Just raised the JIRA:


> Hi, all!

> I'm looking at ways to use the hdfs2 component to read files stored in a Hadoop
> directory. As a fairly new Hadoop user, I assume the simplest approach is when the
> data is stored in the normal file format.

> I was looking at the code in the
> 'org.apache.camel.component.hdfs2.HdfsFileType#NORMAL_FILE' class, which is
> responsible for creating the input stream, and noticed that it copies the
> whole file to the local file system (into a temp file) before opening the
> input stream (in the case of an 'hdfs://' URI).

> I wonder what the reason behind this is? Isn't it possible for the file to be
> very large, making this operation quite costly? Maybe I'm missing some basic
> restrictions on using normal files in Hadoop?

> Thanks in advance
> Alexey
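
For contrast, the copy-then-open behaviour described above boils down to roughly the
following. This is a simplified, hypothetical sketch, not the component's actual code:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyThenOpenSketch {

    // Simplified sketch of the copy-then-open pattern: the whole remote file
    // is pulled down to a local temp file before a stream is ever returned.
    public static InputStream open(String hdfsUri) throws Exception {
        Path src = new Path(hdfsUri);
        FileSystem fs = src.getFileSystem(new Configuration());

        File tmp = File.createTempFile("camel-hdfs", ".tmp");
        fs.copyToLocalFile(src, new Path(tmp.getAbsolutePath()));

        // The consumer pays a full-file transfer plus local disk space
        // before it can read a single byte.
        return new FileInputStream(tmp);
    }
}

That full-file transfer plus the extra local disk usage is exactly the cost Alexey is
asking about.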

Best regards,
