hadoop-user mailing list archives

From Uma Maheswara Rao G <mahesw...@huawei.com>
Subject RE: When to use DFSInputStream and HdfsDataInputStream
Date Tue, 01 Oct 2013 04:40:35 GMT
Hi Rob,

DFSInputStream: The InterfaceAudience for this class is private, so you should not use it
directly. It implements the core read functionality and is a DFS-specific implementation
only.
HdfsDataInputStream: The InterfaceAudience for this class is public, so you can use it.
In fact, the object you get back when you open a file for read is an HdfsDataInputStream. This
wrapper adds some DFS-specific APIs, such as getVisibleLength, that are not intended to be
part of the normal FS API.
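
As a rough illustration of that relationship, here is a small sketch in plain Java. It is not
Hadoop source code and does not touch HDFS: CoreInputStream, PublicDataInputStream, open and
visibleLengthOf are made-up stand-ins for the private implementation class, the public wrapper,
the "open for read" call and a caller-side helper. It only shows the pattern: callers hold the
general stream type and downcast to the public wrapper when they need the DFS-specific call.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class StreamWrapperSketch {

    // Hypothetical stand-in for the private implementation (the DFSInputStream role).
    // It reads from an in-memory byte array instead of HDFS.
    static final class CoreInputStream extends ByteArrayInputStream {
        CoreInputStream(byte[] data) {
            super(data);
        }

        // Total number of bytes backing this stream.
        long fileLength() {
            return count;
        }
    }

    // Hypothetical stand-in for the public wrapper (the HdfsDataInputStream role).
    // It behaves like a normal DataInputStream and adds one implementation-specific method.
    static final class PublicDataInputStream extends DataInputStream {
        PublicDataInputStream(CoreInputStream core) {
            super(core);
        }

        long getVisibleLength() {
            // 'in' is the wrapped stream inherited from FilterInputStream.
            return ((CoreInputStream) in).fileLength();
        }
    }

    // Stand-in for opening a file: the declared type is the general one,
    // but the runtime object is the public wrapper.
    static DataInputStream open(byte[] contents) {
        return new PublicDataInputStream(new CoreInputStream(contents));
    }

    // Caller-side helper: use the extra API only if the stream really is the wrapper.
    // Returns -1 for any other kind of stream.
    static long visibleLengthOf(DataInputStream in) {
        if (in instanceof PublicDataInputStream) {
            return ((PublicDataInputStream) in).getVisibleLength();
        }
        return -1L;
    }

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = open(new byte[] {0, 0, 0, 42, 7})) {
            int first = in.readInt();            // ordinary stream API
            long visible = visibleLengthOf(in);  // wrapper-specific API
            System.out.println(first + " " + visible);
        }
    }
}
```

Note that the caller never constructs or names CoreInputStream; only open() does. That mirrors
the advice above: leave the private class alone and work through the public wrapper.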

The same applies for write: DFSOutputStream is the private implementation, and
HdfsDataOutputStream is the public wrapper.
I hope this helps clarify your doubts.


From: Rob Blah [mailto:tmp5330@gmail.com]
Sent: 01 October 2013 03:39
To: user@hadoop.apache.org
Subject: When to use DFSInputStream and HdfsDataInputStream

What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream
When should one be preferred over the other? From the sources I see they have similar functionality,
only HdfsData*Stream "follows" Data*Stream instead of *Stream. Also, is DFS*Stream more general
than HdfsData*Stream, in the sense that it works at a higher abstraction layer and can work with other
distributed FSs (even though it contacts HDFS-specific components), or is it just a naming convention?
Which one should I choose to read/write data from/to HDFS, and why (sounds like an academic question
;) )?

* -> means both Input and Output

