commons-dev mailing list archives

From dlmar...@comcast.net
Subject Re: [VFS] Implementing custom hdfs file system using commons-vfs 2.0
Date Mon, 28 Jul 2014 21:06:42 GMT

The HDFS file system implementation for VFS 2.1 is read-only. I did not implement any of the
write methods because I didn't need them at the time, and, as you pointed out, there are some
differences between writing to Hadoop and writing to a conventional file system. If you want
to use released HDFS VFS FileObjects, then you can use the ones I put into Accumulo (take a
look at the 1.6.0 release). The objects in Accumulo will be removed when VFS 2.1 is released. 
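A sketch of what the read-only limitation means in practice: a provider's capabilities can be queried before attempting a write. This assumes the VFS 2.1 snapshot with its hdfs scheme (plus the Hadoop client jars) on the classpath; the host and path in the URI are placeholders.

```java
import org.apache.commons.vfs2.Capability;
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.VFS;

public class HdfsCapabilityCheck {
    public static void main(String[] args) throws Exception {
        // Resolve a file through the HDFS provider; host and path are placeholders.
        FileObject file = VFS.getManager().resolveFile("hdfs://namenode:8020/tmp/example.txt");

        // The VFS 2.1 HDFS provider is read-only, so write capability should be absent.
        boolean writable = file.getFileSystem().hasCapability(Capability.WRITE_CONTENT);
        System.out.println("writable: " + writable);
    }
}
```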

- Dave 

----- Original Message -----

From: "Bernd Eckenfels" <ecki@zusammenkunft.net> 
To: dev@commons.apache.org 
Cc: "Richards Peter" <hbkrichards@gmail.com> 
Sent: Monday, July 28, 2014 4:50:55 PM 
Subject: Re: [VFS] Implementing custom hdfs file system using commons-vfs 2.0 

Hello, 

Yes, by default VFS offers an Input/OutputStream-based interface to the 
FileContent, plus a RandomAccess interface (which is specific to VFS). 

I think the current HDFS provider (VFS 2.1) supports only those two 
(and only read-only mode for the random access). 
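For the txt/csv case that stream interface is enough. A minimal read sketch, assuming the HDFS provider jars are available (the URI is a placeholder): the InputStream from FileContent wraps into the usual java.io readers.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.VFS;

public class VfsStreamRead {
    public static void main(String[] args) throws Exception {
        FileObject file = VFS.getManager().resolveFile("hdfs://namenode:8020/data/input.csv");

        // FileContent exposes plain java.io streams; wrap them like any other stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(file.getContent().getInputStream(),
                                      StandardCharsets.UTF_8))) {
            reader.lines().forEach(System.out::println);
        }
    }
}
```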

I am not sure whether you can wrap one of the two into an RCFile, or 
whether that works only with real HDFS FileSystem objects (I am not 
familiar with Hadoop). 

There is the possibility to add extensions (operations). One possible 
extension would be to retrieve the underlying HDFS file (or an object 
implementing the record-based interface). 
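A hedged sketch of such an extension using the org.apache.commons.vfs2.operations API. The operation interface name and its accessor methods are invented for illustration; they are not an existing VFS or provider API.

```java
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.operations.FileOperation;
import org.apache.commons.vfs2.operations.FileOperations;

// Hypothetical operation: expose the underlying Hadoop handles to callers
// that need record-based access. The name is illustrative only.
interface GetUnderlyingHdfsFile extends FileOperation {
    org.apache.hadoop.fs.FileSystem getHdfsFileSystem();
    org.apache.hadoop.fs.Path getHdfsPath();
}

class OperationLookup {
    // Returns the operation if the provider registered it, null otherwise.
    static GetUnderlyingHdfsFile lookup(FileObject file) throws Exception {
        FileOperations ops = file.getFileOperations();
        if (!ops.hasOperation(GetUnderlyingHdfsFile.class)) {
            return null; // provider does not offer this extension
        }
        return (GetUnderlyingHdfsFile) ops.getOperation(GetUnderlyingHdfsFile.class);
    }
}
```

A provider would register an implementation through a FileOperationProvider; callers then discover it via getFileOperations() as above.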

That is certainly the way to go if you need that kind of access. 
However, if you want such specific HDFS access modes, I wonder whether 
it isn't best to use HDFS only/directly. What is the motivation for 
wrapping it in VFS? 

BTW: there was some interest in VFS on the HDFS developer mailing list 
a few weeks back. If you plan to do anything in that direction, you 
might want to involve them as well. 

I am copying the commons-dev list, since I am not familiar with the 
HDFS provider and this is also a general discussion. 

Greetings 
Bernd 


On Mon, 28 Jul 2014 15:57:57 +0530, Richards Peter 
<hbkrichards@gmail.com> wrote: 

> Hi Bernd, 
> 
> I would like to clarify one more doubt. I found that commons-vfs is 
> implemented based on java.io.*. Commons-vfs returns 
> java.io.InputStream/java.io.OutputStream for reading/writing files. 
> 
> I have a use case to read/write files from/to hdfs. These files may be 
> txt (csv) or RCFiles (Record Columnar Files, using Hive APIs). Handling 
> txt files is straightforward. I can wrap the InputStream and 
> OutputStream in some reader and read the contents. 
> 
> However for RCFiles I have to use: 
> https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/io/RCFile.Writer.html
> https://hive.apache.org/javadocs/r0.10.0/api/org/apache/hadoop/hive/ql/io/RCFile.Reader.html
> 
> In these classes, the methods exposed to write and read contents are 
> not based on Java input and output streams, but on the append() and 
> getCurrentRow() APIs, both of which require BytesRefArrayWritable 
> objects. 
> 
> I think my use case is more related to the file content format, 
> reader and writer. What would you recommend in this scenario for 
> reading from and writing to such files? Should I just hold the file 
> object implementation reference in my own reader and writer classes 
> and create RCFile reader and writer instances within those classes? 
> Can something else be done using commons-vfs to read from and write 
> to files irrespective of the contents (e.g. FileContent, 
> FileContentInfo and FileContentInfoFactory)? 
> 
> Thanks, 
> Richards Peter. 
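For the RCFile question above, one hedged sketch of the "hold the reference in your own reader class" idea. All names here are illustrative, and the class deliberately bypasses VFS to obtain the Hadoop FileSystem and Path directly, since RCFile.Reader requires those types.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFile;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.io.LongWritable;

// Illustrative wrapper: keeps the HDFS path and builds an RCFile.Reader on demand.
public class RcFileRowReader implements AutoCloseable {
    private final RCFile.Reader reader;

    public RcFileRowReader(Configuration conf, String pathOnHdfs) throws Exception {
        Path path = new Path(pathOnHdfs);
        FileSystem fs = path.getFileSystem(conf);
        this.reader = new RCFile.Reader(fs, path, conf);
    }

    // Advances to the next row; fills rowId and row, returns false at end of file.
    public boolean readRow(LongWritable rowId, BytesRefArrayWritable row) throws Exception {
        if (!reader.next(rowId)) {
            return false;
        }
        reader.getCurrentRow(row);
        return true;
    }

    @Override
    public void close() throws Exception {
        reader.close();
    }
}
```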
> 
> 
> On Mon, Jul 28, 2014 at 12:36 PM, Richards Peter 
> <hbkrichards@gmail.com> wrote: 
> 
> > Hi Bernd, 
> > 
> > Thanks for your response. 
> > 
> > Our company does not allow the development team to use 
> > candidate/snapshot releases of open-source projects. That is the 
> > reason why I am asking about the vfs 2.0 version. 
> > 
> > I am checking the code available in: 
> > 
> > http://svn.apache.org/viewvc/commons/proper/vfs/trunk/core/src/main/java/org/apache/commons/vfs2/provider/hdfs/
> > and 
> > https://github.com/pentaho/pentaho-hdfs-vfs/tree/master/src/org/pentaho/hdfs/vfs
> > 
> > I would also like to ask whether it is fine for me to clarify my 
> > doubts with you through this mail thread if I face any problems 
> > while implementing an hdfs file system for vfs-2.0. I will also 
> > check vfs-2.1 and see whether I can contribute to that as well. 
> > 
> > Regards, 
> > Richards Peter. 
> > 
> > 
> > On Mon, Jul 28, 2014 at 1:17 AM, Bernd Eckenfels 
> > <ecki@zusammenkunft.net> wrote: 
> > 
> >> Hello Peter, 
> >> 
> >> I would suggest you use the 
> >> current version from SVN or the snapshot builds. This has the 
> >> big advantage that you can actually test and contribute to 
> >> this version in case you miss some features or find some bugs. 
> >> 
> >> If you want to implement your own file system provider, you 
> >> typically start by copying one of the existing providers and 
> >> adapting it. The main work is in implementing a specific FileObject 
> >> which extends AbstractFileObject and implements the various 
> >> doSomething() methods. 
> >> 
> >> Actually, the JavaDoc of that abstract class is quite good in this 
> >> regard. 
> >> 
> >> After you have implemented the new file system, it will be 
> >> available via addProvider(), or you can add it as a new provider in 
> >> the XML configuration of StandardFileSystemManager, as described 
> >> here: http://commons.apache.org/proper/commons-vfs/api.html 
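A minimal registration sketch of the addProvider() route, assuming the VFS 2.1 snapshot's HdfsFileProvider (a custom provider would plug in the same way; host and path in the URI are placeholders):

```java
import org.apache.commons.vfs2.FileObject;
import org.apache.commons.vfs2.impl.DefaultFileSystemManager;
import org.apache.commons.vfs2.provider.hdfs.HdfsFileProvider;

public class RegisterHdfsProvider {
    public static void main(String[] args) throws Exception {
        DefaultFileSystemManager manager = new DefaultFileSystemManager();

        // Register the provider under the "hdfs" scheme before init().
        manager.addProvider("hdfs", new HdfsFileProvider());
        manager.init();

        // Name resolution now routes hdfs:// URIs to the registered provider.
        FileObject file = manager.resolveFile("hdfs://namenode:8020/tmp/example.txt");
        System.out.println(file.getName().getScheme()); // scheme is "hdfs"
    }
}
```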
> >> 
> >> Greetings 
> >> Bernd 
> >> 
> >> On Sun, 27 Jul 2014 15:24:57 +0530, 
> >> Richards Peter <hbkrichards@gmail.com> wrote: 
> >> 
> >> > Hi, 
> >> > 
> >> > I am evaluating commons-vfs 2.0 for one of my use cases. I read 
> >> > that commons-vfs 2.1 has a file system implementation for HDFS. 
> >> > Since commons-vfs 2.1 is still in development and does not have 
> >> > all the capabilities that we require for hdfs, I would like to 
> >> > implement a custom file system with commons-vfs 2.0 now and 
> >> > enhance commons-vfs 2.1 when that release is made. 
> >> > 
> >> > Could you please tell me how to implement such a file system 
> >> > for commons-vfs 2.0? I would like to know: 
> >> > 1. The specific classes that need to be implemented. 
> >> > 2. How to register/supply these classes so that they can be used 
> >> > by my application. 
> >> > 3. How name resolution takes place when I provide the file path 
> >> > of an hdfs file. 
> >> > 
> >> > Thanks, 
> >> > Richards Peter. 
> >> 
> >> 
> > 
> 



