hadoop-general mailing list archives

From "Gibbon, Robert, VF-Group" <Robert.Gib...@vodafone.com>
Subject RE: interfaces to HDFS icw Kerberos
Date Tue, 30 Nov 2010 12:33:04 GMT
Hi Evert,

We use the WebDAV tool integrated with Hue for authenticated ad-hoc
read/write access, but for full-throttle inbound data our implementation
uses Flume.
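
For illustration, here is a minimal sketch of that kind of Flume setup,
assuming the Flume 0.9.x shell syntax of the Cloudera Flume user guide -
the node names, tailed file, and HDFS path are all hypothetical:

    # Agent node tails a local file and forwards events to a collector.
    exec config agent01 'tail("/var/log/incoming/data.log")' 'agentSink("collector01", 35853)'
    # Collector node receives the events and writes them into HDFS,
    # bucketed by date, with files prefixed "events".
    exec config collector01 'collectorSource(35853)' 'collectorSink("hdfs://namenode/flume/%Y-%m-%d/", "events")'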

That said, the WebDAV solution is horizontally scalable - it is a
stateless web app - so a software or hardware load balancer could be your
friend here for getting the throughput up.
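
If it helps, a minimal sketch of that load-balancer idea with HAProxy -
the hostnames, ports, and backend addresses are made up for the example:

    # Round-robin HTTP load balancing across two identical WebDAV gateways.
    frontend webdav_in
        bind *:8085
        mode http
        default_backend webdav_nodes

    backend webdav_nodes
        mode http
        balance roundrobin
        server dav1 10.0.0.11:8085 check
        server dav2 10.0.0.12:8085 check

Because the app keeps no session state, any node can serve any request,
so adding gateways should raise write throughput until HDFS itself
becomes the bottleneck.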

FTP, on the other hand, is a session-based protocol, and I have not
looked in detail at the HDFS-FTP implementation, so it might not be as
easy to scale sideways.


-----Original Message-----
From: Evert Lammerts [mailto:Evert.Lammerts@sara.nl] 
Sent: Saturday, 27 November 2010 12:29
To: Vinithra Varadharajan; general@hadoop.apache.org
Cc: hdfs-user@hadoop.apache.org; Hue-Users
Subject: RE: interfaces to HDFS icw Kerberos

Hi Vinithra, others,

We are using CDH3b3 (which works amazingly well!!). And it's nice to see
Y!'s Kerberos solution coupled to HDFS. But I wouldn't use Hue to upload
a set of files adding up to hundreds of GBs or a number of TBs.
Browser-based applications are not suitable for that, in my experience.
Do you have different experiences with Hue? (To be fair, we haven't
tested its performance yet.)

We are setting up a cluster that will be shared by people from a number
of different institutes, all working on different cases with different
data. Their work and data should be protected, including from each other.
At the same time they need to be able to transfer their data onto HDFS
(with high enough throughput) from their local clusters and machines. Is
there a standard that others are using and that works for shared
clusters? How are Y! people getting their data onto HDFS?

Right now we are using SFTP. We handle authentication in a somewhat hacky
way, but it works: we've coupled our LDAP server to Hue through an
Auth*Handler, which also allows for executing a script that updates
authentication tokens for our FTP. So far, though, the throughput is far
from high enough - 1.5 MB/s - and the data goes over the line
unencrypted. Unless we can get that up significantly, while also
providing the option to encrypt the data on the wire, this will probably
not be a long-term solution.
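
(For scale: at 1.5 MB/s, a single TB is roughly 1,000,000 MB / 1.5 MB/s
≈ 667,000 seconds, i.e. close to eight days for one transfer - clearly
not workable for the volumes mentioned above.)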

If anybody can share experiences on transparently and securely getting
data onto HDFS from external locations, that would be much appreciated!


From: Vinithra Varadharajan [vinithra@cloudera.com]
Sent: Friday, November 26, 2010 10:12 PM
To: general@hadoop.apache.org; Evert Lammerts
Cc: hdfs-user@hadoop.apache.org; Hue-Users
Subject: Re: interfaces to HDFS icw Kerberos

+ Hue-user mailing list

Hi Evert,

Which version of Hue and CDH are you using? CDH3b3 includes Yahoo's
security patches, which provide Kerberos authentication. In CDH3b3 we
have also changed Hue's filebrowser application - which provides an
interface for uploading data into HDFS - so that it works with Hadoop's
authentication. Is this similar to what you're looking for?
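
For readers following along, a minimal sketch of what a
Kerberos-authenticated HDFS upload looks like once the security patches
are in place - the principal, keytab path, and file paths below are
hypothetical, and the sketch assumes a core-site.xml with the cluster's
address and security settings on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberizedUpload {
        public static void main(String[] args) throws Exception {
            // Picks up fs.default.name and related settings from core-site.xml.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);
            // Log in from a keytab; principal and path are hypothetical.
            UserGroupInformation.loginUserFromKeytab(
                "evert@EXAMPLE.COM", "/etc/security/keytabs/evert.keytab");
            FileSystem fs = FileSystem.get(conf);
            // Copy a local file into the user's HDFS home directory.
            fs.copyFromLocalFile(new Path("/tmp/input.dat"),
                                 new Path("/user/evert/input.dat"));
            fs.close();
        }
    }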


On Thu, Nov 25, 2010 at 11:16 AM, Evert Lammerts
<Evert.Lammerts@sara.nl> wrote:
Hi list,

We're considering providing our users with FTP and WebDAV interfaces
(using software provided here: http://www.hadoop.iponweb.net/). These
both support user accounts, so we'll be able to deal with
authentication. We're evaluating Cloudera's Hue, which we have coupled
to our LDAP service for authentication, as an interface to MapReduce.
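
As a concrete (hypothetical) example of what the WebDAV interface would
give users, an authenticated upload is a single HTTP PUT - the host,
port, and credentials here are made up:

    # Upload a local file over WebDAV with basic auth (curl -T issues a PUT).
    curl -u alice:secret -T bigfile.dat http://webdav-gateway:8080/user/alice/bigfile.dat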

These solutions are not the most elegant in terms of authentication.
We'd much prefer to use Kerberos as provided by Y!. But if we do so, how
will we enable users to get data from the outside world onto HDFS? How
do others provide secure but easy interfaces to HDFS?

Kind regards,
Evert Lammerts
