hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Metadata, Daemons, Benchmarks, Web UI
Date Thu, 25 Feb 2010 17:56:54 GMT
On Wed, Feb 24, 2010 at 10:39 PM, Susheel Varma <susheel.varma@gmail.com> wrote:

> Hi,
>
> We are trying to evaluate a small set of distributed data management
> solutions (iRODS, HDFS, Lustre) for our project. We don't really have a
> need for scalable computation; rather, our focus is on redundancy,
> reliability and security, although small bits of computation would be
> needed at some level. We have just begun an evaluation of HDFS, which
> has thrown up a few questions (a good thing, I guess):
>
> 1. Metadata & Links
> a. Is there a way to add/update/remove metadata held on the NameNode?
> Examples?
> b. Is there a way to get hold of the metadata held on the NameNode?
> I'd like to allow users to search HDFS using the metadata, and
> then resolve the query to an actual link to the file.
>
> If a or b is not possible, I would have to resort to using HBase to
> store the custom metadata. Examples?
>

Can you explain what you mean by metadata? Do you mean arbitrary
attributes on files (like extended attributes in Linux)?
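If you do mean per-file attributes, HDFS itself won't store them for you, so an
external store like the HBase table you mention is the usual workaround. Roughly
something like this (untested sketch against the current HBase client API - exact
calls vary by version; the "file_meta" table and "attr" family keyed by HDFS path
are just made-up names):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class FileMeta {
  // made-up schema: row key = HDFS path, one family "attr" holding
  // arbitrary key/value attributes for that file
  private static final byte[] ATTR = Bytes.toBytes("attr");

  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "file_meta");

    // attach a couple of attributes to a file
    byte[] row = Bytes.toBytes("/data/experiments/run42.dat");
    Put put = new Put(row);
    put.add(ATTR, Bytes.toBytes("owner"), Bytes.toBytes("susheel"));
    put.add(ATTR, Bytes.toBytes("study"), Bytes.toBytes("pilot"));
    table.put(put);

    // read them back; a "search" would be a scan over this table (or a
    // secondary index table), not a NameNode operation
    Result r = table.get(new Get(row));
    System.out.println(Bytes.toString(r.getValue(ATTR, Bytes.toBytes("owner"))));
    table.close();
  }
}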


>
> 2. Daemons
> a. Is there a way to set up a daemon job (map-reduce, or otherwise) to
> listen for filesystem events and trigger actions that need to be
> performed on these files? Examples?
> b. If not, is there an FSEvent API I could use? Examples?
>

Nope. I think someone may have opened a JIRA for an inotify-like API, but
this does not exist currently. Polling is the best bet. Alternatively, you
could run a daemon on the same host as the NN which tails the audit log (or
write a log4j appender for the audit log) to watch for events on the trigger
files. This would be a nice contribution.
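For the polling route, something along these lines would do (untested sketch
against the plain FileSystem API; the /incoming directory, the 10-second
interval, and keying on modification time are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsPoller {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path watched = new Path("/incoming");   // hypothetical trigger directory
    long lastSeen = 0;

    while (true) {
      long newest = lastSeen;
      for (FileStatus stat : fs.listStatus(watched)) {
        // treat any file modified since the last pass as a new event
        if (stat.getModificationTime() > lastSeen) {
          System.out.println("New or changed: " + stat.getPath());
          newest = Math.max(newest, stat.getModificationTime());
        }
      }
      lastSeen = newest;
      Thread.sleep(10000);   // poll every 10 seconds
    }
  }
}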


>
> 3. Benchmarks
> a. Are there any published or unpublished HDFS filesystem benchmarks
> using IOZone, PostMark, etc.? I know almost all FS benchmarks must be
> taken with a pinch of salt, but I'd really like to see quantitative
> comparisons with Lustre, for example.
>
>
I don't think so - most of the existing benchmark tools require a POSIX
filesystem (e.g., they test random write). There was a thread some time back
about running iozone (minus the random write benchmarks) on HDFS, but I'm
not sure what the result was.

My hunch is that you will find it is competitive for sequential workloads
and less competitive for random workloads. That said, I know Brian Bockelman
(pretty active on this list) uses HDFS heavily with a primarily random-read
workload on a huge scientific dataset with good results.
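If you just want a rough number of your own before setting up a real benchmark,
even something as crude as timing a large sequential write through the FileSystem
API will show the streaming behaviour (untested sketch; file size, buffer size and
path are arbitrary, and this is no substitute for a proper benchmark):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SequentialWriteCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path out = new Path("/tmp/seq-write-test.dat");   // hypothetical test file
    byte[] buf = new byte[1 << 20];                   // 1 MB buffer
    long totalBytes = 1024L * buf.length;             // write 1 GB

    long start = System.currentTimeMillis();
    FSDataOutputStream stream = fs.create(out, true);
    for (long written = 0; written < totalBytes; written += buf.length) {
      stream.write(buf);
    }
    stream.close();
    long millis = System.currentTimeMillis() - start;

    System.out.printf("Wrote %d MB in %.1f s (%.1f MB/s)%n",
        totalBytes >> 20, millis / 1000.0, (totalBytes >> 20) / (millis / 1000.0));
    fs.delete(out, false);
  }
}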


> 4. Web UI
> a. Is there a simple way to augment the NameNode Web UI to allow users
> to login, search and download the files on the backend filesystem?
> Examples?
>

The existing NN UI has a "browse the filesystem" link. From there you can
navigate to a file and click "Download this file".

Search is not provided, as there is no efficient indexing and an O(number of
files) walk of the filesystem is prohibitively expensive for large NNs.
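To give a feel for the cost, a search would boil down to a walk like the one
below (untested sketch with the plain FileSystem API): every query touches every
directory, so it only makes sense for small namespaces.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NaiveSearch {
  // walk the whole tree under 'dir', printing paths whose name contains
  // 'term' -- an O(number of files) operation against the NameNode
  static void search(FileSystem fs, Path dir, String term) throws Exception {
    for (FileStatus stat : fs.listStatus(dir)) {
      if (stat.getPath().getName().contains(term)) {
        System.out.println(stat.getPath());
      }
      if (stat.isDir()) {
        search(fs, stat.getPath(), term);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    search(fs, new Path("/"), args[0]);
  }
}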


> b. If not, could you show me examples where users have built a web
> application to serve files stored on HDFS?
>
> Thanks
> Susheel
>

-Todd
