hadoop-general mailing list archives

From Mike Anderson <mik...@mit.edu>
Subject interface to HDFS
Date Thu, 21 May 2009 18:13:29 GMT
Hello, I'm working on a Hadoop project where my data consists of many
HTML files (websites). One aspect of the project involves traditional
MapReduce analysis of the data set, but I would also like to use Hadoop as a
sort of "cache server," i.e., to be able to retrieve the HTML for a
website that I have already visited.

My question is this: what is the best way to interact with HDFS to make
simple existence queries and retrieve specific files for reading? Ideally I
would like to do this at the application level (most likely written in
Ruby). So far I have explored using one of the FUSE packages
to mount HDFS in userspace, but I ran into quite a bit of difficulty
installing either of the two popular packages. My second option seems to be
Hive, but I haven't been able to find any bindings for Ruby, Python, etc.

Any suggestions or advice would be greatly appreciated!

