hadoop-mapreduce-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: suggest Best way to upload xml files to HDFS
Date Fri, 13 Jul 2012 06:14:15 GMT
If you're looking at automated file/record/event collection, take a
look at Apache Flume: http://incubator.apache.org/flume/. It handles
distributed collection well and is very configurable.
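As a rough illustration of the Flume approach, an agent that watches a
local directory for new XML files and ships them to HDFS might be
configured along these lines (a sketch only; the agent name, directory
paths, and HDFS URI below are made-up placeholders, and the exact
property set depends on your Flume version):

```properties
# One agent named "a1" with a spooling-directory source,
# an in-memory channel, and an HDFS sink.
a1.sources = xmlsrc
a1.channels = mem
a1.sinks = hdfssink

# Source: pick up files dropped into a local spool directory
a1.sources.xmlsrc.type = spooldir
a1.sources.xmlsrc.spoolDir = /var/data/incoming-xml
a1.sources.xmlsrc.channels = mem

# Channel: buffer events in memory between source and sink
a1.channels.mem.type = memory

# Sink: write events out to a path in HDFS
a1.sinks.hdfssink.type = hdfs
a1.sinks.hdfssink.hdfs.path = hdfs://namenode:8020/data/xml
a1.sinks.hdfssink.channel = mem
```

Flume then takes care of reliability and batching, rather than a
hand-rolled upload script.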

Otherwise, write a scheduled script to do the uploads every X period
(your choice). Also consider
https://github.com/edwardcapriolo/filecrush or similar tools if your
files are very small and are getting in the way of MR processing.
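The scheduled-script route above could be sketched like this in Python:
merge the day's small XML files into one local file (avoiding the HDFS
small-files problem), then push the result with the `hdfs dfs -put` CLI.
The function names, paths, and the merge-then-put split are illustrative
assumptions, not something from the original thread:

```python
import glob
import os
import subprocess


def merge_xml_files(src_dir, out_path):
    """Concatenate all .xml files in src_dir into one local file.

    Many small files cost NameNode memory and spawn one map task
    each in MapReduce, so merging before upload is usually cheaper.
    Returns the number of files merged.
    """
    xml_files = sorted(glob.glob(os.path.join(src_dir, "*.xml")))
    with open(out_path, "wb") as out:
        for path in xml_files:
            with open(path, "rb") as f:
                out.write(f.read())
    return len(xml_files)


def upload_to_hdfs(local_path, hdfs_dir):
    # Assumption: the `hdfs` CLI is on PATH. A cron entry could run
    # this every X period, as the reply suggests.
    subprocess.check_call(["hdfs", "dfs", "-put", local_path, hdfs_dir])
```

A cron job would then call `merge_xml_files` followed by
`upload_to_hdfs` once per period; note that naive concatenation yields
a file with multiple XML roots, so the downstream MR job needs a reader
that tolerates that (or the merge step should wrap records in a single
root element).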

On Fri, Jul 13, 2012 at 8:59 AM, Manoj Babu <manoj444@gmail.com> wrote:
> Hi,
> I need to upload large XML files daily. Right now I have a small
> program that reads all the files from a local folder and writes them to HDFS as a
> single file. Is this the right way?
> If there are any best practices or a more optimized way to achieve this, kindly let me
> know.
> Thanks in advance!
> Cheers!
> Manoj.

Harsh J
