hadoop-user mailing list archives

From Ralph Soika <ralph.so...@imixs.com>
Subject How to write a Job for importing Files from an external Rest API into Hadoop
Date Sun, 30 Jul 2017 21:21:34 GMT

I'd like to ask: what is the best way to implement a job that imports
files into HDFS?

I have an external system that offers data through a REST API.
My goal is to have a job running in Hadoop that periodically (perhaps
started by cron?) checks the REST API for new data.
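Cron is a reasonable way to do this: a crontab entry can launch the job on a fixed schedule. If the poller should instead stay running as one long-lived process, Java's `ScheduledExecutorService` can drive the loop. The sketch below assumes a hypothetical `pollOnce` callback that performs the actual REST check. It waits a fixed delay between the end of one poll and the start of the next, and it catches exceptions, because an uncaught exception in a scheduled task silently cancels all later runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingScheduler {

    /**
     * Calls pollOnce repeatedly, waiting delayMillis between the end of one
     * poll and the start of the next. Returns the executor so the caller can stop it.
     * pollOnce is a hypothetical placeholder for "ask the REST API for new files".
     */
    public static ScheduledExecutorService start(Runnable pollOnce, long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                pollOnce.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and continue.
                System.err.println("poll failed: " + e.getMessage());
            }
        }, 0, delayMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch threePolls = new CountDownLatch(3);
        ScheduledExecutorService scheduler = start(() -> {
            System.out.println("checking the REST API for new files...");
            threePolls.countDown();
        }, 100);
        threePolls.await();        // let it poll three times for the demo
        scheduler.shutdownNow();   // stop the non-daemon worker thread so the JVM can exit
    }
}
```

With cron, by contrast, each run is a fresh process, so a failed run cannot block the next one, but the job itself must be safe to start again while the previous run may still be finishing.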

It would be nice if this job could also run on multiple data nodes. But
unlike all the MapReduce examples I have found, my job looks for new or
changed data from an external interface and compares it with the
existing data.
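One possible way to spread the work over several nodes is to fetch the list of file IDs once, write it to an input file, and let Hadoop's `NLineInputFormat` hand each mapper a slice of lines. The underlying idea is simply that every file ID goes to exactly one worker. The plain-Java sketch below shows that idea with hash-based assignment; the file IDs and the worker index/count parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilePartitioner {

    /**
     * Returns the file IDs that worker workerIndex (0-based) out of workerCount
     * should handle. Each ID lands on exactly one worker, so several workers can
     * process the same list without duplicating or skipping files.
     */
    public static List<String> assignedTo(List<String> fileIds, int workerIndex, int workerCount) {
        if (workerCount <= 0 || workerIndex < 0 || workerIndex >= workerCount) {
            throw new IllegalArgumentException("invalid worker index/count");
        }
        List<String> mine = new ArrayList<>();
        for (String id : fileIds) {
            // floorMod keeps the result non-negative even when hashCode() is negative.
            if (Math.floorMod(id.hashCode(), workerCount) == workerIndex) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a.csv", "b.csv", "c.csv", "d.csv", "e.csv");
        for (int w = 0; w < 3; w++) {
            System.out.println("worker " + w + " -> " + assignedTo(ids, w, 3));
        }
    }
}
```

Because `String.hashCode()` is deterministic, every worker computes the same assignment independently, with no coordination beyond knowing its own index and the total worker count.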

This is a conceptual example of the job:

 1. The job asks the REST API whether there are new files.
 2. If so, the job fetches the first file in the list.
 3. It checks whether the file already exists:
     1. if not, the job imports the file;
     2. if yes, the job compares the data with the data already stored:
         1. if it has changed, the job updates the file.
 4. If more files exist, the job continues with step 2;
 5. otherwise it ends.
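The outline above maps fairly directly onto a loop. The following is a minimal sketch, not a finished Hadoop job: `RemoteSource`, `TargetStore`, `MapSource`, `MapStore` and `Outcome` are all hypothetical names. In a real job, `TargetStore` could be implemented on top of Hadoop's `org.apache.hadoop.fs.FileSystem` (for example `exists(Path)`, `open(Path)` and `create(Path, true)` to overwrite), and `RemoteSource` would wrap an HTTP client for the external REST API. The in-memory stand-ins only exist so the sketch runs without a cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class RestImportJob {

    /** Hypothetical view of the external REST API. */
    public interface RemoteSource {
        List<String> listFiles();          // step 1: which files are available?
        byte[] download(String fileName);  // step 2: fetch one file's content
    }

    /** Hypothetical target store; a real job would back this with Hadoop's FileSystem. */
    public interface TargetStore {
        Optional<byte[]> read(String fileName);    // empty if the file does not exist yet
        void write(String fileName, byte[] data);  // create or overwrite
    }

    public enum Outcome { IMPORTED, UPDATED, UNCHANGED }

    /** Steps 2 to 5 of the outline: handle every listed file in turn, then stop. */
    public static Map<String, Outcome> run(RemoteSource source, TargetStore store) {
        Map<String, Outcome> result = new LinkedHashMap<>();
        for (String name : source.listFiles()) {
            byte[] incoming = source.download(name);
            Optional<byte[]> existing = store.read(name);          // step 3: already there?
            if (!existing.isPresent()) {
                store.write(name, incoming);                       // 3.1: new file, import it
                result.put(name, Outcome.IMPORTED);
            } else if (!Arrays.equals(existing.get(), incoming)) {
                store.write(name, incoming);                       // 3.2.1: changed, update it
                result.put(name, Outcome.UPDATED);
            } else {
                result.put(name, Outcome.UNCHANGED);               // 3.2: identical, skip
            }
        }
        return result;
    }

    /** In-memory stand-in for the REST API (test double, not a real client). */
    public static class MapSource implements RemoteSource {
        private final Map<String, byte[]> files;
        public MapSource(Map<String, byte[]> files) { this.files = files; }
        public List<String> listFiles() { return new ArrayList<>(files.keySet()); }
        public byte[] download(String fileName) { return files.get(fileName); }
    }

    /** In-memory stand-in for HDFS (test double). */
    public static class MapStore implements TargetStore {
        private final Map<String, byte[]> files = new HashMap<>();
        public Optional<byte[]> read(String fileName) { return Optional.ofNullable(files.get(fileName)); }
        public void write(String fileName, byte[] data) { files.put(fileName, data.clone()); }
    }

    public static void main(String[] args) {
        Map<String, byte[]> remote = new LinkedHashMap<>();
        remote.put("a.txt", "new file".getBytes(StandardCharsets.UTF_8));
        remote.put("b.txt", "version 2".getBytes(StandardCharsets.UTF_8));
        remote.put("c.txt", "same".getBytes(StandardCharsets.UTF_8));

        MapStore store = new MapStore();
        store.write("b.txt", "version 1".getBytes(StandardCharsets.UTF_8));
        store.write("c.txt", "same".getBytes(StandardCharsets.UTF_8));

        System.out.println(run(new MapSource(remote), store));
        // prints {a.txt=IMPORTED, b.txt=UPDATED, c.txt=UNCHANGED}
    }
}
```

Since unchanged files are skipped, running the job a second time is harmless, which is useful when it is triggered periodically. For large files, storing and comparing a checksum instead of reading back the whole existing file would likely be cheaper.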

Can anybody give me a little help on how to start? (It's the first job
I'm writing...)
