hadoop-user mailing list archives

From Ralph Soika <ralph.so...@imixs.com>
Subject Re: How to write a Job for importing Files from an external Rest API into Hadoop
Date Mon, 31 Jul 2017 20:23:25 GMT
Hi Ravi,

thanks a lot for your response and the code example!
I think this will help me a lot to get started. I am glad to see that my 
idea is not too exotic.
I will report if I can adapt the solution for my problem.

best regards
Ralph


On 31.07.2017 22:05, Ravi Prakash wrote:
> Hi Ralph!
>
> Although not totally similar to your use case, DistCp may be the 
> closest thing to what you want: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCp.java
> The client builds a file list, and then submits an MR job to copy 
> over all the files.
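
The pattern described above (list first, then hand the copying to parallel workers) can be illustrated locally. This is only a rough sketch and not DistCp's code: plain dicts stand in for the source system and HDFS, a thread pool stands in for the MR job, and all function names are made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def build_file_list(source):
    """Client side: enumerate everything to copy before any work starts."""
    return sorted(source)


def split_into_chunks(paths, num_workers):
    """Deal the file list out round-robin, one chunk per worker."""
    return [paths[i::num_workers] for i in range(num_workers)]


def copy_chunk(chunk, source, target):
    """Worker side: copy each file in the assigned chunk."""
    for path in chunk:
        target[path] = source[path]
    return len(chunk)


def distributed_copy(source, target, num_workers=3):
    """Build the list once, then let the workers copy their chunks in parallel."""
    paths = build_file_list(source)
    chunks = split_into_chunks(paths, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        copied = pool.map(lambda chunk: copy_chunk(chunk, source, target), chunks)
        return sum(copied)
```

The point of the split is that the listing happens once on the client, while the expensive part (moving the bytes) is spread over the workers.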
>
> HTH
> Ravi
>
> On Sun, Jul 30, 2017 at 2:21 PM, Ralph Soika <ralph.soika@imixs.com> wrote:
>
>     Hi,
>
>     I want to ask what the best way is to implement a job that
>     imports files into HDFS.
>
>     I have an external system offering data through a REST
>     API. My goal is to have a job running in Hadoop that
>     periodically (maybe started by cron?) checks the REST API for
>     new data.
>
>     It would be nice if this job could also run on multiple data
>     nodes. But unlike all the MapReduce examples I found,
>     my job looks for new or changed data from an external
>     interface and compares it with the existing data.
>
>     This is a conceptual example of the job:
>
>      1. The job asks the REST API whether there are new files
>      2. if so, the job takes the first file in the list
>      3. and checks whether the file already exists
>          1. if not, the job imports the file
>          2. if yes, the job compares the data with the data already stored
>              1. if changed, the job updates the file
>      4. if more files exist, the job continues with step 2
>      5. otherwise it ends.
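
The numbered steps above could be sketched roughly like this. It is a minimal sketch under several assumptions: `list_remote` and `fetch_remote` are hypothetical callables wrapping the REST API, a plain dict stands in for HDFS, and comparing SHA-256 digests is just one possible way to detect changed data.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to decide whether a file has changed."""
    return hashlib.sha256(data).hexdigest()


def sync_files(list_remote, fetch_remote, store):
    """Import new files and update changed ones.

    list_remote()      -- hypothetical: returns the file names the REST API offers
    fetch_remote(name) -- hypothetical: returns the file content as bytes
    store              -- dict of name -> bytes, standing in for HDFS
    """
    stats = {"imported": 0, "updated": 0, "unchanged": 0}
    for name in list_remote():                      # steps 1, 2 and 4
        data = fetch_remote(name)
        if name not in store:                       # step 3.1: new file
            store[name] = data
            stats["imported"] += 1
        elif digest(store[name]) != digest(data):   # step 3.2: changed file
            store[name] = data
            stats["updated"] += 1
        else:                                       # already stored, same content
            stats["unchanged"] += 1
    return stats                                    # step 5: list exhausted
```

In a real job the dict would be replaced by the HDFS client, for example the Java `org.apache.hadoop.fs.FileSystem` class with its `exists` and `create` methods, and the periodic start could come from cron as suggested.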
>
>
>     Can anybody give me a little help on how to start (it's the first
>     job I am writing...)?
>
>
>     ===
>     Ralph

-- 
*Imixs*...extends the way people work together
We are an open source company, read more at: www.imixs.org
------------------------------------------------------------------------
Imixs Software Solutions GmbH
Agnes-Pockels-Bogen 1, 80992 München
*Web:* www.imixs.com
*Office:* +49 (0)89-452136 16 *Mobil:* +49-177-4128245
Registergericht: Amtsgericht Muenchen, HRB 136045
Geschaeftsfuehrer: Gaby Heinle u. Ralph Soika

