camel-issues mailing list archives

From Josef Ludvíček (JIRA) <j...@apache.org>
Subject [jira] [Commented] (CAMEL-8040) camel-hdfs2 consumer overwriting data instead of appending them
Date Fri, 12 Dec 2014 15:10:13 GMT

    [ https://issues.apache.org/jira/browse/CAMEL-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244240#comment-14244240 ]

Josef Ludvíček commented on CAMEL-8040:
---------------------------------------

I completely missed the {{chunkSize}} option. Thanks for explaining it.
I have created a documentation improvement issue, CAMEL-8150.
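The explanation referred to above implies that the hdfs2 consumer emits one exchange per {{chunkSize}} bytes, so a {{file:}} producer that overwrites its target keeps only the last chunk. Two possible workarounds, offered here as assumptions rather than as the resolution recorded on this issue, are to raise {{chunkSize}} above the expected file size or to set the camel-file option {{fileExist=Append}}. The helper below is a hypothetical sketch: the class and method names are illustrative, and only {{chunkSize}} (camel-hdfs2) and {{fileExist}} (camel-file) are real endpoint options.

```java
/**
 * Hypothetical helper that assembles the two endpoint URIs of the reproducer.
 * Only chunkSize (camel-hdfs2) and fileExist (camel-file) are real Camel options;
 * everything else here is illustrative.
 */
final class ReproducerUris {

    private ReproducerUris() {
    }

    /** HDFS consumer URI; chunkSize is the number of bytes read per exchange. */
    static String hdfsConsumerUri(String host, int port, String path, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return "hdfs2://" + host + ":" + port + path + "?chunkSize=" + chunkSize;
    }

    /** File producer URI; fileExist=Append appends each incoming body instead of overwriting. */
    static String fileProducerUri(String directory) {
        return "file:" + directory + "?fileExist=Append";
    }

    public static void main(String[] args) {
        System.out.println(hdfsConsumerUri("localhost", 8020, "/tmp/camel-test", 4096));
        System.out.println(fileProducerUri("test-dest"));
    }
}
```

With {{fileExist=Append}}, a file left over from a previous run is extended rather than replaced, so the target directory should be emptied between runs.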

> camel-hdfs2 consumer overwriting data instead of appending them
> ---------------------------------------------------------------
>
>                 Key: CAMEL-8040
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8040
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>    Affects Versions: 2.13.0, 2.14.0
>            Reporter: Josef Ludvíček
>            Assignee: Willem Jiang
>         Attachments: hdfs-reproducer.zip
>
>
> h1. camel-hdfs2 consumer overwriting data instead of appending them
> There is probably a bug in the camel-hdfs2 consumer.
> This project contains two Camel routes: one takes files from `test-source` and uploads them to Hadoop HDFS, and the other watches a folder in HDFS and downloads its files to the `test-dest` folder of this project.
> It seems that when a file is downloaded from HDFS to the local filesystem, the route keeps writing chunks of data to the beginning of the target file in `test-dest` instead of simply appending them, as I would expect.
> From the Camel log I suppose that each chunk of data from the Hadoop file is treated as if it were the whole file.
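If each chunk really is delivered as a separate message carrying the same file name, a producer that overwrites the target leaves only the final chunk, while one that appends reconstructs the original. The stand-alone sketch below simulates both cases on a local file with java.nio; it does not use Camel, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical simulation of the reported behaviour: the source is split into
 * fixed-size chunks and every chunk is written to the same target file,
 * either replacing it (overwrite) or extending it (append).
 */
final class ChunkedCopySimulation {

    private ChunkedCopySimulation() {
    }

    /** Splits data into consecutive chunks of at most chunkSize bytes. */
    static List<byte[]> split(byte[] data, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunkSize) {
            int to = Math.min(from + chunkSize, data.length);
            chunks.add(Arrays.copyOfRange(data, from, to));
        }
        return chunks;
    }

    /** Writes each chunk to target as if it were a whole file. */
    static void writeChunks(List<byte[]> chunks, Path target, boolean append) throws IOException {
        Files.deleteIfExists(target);
        for (byte[] chunk : chunks) {
            if (append) {
                Files.write(target, chunk, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } else {
                // With no options, Files.write uses CREATE, TRUNCATE_EXISTING and WRITE.
                Files.write(target, chunk);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "0 - line\n1 - line\n2 - line\n3 - line\n".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> chunks = split(source, 10);
        Path target = Files.createTempFile("chunked-copy", ".txt");
        writeChunks(chunks, target, false);
        System.out.println("overwrite keeps " + Files.size(target) + " of " + source.length + " bytes");
        writeChunks(chunks, target, true);
        System.out.println("append keeps " + Files.size(target) + " of " + source.length + " bytes");
        Files.delete(target);
    }
}
```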
> The Ruby script `generate_textfile.rb` generates a file `test.txt` with the following content:
> {code}
> 0 - line
> 1 - line
> 2 - line
> 3 - line
> 4 - line
> 5 - line
> ...
> ...
> 99999 - line
> {code}
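The Ruby script itself is not included in this message. A Java equivalent, written here as an assumption from the listed content (lines `0` to `99999`, each terminated by `\n`), could look like the following; the class name and the default output path are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical stand-in for generate_textfile.rb: writes "i - line" for i = 0..lineCount-1. */
final class GenerateTextFile {

    static final int LINE_COUNT = 100_000;

    private GenerateTextFile() {
    }

    static void generate(Path target, int lineCount) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            for (int i = 0; i < lineCount; i++) {
                out.write(i + " - line");
                // Explicit '\n' keeps the byte count independent of the platform line separator.
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Paths.get(args.length > 0 ? args[0] : "test.txt");
        generate(target, LINE_COUNT);
        System.out.println("wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Under these assumptions the generated file is 1,288,890 bytes long.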
> h2. Scenario
>  - _expects a running Hadoop instance on localhost:8020_
>  - run {{mvn camel:run}}
>  - copy test.txt into test-source
>  - check the log and the file test.txt in test-dest
>  - test.txt in the test-dest folder contains only the last x lines of the original file.
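The "last x lines" can be estimated if two assumptions hold: the consumer emits consecutive chunks of exactly {{chunkSize}} bytes (4096 is assumed here as the default), and the input is the 1,288,890-byte file described above with `\n` line endings. The overwritten target then holds only the final chunk. The class below is a hypothetical sketch of that arithmetic.

```java
/**
 * Hypothetical estimate of what survives when every chunk overwrites the target.
 * Assumes consecutive chunks of exactly chunkSize bytes, only the last one shorter.
 */
final class LastChunkEstimate {

    private LastChunkEstimate() {
    }

    /** Number of chunks needed for fileSize bytes (ceiling division). */
    static long chunkCount(long fileSize, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (fileSize + chunkSize - 1) / chunkSize;
    }

    /** Size of the final chunk, i.e. the bytes left behind after the last overwrite. */
    static long lastChunkSize(long fileSize, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        if (fileSize == 0) {
            return 0;
        }
        long remainder = fileSize % chunkSize;
        return remainder == 0 ? chunkSize : remainder;
    }

    public static void main(String[] args) {
        long fileSize = 1_288_890L; // assumed size of test.txt with '\n' line endings
        int chunkSize = 4096;       // assumed default chunkSize
        int bytesPerTailLine = 13;  // e.g. "99999 - line\n"; every line near the end has 5 digits
        long last = lastChunkSize(fileSize, chunkSize);
        System.out.println("chunks: " + chunkCount(fileSize, chunkSize));
        System.out.println("bytes left after overwriting: " + last);
        System.out.println("complete lines left: " + last / bytesPerTailLine);
    }
}
```

With these inputs the estimate is 315 chunks and a 2,746-byte final chunk: lines 99789 to 99999 (211 lines of 13 bytes) plus the last 3 bytes of line 99788.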
>  
>  
> Camel log 
> {code}
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> {code}
>  
> h2. Environment
> * Camel 2.14 and 2.13
> * Hadoop VirtualBox VM
> ** downloaded from http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
> ** tested with version 2.3.0-cdh5.1.0, r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the download page
> * Hadoop Docker image
> ** https://github.com/sequenceiq/hadoop-docker
> ** results were the same as with the VirtualBox VM
> In the case of the VirtualBox VM, HDFS is bound to `hdfs://quickstart.cloudera:8020` by default, which needs to be changed in `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set to `hdfs://0.0.0.0:8020`.
> In the case of the Docker Hadoop image, first start the Docker container, find out its IP address, and use that address for the Camel HDFS component.
> Here the Camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.
> {code} 
> docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
> Starting sshd:                                             [  OK  ]
> Starting namenodes on [966476255fc2]
> 966476255fc2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
> localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
> Starting secondary namenodes [0.0.0.0]
> 0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
> starting yarn daemons
> starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
> localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
> {code}
> Check which IP address the HDFS filesystem API is bound to inside the Docker container:
> {code}
> bash-4.1# netstat -tulnp 
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address             State      PID/Program name
> ...
> tcp        0      0 172.17.0.2:9000             0.0.0.0:*                   LISTEN     -
> ...
> {code}
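Before pointing the Camel route at the address reported by netstat, a plain TCP connect from the host is one way to confirm it is reachable. The helper below is a hypothetical sketch; the default of 172.17.0.2:9000 is taken from this particular container and will differ elsewhere.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical reachability check: true if a TCP connection succeeds within the timeout. */
final class PortCheck {

    private PortCheck() {
    }

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "172.17.0.2";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port + (up ? " is accepting connections" : " is not reachable"));
    }
}
```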
> An exception may be thrown because of HDFS permissions. It can be solved by relaxing the HDFS filesystem permissions:
> {code}
> bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
