camel-issues mailing list archives

From Josef Ludvíček (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (CAMEL-8040) camel-hdfs2 consumer overwriting data instead of appending them
Date Thu, 11 Dec 2014 14:10:13 GMT

    [ https://issues.apache.org/jira/browse/CAMEL-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242546#comment-14242546 ]

Josef Ludvíček edited comment on CAMEL-8040 at 12/11/14 2:09 PM:
-----------------------------------------------------------------

Hi Willem,

yeah, but the docs say that "Override, which is the default, replaces the existing *file*."
From what I see, though, it is replacing *chunks of that file*, so in the end I don't even
have a valid file, just the last data chunk of the original file from Hadoop.
If it were a picture, it would be corrupted.

It looks like Camel handles each data chunk (of size bufferSize, default 4096) as if it
were the whole file.
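
To illustrate the failure mode in plain Java I/O (a sketch of the behaviour, not the actual camel-hdfs2 code): if every chunk re-opens the target file in overwrite mode instead of append mode, only the last chunk survives.

{code}
import java.io.FileOutputStream;
import java.io.IOException;

public class ChunkOverwriteDemo {

    public static void main(String[] args) throws IOException {
        byte[][] chunks = {
            "chunk-1\n".getBytes(),
            "chunk-2\n".getBytes(),
            "chunk-3\n".getBytes()
        };

        // What I observe: each chunk opens the file in overwrite mode,
        // so broken.txt ends up containing only "chunk-3".
        for (byte[] chunk : chunks) {
            try (FileOutputStream out = new FileOutputStream("broken.txt", false)) {
                out.write(chunk);
            }
        }

        // What I would expect: chunks are appended,
        // so expected.txt contains all three chunks.
        for (byte[] chunk : chunks) {
            try (FileOutputStream out = new FileOutputStream("expected.txt", true)) {
                out.write(chunk);
            }
        }
    }
}
{code}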



> camel-hdfs2 consumer overwriting data instead of appending them
> ---------------------------------------------------------------
>
>                 Key: CAMEL-8040
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8040
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>    Affects Versions: 2.13.0, 2.14.0
>            Reporter: Josef Ludvíček
>            Assignee: Willem Jiang
>         Attachments: hdfs-reproducer.zip
>
>
> h1. camel-hdfs2 consumer overwriting data instead of appending them
> There is probably a bug in the Camel hdfs2 consumer.
> This project contains two Camel routes: one takes files from `test-source` and uploads them to Hadoop HDFS, and another watches a folder in Hadoop HDFS and downloads the files to the `test-dest` folder in this project.
> It seems that when downloading a file from HDFS to the local filesystem, it keeps writing chunks of data to the beginning of the target file in `test-dest`, instead of simply appending the chunks, as I would expect.
> From the Camel log I suppose that each chunk of data from the Hadoop file is treated as if it were the whole file, as the sketch below illustrates.
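> For reference, the two routes in the reproducer boil down to something like this sketch (Java DSL; the exact URIs and options are assumptions, see the attached project for the real ones):
> {code}
> import org.apache.camel.builder.RouteBuilder;
>
> public class HdfsRoundTripRoutes extends RouteBuilder {
>     @Override
>     public void configure() {
>         // upload: local folder -> HDFS
>         from("file:test-source")
>             .to("hdfs2://localhost:8020/tmp/camel-test");
>
>         // download: HDFS -> local folder (this is where chunks overwrite each other)
>         from("hdfs2://localhost:8020/tmp/camel-test")
>             .to("file:test-dest");
>     }
> }
> {code}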
> The Ruby script `generate_textfile.rb` can generate a file `test.txt` with the content
> {code}
> 0 - line
> 1 - line
> 2 - line
> 3 - line
> 4 - line
> 5 - line
> ...
> ...
> 99999 - line
> {code}
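> For anyone without Ruby at hand, a minimal Java equivalent of the generator might look like this (a sketch matching the content above, not the attached script):
> {code}
> import java.io.PrintWriter;
>
> public class GenerateTextFile {
>     public static void main(String[] args) throws Exception {
>         // writes lines "0 - line" .. "99999 - line" to test.txt
>         try (PrintWriter out = new PrintWriter("test.txt")) {
>             for (int i = 0; i < 100000; i++) {
>                 out.println(i + " - line");
>             }
>         }
>     }
> }
> {code}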
> h2. Scenario
>  - _expects a running Hadoop instance on localhost:8020_
>  - run `mvn camel:run`
>  - copy test.txt into test-source
>  - see the log and the file test.txt in test-dest
>  - test.txt in the test-dest folder will contain only the last x lines of the original one.
>  
>  
> Camel log 
> {code}
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile                     INFO  picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile                     INFO  file downloaded from hadoop
> {code}
>  
> h2. Environment
> * Camel 2.14 and 2.13
> * Hadoop VirtualBox VM
> ** downloaded from http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
> ** tested with version 2.3.0-cdh5.1.0, r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the download page
> * Hadoop Docker image
> ** https://github.com/sequenceiq/hadoop-docker
> ** results were the same as with the VirtualBox VM
> In case of the VirtualBox VM, it binds HDFS to `hdfs://quickstart.cloudera:8020` by default, and this needs to be changed in `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set to `hdfs://0.0.0.0:8020`.
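> The relevant snippet in `/etc/hadoop/conf/core-site.xml` would then look like this (standard Hadoop config format):
> {code}
> <property>
>   <name>fs.defaultFS</name>
>   <value>hdfs://0.0.0.0:8020</value>
> </property>
> {code}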
> In case of the Docker Hadoop image, first start the Docker container, figure out its IP address, and use it for the Camel hdfs component.
> Here the Camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.
> {code} 
> docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
> Starting sshd:                                             [  OK  ]
> Starting namenodes on [966476255fc2]
> 966476255fc2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
> localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
> Starting secondary namenodes [0.0.0.0]
> 0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
> starting yarn daemons
> starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
> localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
> {code}
> Check which IP the HDFS filesystem API is bound to inside the Docker container:
> {code}
> bash-4.1# netstat -tulnp 
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
> ...
> tcp        0      0 172.17.0.2:9000             0.0.0.0:*                   LISTEN      -
> ...
> {code}
> There might be an exception because of HDFS permissions. It can be solved by opening up the HDFS filesystem permissions:
> {code}
> bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
