Date: Fri, 12 Dec 2014 15:10:13 +0000 (UTC)
From: Josef Ludvíček (JIRA)
To: issues@camel.apache.org
Subject: [jira] [Commented] (CAMEL-8040) camel-hdfs2 consumer overwriting data instead of appending them

[ https://issues.apache.org/jira/browse/CAMEL-8040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244240#comment-14244240 ]

Josef Ludvíček commented on CAMEL-8040:
---------------------------------------

I completely missed the {{chunkSize}} option. Thanks for explaining it. I created a documentation improvement issue, CAMEL-8150.
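For context, a minimal sketch of a consumer route using {{chunkSize}} in the Camel Java DSL. The chunk size value and the `fileExist=Append` option on the file endpoint are illustrative assumptions, not taken from the attached reproducer:

{code}
// Hypothetical route sketch: read the file from HDFS in larger chunks
// and append each chunk to the target file instead of overwriting it.
from("hdfs2://localhost:8020/tmp/camel-test?chunkSize=4194304")
    .to("file:test-dest?fileExist=Append");
{code}

With the default chunkSize, each chunk arrives as a separate exchange, so a file producer left at its default overwrite behavior keeps rewriting the same target file.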
> camel-hdfs2 consumer overwriting data instead of appending them
> ---------------------------------------------------------------
>
>                 Key: CAMEL-8040
>                 URL: https://issues.apache.org/jira/browse/CAMEL-8040
>             Project: Camel
>          Issue Type: Bug
>          Components: camel-hdfs
>    Affects Versions: 2.13.0, 2.14.0
>            Reporter: Josef Ludvíček
>            Assignee: Willem Jiang
>         Attachments: hdfs-reproducer.zip
>
>
> h1. camel-hdfs2 consumer overwriting data instead of appending them
> There is probably a bug in the Camel hdfs2 consumer.
> This project contains two Camel routes: one takes files from `test-source` and uploads them to Hadoop HDFS; the other watches a folder in HDFS and downloads the files to the `test-dest` folder in this project.
> It seems that when downloading a file from HDFS to the local filesystem, the consumer keeps writing chunks of data to the beginning of the target file in `test-dest`, instead of simply appending the chunks, as I would expect.
> From the Camel log I suspect that each chunk of data from the Hadoop file is treated as if it were a whole file.
> The Ruby script `generate_textfile.rb` can generate a file `test.txt` with content
> {code}
> 0 - line
> 1 - line
> 2 - line
> 3 - line
> 4 - line
> 5 - line
> ...
> ...
> 99999 - line
> {code}
> h2. Scenario
> - _expects a running Hadoop instance on localhost:8020_
> - run mvn camel:run
> - copy test.txt into test-source
> - see the log and the file test.txt in test-dest
> - test.txt in the test-dest folder will contain only the last x lines of the original one.
>
> Camel log
> {code}
> [localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
> [localhost:8020/tmp/camel-test/] toFile INFO picked up file from hdfs with name test.txt
> [localhost:8020/tmp/camel-test/] toFile INFO file downloaded from hadoop
> {code}
>
> h2. Environment
> * camel 2.14 and 2.13
> * hadoop VirtualBox VM
> ** downloaded from http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms/cdh-5-2-x.html
> ** tested with version 2.3.0-cdh5.1.0, r8e266e052e423af592871e2dfe09d54c03f6a0e8, which I couldn't find on the download page
> * hadoop docker image
> ** https://github.com/sequenceiq/hadoop-docker
> ** results were the same as with the VirtualBox VM
> In the case of the VirtualBox VM, HDFS binds by default to `hdfs://quickstart.cloudera:8020`, which needs to be changed in `/etc/hadoop/conf/core-site.xml`. It should work when `fs.defaultFS` is set to `hdfs://0.0.0.0:8020`.
> In the case of the docker Hadoop image, first start the docker container, figure out its IP address, and use it for the Camel hdfs component.
> Here the Camel URI would be `hdfs:172.17.0.2:9000/tmp/camel-test`.
> {code}
> docker run -i -t sequenceiq/hadoop-docker:2.5.1 /etc/bootstrap.sh -bash
> Starting sshd: [ OK ]
> Starting namenodes on [966476255fc2]
> 966476255fc2: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-966476255fc2.out
> localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-966476255fc2.out
> Starting secondary namenodes [0.0.0.0]
> 0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-966476255fc2.out
> starting yarn daemons
> starting resourcemanager, logging to /usr/local/hadoop/logs/yarn--resourcemanager-966476255fc2.out
> localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-966476255fc2.out
> {code}
> Check which IP the HDFS filesystem API is bound to inside the docker container:
> {code}
> bash-4.1# netstat -tulnp
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address      Foreign Address    State   PID/Program name
> ...
> tcp        0      0 172.17.0.2:9000    0.0.0.0:*          LISTEN  -
> ...
> {code}
> There might be an exception because of HDFS permissions. It can be solved by setting the HDFS filesystem permissions:
> {code}
> bash-4.1# /usr/local/hadoop/bin/hdfs dfs -chmod 777 /
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)