From: Abraham Elmahrek
Date: Wed, 8 Oct 2014 12:20:19 -0700
Subject: Re: Sqoop2 inserts new line in output file
To: "dev@sqoop.apache.org"

Hey there,

There's usually a control character or something that causes this behavior.
I don't see the source data hexdump attachment. Could you please reattach?

-Abe

On Wed, Oct 8, 2014 at 5:29 AM, shakun grover wrote:

> Even when I view this data in Hive, it takes
> 1,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> www.google.com','110 Campus Dr. Berkeley CA 94111
> as first column of first row
> then
> ','San Jose','CA','94500','USA' as first column of second row. Other
> columns are given value as NULL
>
> On Wed, Oct 8, 2014 at 4:38 PM, shakun grover wrote:
>
> > Hi Abe,
> >
> > I have attached the sample data with this mail.
> >
> > This is the job that I created to import this data to HDFS:
> >
> > *Job:*
> > Name: testEmp
> >
> > Database configuration
> >
> > Schema name:
> > Table name:
> > Table SQL statement: select * from test.emp WHERE ${CONDITIONS}
> > Table column names:
> > Partition column name: id
> > Nulls in partition column:
> > Boundary query:
> >
> > Output configuration
> >
> > Storage type:
> > 0 : HDFS
> > Choose:
> > Output format:
> > 0 : TEXT_FILE
> > 1 : SEQUENCE_FILE
> > Choose: 0
> > Output directory: /tmp/emp/1
> >
> > Throttling resources
> >
> > Extractors:
> > Loaders:
> > Job was successfully updated with status FINE
> >
> > When I view the data with the below mentioned command:
> > *hadoop fs -cat /tmp/emp/p**
> > *It shows me data as follows:(*It inserts line break after 110 Campus Dr.
> > Berkeley CA 94111)
> >
> > 1,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 2,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 3,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 4,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 5,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> >
> >
> > On Wed, Oct 8, 2014 at 1:03 AM, Abraham Elmahrek wrote:
> >
> >> Could we take a peek at your data from its source as hex?
> >>
> >> -Abe
> >>
> >> On Tue, Oct 7, 2014 at 3:46 AM, shakun grover wrote:
> >>
> >> > Yes, that's correct that Sqoop2 should insert new lines at the end of a
> >> > record.
> >> > But if that record has many columns say (>15) columns in a record, then
> >> > after few columns, it inserts a new line.
> >> >
> >> > Example:
> >> > 1,'346088103340400','3410 9240 5550
> >> > 778','3710-1690-2390-472','537436268','537 43 6268
> >> >
> >> > ','537-43-6268
> >> >
> >> > ','6816758580
> >> >
> >> > ','681 675 8580
> >> >
> >> > ','681-675-8580
> >> >
> >> > ','(681) 675-8580
> >> >
> >> > ','(681)675-8580
> >> >
> >> > ','1617547959','12.215.42.19
> >> >
> >> > ','','1132286141
> >> >
> >> > ','https://blu162.mail.live.com
> >> >
> >> > ','110 Campus Dr. Berkeley CA 94111
> >> >
> >> > ','James
> >> >
> >> > '
> >> > This is one record which got imported to HDFS in the above mentioned
> >> > format. After 6th column it inserted a new line and then after each column,
> >> > it inserted new line. Though this behavior of inserting new lines is not
> >> > same in all the cases.
> >> > It inserts new lines randomly after nth column.
> >> >
> >> >
> >> > On Thu, Oct 2, 2014 at 1:12 AM, Abraham Elmahrek wrote:
> >> >
> >> > > Hey there,
> >> > >
> >> > > Sqoop2 should insert new lines at the end of a record. In fact, Sqoop2
> >> > > should just write CSV. Could you copy/paste an example with Schema?
> >> > >
> >> > > -Abe
> >> > >
> >> > > On Tue, Sep 30, 2014 at 11:32 PM, shakun grover wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > When I import many columns (say >20 columns) from RDBMS to HDFS, then Sqoop2
> >> > > > inserts a new line in the output file. The newline appears at the end of
> >> > > > certain fields. Doesn't seem to appear for every single field.
> >> > > >
> >> > > > Can you please tell me why this new line is inserted? And is there any way
> >> > > > to avoid this?
> >> > > >
> >> > > > Thanks in advance!!
> >> > > >
> >> > > > --
> >> > > > Thanks & Regards,
> >> > > > Shakun Grover
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Shakun Grover
> >
> > --
> > Thanks & Regards,
> > Shakun Grover
>
> --
> Thanks & Regards,
> Shakun Grover
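
The hex check Abe asks for can be approximated with a short script run against values pulled from the source table. The following is only a minimal Python sketch, not anything Sqoop2 provides: the helper names (find_control_chars, hexdump) and the sample row are hypothetical, and the trailing line feed in the address value is an assumption based on where the output in the thread breaks.

```python
import re

# Control characters other than tab; this range includes LF (0x0a) and CR (0x0d).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]")


def find_control_chars(value):
    """Return (offset, hex code) pairs for each control character in value."""
    return [(m.start(), format(ord(m.group()), "02x"))
            for m in CONTROL_CHARS.finditer(value)]


def hexdump(value, encoding="utf-8"):
    """Render the encoded bytes of value as space-separated hex."""
    return " ".join(format(b, "02x") for b in value.encode(encoding))


# Hypothetical row, NOT the poster's real data: the address value is assumed
# to end with a line feed, which would explain the break seen in the output.
row = {
    "id": "1",
    "website": "www.google.com",
    "address": "110 Campus Dr. Berkeley CA 94111\n",
    "city": "San Jose",
}

for column, value in row.items():
    hits = find_control_chars(value)
    if hits:
        # Show where the control characters sit and the last few bytes as hex.
        print(column, hits, hexdump(value[-4:]))
```

If a column is reported here, the line break is already present in the source value rather than added during the import, and it would need to be trimmed at the source or in the table SQL statement.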