sqoop-dev mailing list archives

From Abraham Elmahrek <...@cloudera.com>
Subject Re: Sqoop2 inserts new line in output file
Date Wed, 08 Oct 2014 19:20:19 GMT
Hey there,

This behavior is usually caused by a control character, or something similar, in the source data.

I don't see the source data hexdump attachment. Could you please reattach?
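
For anyone following along, here is a minimal Python sketch of the kind of check a hexdump makes possible. It is only an illustration: the function name `find_control_chars` and the sample field value (an address ending in a carriage return and line feed) are assumptions, not data taken from the attachment.

```python
import unicodedata


def find_control_chars(text):
    """Return (offset, code point) pairs for every control character in text."""
    hits = []
    for offset, ch in enumerate(text):
        # Unicode category "Cc" covers control characters such as \n, \r and \t.
        if unicodedata.category(ch) == "Cc":
            hits.append((offset, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical field value: an address with a trailing CR LF pair.
field = "110 Campus Dr. Berkeley CA 94111\r\n"

# Show where the control characters sit inside the field.
print(find_control_chars(field))

# Print the field as space-separated hex bytes, similar to a hexdump.
print(" ".join(f"{b:02x}" for b in field.encode("utf-8")))
```

If a check like this reports a line feed (U+000A) or carriage return (U+000D) inside a column value, a line-oriented reader would break the record at that point, which would match the split rows described below.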

-Abe

On Wed, Oct 8, 2014 at 5:29 AM, shakun grover <s28sweet@gmail.com> wrote:

> Even when I view this data in Hive, it takes
> 1,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> www.google.com','110 Campus Dr. Berkeley CA 94111
> as first column of first row
> then
> ','San Jose','CA','94500','USA' as first column of second row.. Other
> columns are given value as NULL
>
> On Wed, Oct 8, 2014 at 4:38 PM, shakun grover <s28sweet@gmail.com> wrote:
>
> > Hi Abe,
> >
> > I have attached the sample data with this mail.
> >
> > This is the job that I created to import this data to HDFS:
> >
> > *Job:*
> > Name: testEmp
> >
> > Database configuration
> >
> > Schema name:
> > Table name:
> > Table SQL statement: select * from test.emp WHERE ${CONDITIONS}
> > Table column names:
> > Partition column name: id
> > Nulls in partition column:
> > Boundary query:
> >
> > Output configuration
> >
> > Storage type:
> >   0 : HDFS
> > Choose:
> > Output format:
> >   0 : TEXT_FILE
> >   1 : SEQUENCE_FILE
> > Choose: 0
> > Output directory: /tmp/emp/1
> >
> > Throttling resources
> >
> > Extractors:
> > Loaders:
> > Job was successfully updated with status FINE
> >
> > When I view the data with the command below:
> > *hadoop fs -cat /tmp/emp/p**
> > it shows the data as follows (it inserts a line break after 110 Campus Dr.
> > Berkeley CA 94111):
> >
> > 1,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 2,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 3,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 4,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> > 5,'James','A','Bond','557502533','(681) 675-8580','james@gmail.com','
> > www.google.com','110 Campus Dr. Berkeley CA 94111
> > ','San Jose','CA','94500','USA'
> >
> >
> >
> >
> > On Wed, Oct 8, 2014 at 1:03 AM, Abraham Elmahrek <abe@cloudera.com> wrote:
> >
> >> Could we take a peek at your data from its source as hex?
> >>
> >> -Abe
> >>
> >> On Tue, Oct 7, 2014 at 3:46 AM, shakun grover <s28sweet@gmail.com> wrote:
> >>
> >> > Yes, that's correct: Sqoop2 should insert a new line at the end of a
> >> > record.
> >> > But if a record has many columns (say, more than 15), then after a few
> >> > columns it inserts a new line.
> >> >
> >> > Example:
> >> > 1,'346088103340400','3410 9240 5550
> >> > 778','3710-1690-2390-472','537436268','537 43 6268
> >> >
> >> > ','537-43-6268
> >> >
> >> > ','6816758580
> >> >
> >> > ','681 675 8580
> >> >
> >> > ','681-675-8580
> >> >
> >> > ','(681) 675-8580
> >> >
> >> > ','(681)675-8580
> >> >
> >> > ','1617547959','12.215.42.19
> >> >
> >> > ','','1132286141
> >> >
> >> > ','https://blu162.mail.live.com
> >> >
> >> > ','110 Campus Dr. Berkeley CA 94111
> >> >
> >> > ','James
> >> >
> >> > '
> >> > This is one record that was imported to HDFS in the format shown above.
> >> > After the 6th column it inserted a new line, and then another after each
> >> > following column. This behavior is not the same in all cases, though:
> >> > the new lines appear after a seemingly random nth column.
> >> >
> >> >
> >> > On Thu, Oct 2, 2014 at 1:12 AM, Abraham Elmahrek <abe@cloudera.com> wrote:
> >> >
> >> > > Hey there,
> >> > >
> >> > > Sqoop2 should insert new lines at the end of a record. In fact, Sqoop2
> >> > > should just write CSV. Could you copy/paste an example with the schema?
> >> > >
> >> > > -Abe
> >> > >
> >> > > On Tue, Sep 30, 2014 at 11:32 PM, shakun grover <s28sweet@gmail.com> wrote:
> >> > >
> >> > > > Hi All,
> >> > > >
> >> > > > When I import many columns (say, more than 20) from an RDBMS to HDFS,
> >> > > > Sqoop2 inserts a new line in the output file. The newline appears at
> >> > > > the end of certain fields; it doesn't seem to appear for every field.
> >> > > >
> >> > > > Can you please tell me why this new line is inserted? And is there any
> >> > > > way to avoid this?
> >> > > >
> >> > > > Thanks in advance!!
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Thanks & Regards,
> >> > > > Shakun Grover
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Shakun Grover
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Shakun Grover
> >
>
>
>
> --
> Thanks & Regards,
> Shakun Grover
>
