hive-user mailing list archives

From Himanish Kushary <himan...@gmail.com>
Subject Re: Reducer throwing warning during join operations.Defaulting int columns to 0
Date Wed, 15 Aug 2012 17:20:27 GMT
Thanks Nitin, but to rule that out I had already stripped leading and
trailing spaces from the CSV files before putting them into HDFS. I also ran
the dos2unix command on the CSV files.

The joins work properly only if I define the external table with every field
typed as STRING. Even when I load the data initially into a table with all
STRING fields and later copy it to a different table with the proper data
types, the joins give wrong results on the new table as well.
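
For reference, a minimal HiveQL sketch of the all-STRING setup described above. The table names, column names, delimiter, and HDFS locations are placeholders assumed for illustration, not the actual ones:

```sql
-- Hypothetical names and paths; only the all-STRING column typing
-- reflects the setup described in this message.
CREATE EXTERNAL TABLE table1_str (id STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/table1';

CREATE EXTERNAL TABLE table2_str (id STRING, dt STRING, eid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/table2';

-- Same join as in the original question, run against the STRING tables.
SELECT a.* FROM table2_str a JOIN table1_str b ON (a.id = b.id);
```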


On Wed, Aug 15, 2012 at 1:14 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:

> It might be that there are a few empty spaces at the end of each row,
> which are handled when you read from and write to disk.
>
> But when you set auto-convert, it looks like one of these tables is
> really small and the join is converted into a map-side join, which means
> the entire table is loaded into map memory and there is no need for a
> reduce phase.
>
> On Wed, Aug 15, 2012 at 9:13 PM, Himanish Kushary <himanish@gmail.com>
> wrote:
> > Hi,
> >
> > I have uploaded a few CSV files from Windows into Hive and configured a
> > few external tables using them. When I try to run a join on two tables,
> > one of the int columns gets changed to 0. The structure of the tables is
> > as follows:
> >
> >
> > Table-1        Table-2
> > -------        -------
> > Id(int)        id(int)   datetime     eid(int)
> > -------        -------   ----------   --------
> > 1              1         2011-02-01   3
> > 2              1         2011-03-01   4
> > 3              2         2011-04-01   5
> >                4         2011-05-01   6
> >                6         2011-06-01   7
> >
> >
> > The join query is: select a.* from Table-2 a join Table-1 b on (a.id = b.id);
> >
> > The output is:
> >
> > 1  2011-02-01   0
> > 1  2011-03-01   0
> > 2  2011-04-01   0
> >
> >
> > I checked the logs and noticed the following warning: WARN
> > org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct: Extra bytes
> > detected at the end of the row! Ignoring similar problems. Could this be
> > causing it?
> >
> > When I set hive.auto.convert.join=true, the error goes away, as there is
> > no reduce phase. The output is:
> >
> > 1  2011-02-01   3
> > 1  2011-03-01   4
> > 2  2011-04-01   5
> >
> > Could somebody please help me figure out why we get the wrong results
> > when running through the reducer?
> > --
> > Thanks
>
>
>
> --
> Nitin Pawar
>



-- 
Thanks & Regards
Himanish
