hive-user mailing list archives

From Josh Ferguson <j...@besquared.net>
Subject Re: Trouble Loading Into External Table
Date Wed, 26 Nov 2008 20:33:47 GMT
This is the DESCRIBE EXTENDED output for the properly working table:

hive> DESCRIBE EXTENDED basic;
OK
actor_id        int
actee_id        int
properties      map<string,string>
Detailed Table Information:
Table(tableName:basic, dbName:default, owner:josh, createTime:1227688761,
  lastAccessTime:0, retention:0,
  sd:StorageDescriptor(
    cols:[FieldSchema(name:actor_id, type:int, comment:null),
          FieldSchema(name:actee_id, type:int, comment:null),
          FieldSchema(name:properties, type:map<string,string>, comment:null)],
    location:/data/sample2,
    inputFormat:org.apache.hadoop.mapred.TextInputFormat,
    outputFormat:org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat,
    compressed:false, numBuckets:-1,
    serdeInfo:SerDeInfo(name:null,
      serializationLib:org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe,
      parameters:{colelction.delim=44, mapkey.delim=58,
        serialization.format=org.apache.hadoop.hive.serde2.thrift.TCTLSeparatedProtocol}),
    bucketCols:[], sortCols:[], parameters:{}),
  partitionKeys:[], parameters:{EXTERNAL=TRUE})
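
For reference, here is how I read the serde parameters above (a sketch
written as HiveQL comments; the ASCII values are standard, and everything
else is taken straight from the output):

-- colelction.delim=44  -> ',' separates collection items
--                          ("colelction" is the literal key Hive stores)
-- mapkey.delim=58      -> ':' separates map keys from values
-- (no field.delim key) -> the field delimiter was left at the
--                          TCTLSeparatedProtocol default
-- EXTERNAL=TRUE        -> dropping the table leaves /data/sample2 intact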

Josh Ferguson

On Nov 26, 2008, at 11:25 AM, Pete Wyckoff wrote:

>
> Can you send the new output of describe extended?
>
> Thanks, pete
>
>
> On 11/26/08 6:46 AM, "Ashish Thusoo" <athusoo@facebook.com> wrote:
>
> congrats!! :)
>
> Please do file a JIRA for this. We will fix this as soon as possible.
>
> Ashish
>
> ________________________________________
> From: Josh Ferguson [josh@besquared.net]
> Sent: Wednesday, November 26, 2008 12:44 AM
> To: hive-user@hadoop.apache.org
> Subject: Re: Trouble Loading Into External Table
>
> I got it! The combination I got to work was the following:
>
> CREATE EXTERNAL TABLE basic
> (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '44'
> MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE
> LOCATION '/data/sample2';
>
> This forced a DynamicSerDe/TCTLSeparatedProtocol combo, which didn't
> work properly when I specified the field delimiter but, for some
> reason, works fine when you specify only the other two and leave the
> field delimiter at its default. I should probably file it in JIRA.
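>
> For contrast, a minimal sketch of the combination that misbehaved for
> me, with a hypothetical table name (the actual failing session, which
> also had an occurred_at column and a LINES clause, is quoted below):
>
> CREATE EXTERNAL TABLE basic_broken
>   (actor_id INT, actee_id INT, properties MAP<STRING, STRING>)
> ROW FORMAT DELIMITED
>   -- adding this line back is what triggered the bad parse for me:
>   FIELDS TERMINATED BY '124'
>   COLLECTION ITEMS TERMINATED BY '44'
>   MAP KEYS TERMINATED BY '58'
> STORED AS TEXTFILE
> LOCATION '/data/sample2';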
>
> Josh Ferguson
>
> On Nov 25, 2008, at 7:09 PM, Joydeep Sen Sarma wrote:
>
> >
> > Can you please send the output of 'describe extended
> > activity_test'? This will help us understand what's happening with
> > all the create table parameters.
> >
> > Also - as a sanity check - can you please run hadoop dfs -cat
> > /data/sample/* (to make sure the data got loaded/moved into that dir)?
> >
> > -----Original Message-----
> > From: Josh Ferguson [mailto:josh@besquared.net]
> > Sent: Tuesday, November 25, 2008 7:03 PM
> > To: hive-user@hadoop.apache.org
> > Subject: Re: Trouble Loading Into External Table
> >
> > hive> CREATE EXTERNAL TABLE activity_test
> >> (occurred_at INT, actor_id INT, actee_id INT, properties
> > MAP<STRING, STRING>)
> >> ROW FORMAT DELIMITED
> >> FIELDS TERMINATED BY '124'
> >> COLLECTION ITEMS TERMINATED BY '44'
> >> MAP KEYS TERMINATED BY '58'
> >> LINES TERMINATED BY '10'
> >> STORED AS TEXTFILE
> >> LOCATION '/data/sample';
> > OK
> >
> > hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> > activity_test;
> > Copying data from file:/Users/josh/Hive/sample.tab
> > Loading data to table activity_test
> > OK
> >
> > $ hadoop fs -cat /data/sample/sample.tab
> > 1227422134|2|1|paid:44519,tax:2120,value:42399
> >
> > hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output2'
> > SELECT activity_test.occurred_at, activity_test.actor_id,
> > activity_test.actee_id, activity_test.properties;
> > Total MapReduce jobs = 1
> > Starting Job = job_200811250653_0022, Tracking URL =
> > http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0022
> > Kill Command = /Users/josh/Hadoop/bin/hadoop job
> > -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0022
> >   map = 0%,  reduce =0%
> >   map = 50%,  reduce =0%
> >   map = 100%,  reduce =0%
> > Ended Job = job_200811250653_0022
> > Moving data to: /data/output2
> > OK
> >
> > $ hadoop fs -cat /data/output2/*
> > 012{}
> >
> > Still getting incorrect results. Is there anything else I could try?
> >
> > Josh Ferguson
> >
> > On Nov 25, 2008, at 6:34 PM, Ashish Thusoo wrote:
> >
> >> Can you try putting the ASCII value within quotes? So, for example,
> >> FIELDS TERMINATED BY '124', etc.
> >>
> >> You can also look at the following file in the source to see an
> >> example of how this is done
> >>
> >> ql/src/test/queries/clientpositive/input_dynamicserde.q
> >>
> >> Ashish
> >>
> >> -----Original Message-----
> >> From: Josh Ferguson [mailto:josh@besquared.net]
> >> Sent: Tuesday, November 25, 2008 6:18 PM
> >> To: hive-user@hadoop.apache.org
> >> Subject: Trouble Loading Into External Table
> >>
> >> OK, so I'm trying to create an external table, load a delimited
> >> file into it, and then do a basic SELECT out of it. Here is a
> >> description of my scenario, along with the steps I took and their
> >> results. Hopefully someone can help me figure out what I'm doing wrong.
> >>
> >> # Sample.tab
> >>
> >> 1227422134|2|1|paid:44519,tax:2120,value:42399
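> >>
> >> With the delimiters below, that row should parse as follows (a
> >> sketch, written as HiveQL comments):
> >>
> >> -- fields split on '|' (ASCII 124):
> >> --   occurred_at = 1227422134
> >> --   actor_id    = 2
> >> --   actee_id    = 1
> >> --   properties  = paid:44519,tax:2120,value:42399
> >> -- then ',' (44) splits the map entries and ':' (58) splits each
> >> -- key from its value:
> >> --   properties  = {"paid":"44519", "tax":"2120", "value":"42399"}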
> >>
> >> # CREATE TABLE
> >>
> >> hive> CREATE EXTERNAL TABLE activity_test
> >>> (occurred_at INT, actor_id INT, actee_id INT, properties
> >>> MAP<STRING, STRING>)
> >>> ROW FORMAT DELIMITED
> >>> FIELDS TERMINATED BY "|"
> >>> COLLECTION ITEMS TERMINATED BY ","
> >>> MAP KEYS TERMINATED BY ":"
> >>> LOCATION '/data/sample';
> >> OK
> >>
> >> # LOAD DATA
> >>
> >> hive> LOAD DATA LOCAL INPATH '/Users/josh/Hive/sample.tab' INTO TABLE
> >> activity_test;
> >> Copying data from file:/Users/josh/Hive/sample.tab
> >> Loading data to table activity_test
> >> OK
> >>
> >> # SELECT OVERWRITE DIRECTORY
> >>
> >> hive> FROM activity_test INSERT OVERWRITE DIRECTORY '/data/output'
> >> SELECT activity_test.occurred_at, activity_test.actor_id,
> >> activity_test.actee_id, activity_test.properties;
> >> Total MapReduce jobs = 1
> >> Starting Job = job_200811250653_0018, Tracking URL =
> >> http://{clipped}:50030/jobdetails.jsp?jobid=job_200811250653_0018
> >> Kill Command = /Users/josh/Hadoop/bin/hadoop job
> >> -Dmapred.job.tracker={clipped}:54311 -kill job_200811250653_0018
> >>   map = 0%,  reduce =0%
> >>   map = 50%,  reduce =0%
> >>   map = 100%,  reduce =0%
> >> Ended Job = job_200811250653_0018
> >> Moving data to: /data/output
> >> OK
> >> Time taken: 72.329 seconds
> >>
> >> $ hadoop fs -cat /data/output/*
> >> 012{}
> >>
> >> This obviously isn't the correct output; these look like just
> >> default values for those columns. What am I doing wrong?
> >>
> >> Thanks
> >>
> >> Josh Ferguson
> >

