hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sunita Arvind <sunitarv...@gmail.com>
Subject Re: Hive select shows null after successful data load
Date Wed, 19 Jun 2013 01:58:46 GMT
Having the a column name same as the table name, is a problem due to which
I was not able to reference jobs.values.id from jobs. Changing the table
name to jobs1 resolved the semantic error.
However, the query still returns null

hive> select jobs.values.position.title from jobs1;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201306080116_0036, Tracking URL =
http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_201306080116_0036
Hadoop job information for Stage-1: number of mappers: 1; number of
reducers: 0
2013-06-18 18:55:52,381 Stage-1 map = 0%,  reduce = 0%
2013-06-18 18:55:56,394 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
0.88 sec
2013-06-18 18:55:57,400 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU
0.88 sec
2013-06-18 18:55:58,407 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU
0.88 sec
MapReduce Total cumulative CPU time: 880 msec
Ended Job = job_201306080116_0036
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 35374 HDFS Write: 3
SUCCESS
Total MapReduce CPU Time Spent: 880 msec
OK
null
Time taken: 9.591 seconds


regards
Sunita


On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <sunitarvind@gmail.com>wrote:

> Ok.
> The data files are quite small. Around 35 KB and 1 KB each.
>
> [sunita@node01 tables]$  hadoop fs -ls /user/sunita/tables/jobs
> Found 1 items
> -rw-r--r--   3 sunita hdfs      35172 2013-06-18 18:31
> /user/sunita/tables/jobs/jobs_noSite_parsed.json
>
>
> [sunita@node01 tables]$ hadoop fs -text
> /user/sunita/tables/jobs/jobs_noSite_parsed.json |more
> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
> [{"company": {"i
> d": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013, "day":
> 10, "mo
> nth": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS, HBase,
> Pig- Ve
> rtica Analytics Senior Hadoop Engineer - Skills Required - Hadoop, HDFS,
> HBase,
>  Pig, SQL, Industrial Software Development, System Integration, Java, high
> perf
> ormance, multi-threading, VerticaWe are a well known consumer product
> developme
> nt company and we are looking to add a Hadoop Engineer to our Engineering
> team.
>   You will be working with the latest ", "expirationDate": {"year": 2013,
> "day"
>
> Its a single line, so used 'more' rather than 'head'. But effectively, the
> file exists and has the data.
>
> regards
> Sunita
>
>
> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <spragues@gmail.com>wrote:
>
>> As Nitin alluded to its best to confirm the data is definitely in hdfs
>> using hdfs semantics rather than hive for the first step.
>>
>> 1. how big is it?   hadoop fs -ls <your hdfs dir>
>> 2. cat a bit of it and see if anything is there.   hadoop fs -text <your
>> hdfs dir>/<filename> | head -10
>>
>> do you see any data from step #2?
>>
>>
>>
>>
>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <sunitarvind@gmail.com>wrote:
>>
>>> I ran some complex queries. Something to the extent of
>>>                     select jobs from jobs;
>>>  which triggers map reduce jobs but does not show errors and produces
>>> the same output "null". If I try referencing the struct elements, I get
>>> error which seems to be the root cause.
>>>
>>> Attached are the select statement outputs with the corresponding hive
>>> logs.
>>>
>>> I have also attached my usage details of another table - try_parsed
>>> which has a subset of the same data which seems to work fine. Also attached
>>> is the input file for this table - try_parsed.json
>>> Thanks for your help
>>>
>>> Sunita
>>>
>>>
>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <nitinpawar432@gmail.com>wrote:
>>>
>>>> can you run a little more complex query
>>>>
>>>> select uniq across columns or do some maths. so we know when it fires
>>>> up a mapreduce
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <sunitarvind@gmail.com>wrote:
>>>>
>>>>> Thanks for responding Nitin. Yes I am sure that serde is working fine
>>>>> and json file is being picked based on all the errors that showed up
till
>>>>> this stage. What sort of error are you suspecting. File not present or
>>>>> serde not parsing it ?
>>>>>
>>>>>
>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>
>>>>>> select * from table is as good as hdfs -cat
>>>>>>
>>>>>> are you sure there is any data in the table?
>>>>>>
>>>>>>
>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <
>>>>>> sunitarvind@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am able to parse the input JSON file and load it into hive.
I do
>>>>>>> not see any errors with create table, so I am assuming that.
But when I try
>>>>>>> to read the data, I get null
>>>>>>>
>>>>>>> hive> select * from jobs;
>>>>>>> OK
>>>>>>> null
>>>>>>>
>>>>>>> I have validated the JSON with JSONLint and Notepad++ JSON plugin
>>>>>>> and it is a valid JSON. Here is my create table statement and
>>>>>>> attached is the json input file.
>>>>>>>
>>>>>>> create external table jobs (
>>>>>>> jobs STRUCT<
>>>>>>> values : ARRAY<STRUCT<
>>>>>>> company : STRUCT<
>>>>>>> id : STRING,
>>>>>>> name : STRING>,
>>>>>>> postingDate : STRUCT<
>>>>>>> year : INT,
>>>>>>> day : INT,
>>>>>>> month : INT>,
>>>>>>> descriptionSnippet : STRING,
>>>>>>> expirationDate : STRUCT<
>>>>>>> year : INT,
>>>>>>> day : INT,
>>>>>>> month : INT>,
>>>>>>> position : STRUCT<
>>>>>>> title : STRING,
>>>>>>> jobFunctions : ARRAY<STRUCT<
>>>>>>> code : STRING,
>>>>>>> name : STRING>>,
>>>>>>> industries : ARRAY<STRUCT<
>>>>>>> code : STRING,
>>>>>>> id : STRING,
>>>>>>> name : STRING>>,
>>>>>>> jobType : STRUCT<
>>>>>>> code : STRING,
>>>>>>> name : STRING>,
>>>>>>> experienceLevel : STRUCT<
>>>>>>> code : STRING,
>>>>>>> name : STRING>>,
>>>>>>> id : STRING,
>>>>>>> customerJobCode : STRING,
>>>>>>> skillsAndExperience : STRING,
>>>>>>> salary : STRING,
>>>>>>> jobPoster : STRUCT<
>>>>>>> id : STRING,
>>>>>>> firstName : STRING,
>>>>>>> lastName : STRING,
>>>>>>> headline : STRING>,
>>>>>>> referralBonus : STRING,
>>>>>>> locationDescription : STRING>>>
>>>>>>>  )
>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>
>>>>>>> The table creation works fine, but when I attempt to query, I
get
>>>>>>> null as the result.
>>>>>>> I tried adding Input/Output formats, Serde Properties, nothing
seems
>>>>>>> to impact.
>>>>>>>
>>>>>>> I am of the opinion that the libraries cannot handle this level
of
>>>>>>> nesting and I probably will have to write a custom serde or a
parser
>>>>>>> myself. Just wanted to seek guidance before I get into that.
Appreciate
>>>>>>> your help and guidance.
>>>>>>>
>>>>>>> regards
>>>>>>> Sunita
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Nitin Pawar
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>

Mime
View raw message