hive-user mailing list archives

From Oleg Ruchovets <oruchov...@gmail.com>
Subject Re: HIVE ORC table returns NULLs ( EMR 5.9 Hive 2.3.0 )
Date Wed, 25 Oct 2017 17:06:19 GMT
Thanks, Owen.
I tried running from HDFS (not from S3); the problem is the same.
Could you please share your hive-site.xml? Which environment variables and
parameters should I check?

I would use structor with pleasure, but I need to use EMR for this project.

Thanks
Oleg

On Thu, Oct 26, 2017 at 12:22 AM, Owen O'Malley <owen.omalley@gmail.com>
wrote:

> I'm not sure. Using a virtual environment with Hortonworks' version
> (2.6.1) and HDFS instead of S3, it works:
>
> hive> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
>> LOCATION 'hdfs://nn.example.com/user/vagrant/country/';
>> OK
>> Time taken: 4.073 seconds
>> hive> Select * from Table1;
>> OK
>> 1 Singapore
>> 2 Malaysia
>> 3 India
>> 4 Hong Kong
>> 5 Macau
>> 6 Thailand
>> 7 Indonesia
>> 8 Philippines
>> 9 Dubai
>> 10 Vietnam
>> Time taken: 0.76 seconds, Fetched: 10 row(s)
>
>
>  If you want to create a virtual environment, you can use
> https://github.com/hortonworks/structor . You can use
> the 1node-nonsecure.profile unless you want multiple nodes or security.
>
> Based on that, it is either a problem with EMR or the binding to S3.
>
> .. Owen
>
> On Wed, Oct 25, 2017 at 12:04 AM, Oleg Ruchovets <oruchovets@gmail.com>
> wrote:
>
>> Yes, that is exactly my point. Since the file has the data (the ORC file
>> is valid), why does Hive return NULLs?
>> I tested it with S3, HDFS, hive, and beeline; the behavior is the same:
>>
>>     select count (*) returns 10.
>>     select * returns NULLs ...
>>
>> How can I debug this problem? Is there any configuration or logging I
>> should look at? I am using the EMR defaults.
>>
>> Please advise.
>> Thanks, Oleg.
>>
>> On Wed, Oct 25, 2017 at 2:30 PM, Owen O'Malley <owen.omalley@gmail.com>
>> wrote:
>>
>>> The file has the data. I'm not sure what Hive is doing wrong.
>>>
>>> owen@laptop> java -jar ../tools/target/orc-tools-1.5.0-SNAPSHOT-uber.jar
>>>> data ~/Downloads/Country.orc
>>>> Processing data file /Users/owen/Downloads/Country.orc [length: 392]
>>>> {"Id":1,"Name":"Singapore"}
>>>> {"Id":2,"Name":"Malaysia"}
>>>> {"Id":3,"Name":"India"}
>>>> {"Id":4,"Name":"Hong Kong"}
>>>> {"Id":5,"Name":"Macau"}
>>>> {"Id":6,"Name":"Thailand"}
>>>> {"Id":7,"Name":"Indonesia"}
>>>> {"Id":8,"Name":"Philippines"}
>>>> {"Id":9,"Name":"Dubai"}
>>>> {"Id":10,"Name":"Vietnam"}
>>>> ____________________________________________________________
>>>> ____________________________________________________________
>>>
>>>
>>>  .. Owen
>>>
>>> On Tue, Oct 24, 2017 at 11:11 PM, Oleg Ruchovets <oruchovets@gmail.com>
>>> wrote:
>>>
>>>> I am creating a Hive external table stored as ORC (the ORC file is
>>>> located on S3).
>>>>
>>>> *Command*
>>>>
>>>> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
>>>> LOCATION 's3://bucket_name'
>>>>
>>>> *After running the query*:
>>>>
>>>> Select * from Table1;
>>>>
>>>> *Result is*:
>>>>
>>>> +------------+--------------+
>>>> | Table1.id  | Table1.name  |
>>>> +------------+--------------+
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> | NULL       | NULL         |
>>>> +------------+--------------+
>>>>
>>>> Interestingly, the number of returned records is 10, which is correct,
>>>> but every value is NULL. What is wrong? Why does the query return only
>>>> NULLs? I am using EMR instances on AWS. Is there something I should
>>>> configure or check to support the ORC format in Hive?
>>>>
>>>> ORC file attached
>>>>
>>>
>>>
>>
>
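
The thread shows two views of the same data: the field names that orc-tools prints ("Id", "Name") and the column names in Hive's result header (Table1.id, Table1.name). A minimal Python sketch of comparing the two follows. The helper names file_field_names and compare_names are hypothetical, and the sample rows are copied from the orc-tools output quoted above. Nothing in this thread establishes that the difference in letter case is what causes the NULLs on EMR; the sketch only makes the difference visible.

```python
import json

# Two rows copied from the orc-tools output quoted in the thread.
ORC_TOOLS_ROWS = """
{"Id":1,"Name":"Singapore"}
{"Id":2,"Name":"Malaysia"}
"""

# Column names as Hive printed them in the result header
# (Table1.id, Table1.name), without the table prefix.
HIVE_COLUMNS = ["id", "name"]


def file_field_names(rows_text):
    """Collect field names, in first-seen order, from JSON-lines text."""
    names = []
    for line in rows_text.splitlines():
        line = line.strip()
        if not line:
            continue
        for key in json.loads(line):
            if key not in names:
                names.append(key)
    return names


def compare_names(file_fields, table_columns):
    """Classify each table column as exact, case_only, or missing in the file.

    Hypothetical helper for illustration; it does not reproduce how any
    particular Hive or ORC release matches columns.
    """
    lowered = {field.lower(): field for field in file_fields}
    report = {}
    for col in table_columns:
        if col in file_fields:
            report[col] = ("exact", col)
        elif col.lower() in lowered:
            report[col] = ("case_only", lowered[col.lower()])
        else:
            report[col] = ("missing", None)
    return report


if __name__ == "__main__":
    fields = file_field_names(ORC_TOOLS_ROWS)
    for col, (status, match) in compare_names(fields, HIVE_COLUMNS).items():
        print(f"{col}: {status} (file field: {match})")
```

For the rows above, both columns are classified as case_only. Whether that matters depends on how the Hive build in use maps ORC fields to table columns, which this thread does not settle.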
