incubator-hcatalog-user mailing list archives

From Christian <engr...@gmail.com>
Subject Re: HCatStorer and appending to partition
Date Tue, 02 Apr 2013 21:34:13 GMT
Hi Tim!

Thanks for the quick response. I ended up creating a partition for day and
hour, but that slowed down my Hive queries a lot. (It took about 2 minutes
just to submit a job to the scheduler.) I think daily partitions will work,
although I hate to keep rewriting today's data in the partition over and
over again. If I end up doing something else, I'll make sure to post it.

Thanks,
Christian



On Wed, Mar 27, 2013 at 3:29 PM, Timothy Potter <thelabdude@gmail.com> wrote:

> Hi Christian,
>
> We do something similar, but there's no append to an existing partition
> afaik - I'm surprised the write isn't failing when the partition already
> exists. We either use a more granular partition scheme or rewrite the
> entire partition each time.
>
> Cheers,
> Tim
>
>
> On Wed, Mar 27, 2013 at 3:07 PM, Christian <engrean@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to run a pig job every few minutes that should end up using
>> HCat's automatic partitioning to store the data in the correct directory
>> (/apps/hive/warehouse/ntp_hcat/request_date=2013-03-27/)
>>
>> I've set the partition column and I can successfully write data and it
>> goes to the correct place. The problem I am having is that every time I run
>> the job, it is deleting the existing data in the directory (partition).
>>
>> My store call is simply:
>>
>> STORE complete INTO 'ntp_hcat' USING org.apache.hcatalog.pig.HCatStorer();
>>
>> My table definition in Hive is:
>>
>> CREATE TABLE ntp_hcat(
>>     year INT,
>>     month INT,
>>     day INT,
>>     date_time STRING,
>>     hour INT,
>>     minute INT,
>>     second INT,
>>     seconds_in_day BIGINT,
>>     ip STRING,
>>     method STRING,
>>     path STRING,
>>     original_path STRING,
>>     is_static_resource STRING,
>>     is_page STRING,
>>     status INT,
>>     referrer_host STRING,
>>     referrer STRING,
>>     original_referrer STRING,
>>     agent STRING,
>>     content_length BIGINT,
>>     response_time FLOAT,
>>     web_server STRING,
>>     app_server STRING,
>>     session_id STRING,
>>     sold_to_party_num STRING,
>>     customer_name STRING,
>>     login_id STRING,
>>     employee_id STRING,
>>     first_name STRING,
>>     last_name STRING,
>>     session_start_date STRING,
>>     browser STRING,
>>     browser_version STRING,
>>     is_slow_response STRING)
>> COMMENT 'This is the ntp apache requests table'
>> PARTITIONED BY (request_date STRING)
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
>> STORED AS TEXTFILE;
>>
>> I am using HDP 1.2.1. What am I doing wrong?
>>
>> Thank you,
>> Christian
>>
>
>
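
For reference, the "more granular partition scheme" discussed in this thread (a partition for day and hour) could be declared roughly as follows. This is only a sketch under stated assumptions: the table name ntp_hcat_hourly and the partition column request_hour are hypothetical, and only a few of the original columns are shown. With two partition columns, Hive lays data out in nested directories such as request_date=2013-03-27/request_hour=15/, so each run writes a new, smaller partition instead of replacing the whole day.

```sql
-- Sketch only: ntp_hcat_hourly and request_hour are hypothetical names.
-- The column list is abbreviated from the ntp_hcat definition in this thread.
CREATE TABLE ntp_hcat_hourly(
    date_time STRING,
    ip STRING,
    path STRING,
    status INT)
COMMENT 'Sketch: ntp apache requests partitioned by day and hour'
PARTITIONED BY (request_date STRING, request_hour STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
```

The trade-off is the one reported at the top of the thread: many small partitions slowed down Hive queries noticeably, so daily partitions that are rewritten in full may still be the more practical choice.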
