incubator-hcatalog-user mailing list archives

From Rohini Palaniswamy <rohini.adi...@gmail.com>
Subject Re: Dynamic Partition in HCatalog 0.4 throws FileAlreadyExists exception
Date Mon, 13 Aug 2012 18:56:32 GMT
Sorry Rajesh. I missed this mail somehow. Can you create a JIRA for this if
you still encounter this with the latest hcatalog build?

-Rohini

On Sun, Jul 15, 2012 at 5:00 PM, Rajesh Balamohan <
rajesh.balamohan@gmail.com> wrote:

> Hi Rohini,
>
>
> Here is a simple use case which reproduces this error. I have also
> attached the stacktrace below.
>
> 1. In HCat, create 2 tables (table_1 and table_2)
> 2. Load from table_1 to table_2 with dynamic partition
>
> hcat -e "create table table_1( HIT_TIME_GMT string,SERVICE
> string,ACCEPT_LANGUAGE string, DATE_TIME string)
> partitioned by (load_date string,repo_name string) row format delimited
> fields terminated by '\t' stored as textfile";
>
> hcat -e "create table table_2( HIT_TIME_GMT string,SERVICE
> string,ACCEPT_LANGUAGE string, DATE_TIME string)
> partitioned by (load_date string,repo_name string) row format delimited
> fields terminated by '\t' stored as textfile";
>
> Have some data populated to this with load_date='20120101' and
> repo_name='testRepo'
>
> a = load 'table_1' using org.apache.hcatalog.pig.HCatLoader();
> b = filter a by (load_date == '20120101' and repo_name == 'testRepo');
> store b into 'table_2' using org.apache.hcatalog.pig.HCatStorer();
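>
> For comparison, a variant of the same store with the partition pinned
> statically may be worth trying (a sketch only, not verified against this
> build; it assumes the same table_1/table_2 schema as above, and relies on
> HCatStorer's optional partition-spec string argument, in which case the
> partition columns should be dropped from the stored tuples):
>
> a = load 'table_1' using org.apache.hcatalog.pig.HCatLoader();
> b = filter a by (load_date == '20120101' and repo_name == 'testRepo');
> -- drop the partition columns; with a static partition spec,
> -- HCatStorer supplies load_date and repo_name itself
> c = foreach b generate HIT_TIME_GMT, SERVICE, ACCEPT_LANGUAGE, DATE_TIME;
> store c into 'table_2' using
> org.apache.hcatalog.pig.HCatStorer('load_date=20120101,repo_name=testRepo');
>
> If the static store succeeds while the dynamic one fails, that would point
> at the dynamic-partition temporary directory (_DYN*) handling rather than
> the table setup.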
>
>
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory
> hdfs://cluster:54310/user/hive8/warehouse/db/table_1/_DYN0.4448079902737385/load_date=20120515/repo_name=testRepo
> already exists
>     at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:117)
>     at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:201)
>     at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
>     at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>     at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:531)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:248)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
> ~Rajesh.B
>
>
>
> On Tue, Jun 26, 2012 at 2:01 AM, Rohini Palaniswamy <
> rohini.aditya@gmail.com> wrote:
>
>> Rajesh,
>>    Can you attach the full stacktrace and some steps to reproduce?
>>
>> Regards,
>>  Rohini
>>
>>
>> On Sun, Jun 24, 2012 at 7:36 AM, Rajesh Balamohan <
>> rajesh.balamohan@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>> I have been using HCatalog 0.4.0 with Pig 0.9.3
>>>
>>> I invoke a macro which does couple of joins. Finally I store it using
>>> HCatStorer(). The table its storing is partitioned by 2 columns (load_date
>>> and region).
>>>
>>> When I run the script, the final M/R job launches 11 mappers, which
>>> start throwing the following error.
>>>
>>> FileAlreadyExistsException: Output directory hdfs://blah
>>> blah/table_name/_DYNO.342343/load_date=20120101 already exists
>>>
>>> Is this a known issue? All 11 mappers try to check whether the above
>>> path exists and throw this error. Any pointers would be of great help.
>>>
>>> --
>>> ~Rajesh.B
>>>
>>
>>
>
>
> --
> ~Rajesh.B
>
