pig-dev mailing list archives

From "Krzysztof Indyk (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4705) Error Schema for data cannot be determined using HCatalog
Date Sat, 17 Oct 2015 12:42:07 GMT

     [ https://issues.apache.org/jira/browse/PIG-4705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krzysztof Indyk updated PIG-4705:
---------------------------------
    Description: 
When we use {{HCatalog}} as both the source and the destination of data for {{Pig}} on {{Tez}}, we get
??ERROR 1115: Schema for data cannot be determined??.
Pig works fine when we use MapReduce, or when HCatalog is used at only one endpoint, e.g. loading data
directly from a file and storing with HCatalog.

The error appears after upgrading from {{Pig 0.14}} on {{Tez 0.5.2}} to {{Pig 0.15}} on
{{Tez 0.7.0}} ({{HDP 2.2.6}} to {{HDP 2.3.2}}).

To reproduce:
- create the Hive tables from [^hive_tables.hql]
- load data into table_input from [^sample.csv]
- run the following Pig script on Tez

{code}

data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader();
items_unique = DISTINCT data;

counted = FOREACH (GROUP items_unique BY col2)
	    GENERATE
	      group AS name,
	      COUNT(items_unique) AS value;
  
STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();
{code}
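
For comparison, a minimal sketch of the variant reported above as working, where data is loaded directly from a file and only the store goes through HCatalog. The input path, the comma delimiter and the two-column schema are assumptions made for illustration; the real column definitions are in [^hive_tables.hql].

{code}
-- Hypothetical path, delimiter and schema: adjust to match hive_tables.hql and sample.csv.
data = LOAD '/tmp/sample.csv' USING PigStorage(',')
       AS (col1:chararray, col2:chararray);
items_unique = DISTINCT data;

-- Same aggregation as in the failing script.
counted = FOREACH (GROUP items_unique BY col2)
	    GENERATE
	      group AS name,
	      COUNT(items_unique) AS value;

-- Only the store side uses HCatalog.
STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();
{code}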

  was:
When we use {{HCatalog}} as both the source and the destination of data for {{Pig}} on {{Tez}}, we get
??ERROR 1115: Schema for data cannot be determined??.
Pig works fine when we use MapReduce, or when HCatalog is used at only one endpoint, e.g. loading data
directly from a file and storing with HCatalog.

The error appears after upgrading from {{Pig 0.14}} on {{Tez 0.5.2}} to {{Pig 0.15}} on {{Tez
0.7.0}} ( HDP 2.2.6}} to {{HDP 2.3.2}}).

To reproduce:
- create the Hive tables from [^hive_tables.hql]
- load data into table_input from [^sample.csv]
- run the following Pig script on Tez

{code}

data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader();
items_unique = DISTINCT data;

counted = FOREACH (GROUP items_unique BY col2)
	    GENERATE
	      group AS name,
	      COUNT(items_unique) AS value;
  
STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();
{code}


> Error Schema for data cannot be determined using HCatalog
> ---------------------------------------------------------
>
>                 Key: PIG-4705
>                 URL: https://issues.apache.org/jira/browse/PIG-4705
>             Project: Pig
>          Issue Type: Bug
>          Components: tez
>    Affects Versions: 0.15.0
>         Environment: HDP 2.3.2
>            Reporter: Krzysztof Indyk
>         Attachments: hive_tables.hql, sample.csv, stack_trace.log
>
>
> When we use {{HCatalog}} as both the source and the destination of data for {{Pig}} on {{Tez}}, we get ??ERROR 1115: Schema for data cannot be determined??.
> Pig works fine when we use MapReduce, or when HCatalog is used at only one endpoint, e.g. loading data directly from a file and storing with HCatalog.
> The error appears after upgrading from {{Pig 0.14}} on {{Tez 0.5.2}} to {{Pig 0.15}} on {{Tez 0.7.0}} ({{HDP 2.2.6}} to {{HDP 2.3.2}}).
> To reproduce:
> - create the Hive tables from [^hive_tables.hql]
> - load data into table_input from [^sample.csv]
> - run the following Pig script on Tez
> {code}
> data = LOAD 'table_input' USING org.apache.hive.hcatalog.pig.HCatLoader();
> items_unique = DISTINCT data;
> counted = FOREACH (GROUP items_unique BY col2)
> 	    GENERATE
> 	      group AS name,
> 	      COUNT(items_unique) AS value;
>   
> STORE counted INTO 'table_output' USING org.apache.hive.hcatalog.pig.HCatStorer();
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
