pig-dev mailing list archives

From "sarath (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4943) Schema issue while storing multiple pig outputs using CSVExcelStorage
Date Mon, 04 Jul 2016 15:27:11 GMT

     [ https://issues.apache.org/jira/browse/PIG-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sarath updated PIG-4943:
------------------------
    Description: 
I have a script that stores two relations with different schemas using CSVExcelStorage.

The problem is that the schema of the last store statement is applied to every store statement
in the script, overriding the schemas of the earlier ones.

My sample script looks like this:

=============================================================

masterInput = load 'hbase://xyz' using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                    'f:a,f:b,f:c,f:d')
          as (a,b,c,d);

input2 = foreach masterInput
                  generate
                        a,b;

input3 = foreach masterInput
                  generate
                      c,d;

store input2 into '/dir/ab'
using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

store input3 into '/dir/cd'
using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

=============================================================
Here *a,b,c,d* are the headers in my source file.

Expected output:

||file 1||
|a|b
|10|20

||file 2||
|c|d
|30|40


Actual output:

||file 1||
|c|d
|10|20

||file 2||
|c|d
|30|40

  was:
I have a script that stores two relations with different schemas using CSVExcelStorage.

The problem is that the schema of the last store statement is applied to every store statement
in the script, overriding the schemas of the earlier ones. Is this a known issue, and is there
a fix for it?

My sample script looks like this:

=============================================================

masterInput = load 'hbase://xyz' using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
                    'f:a,f:b,f:c,f:d')
          as (a,b,c,d);

input2 = foreach masterInput
                  generate
                        a,b;

input3 = foreach masterInput
                  generate
                      c,d;

store input2 into '/dir/ab'
using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

store input3 into '/dir/cd'
using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');

=============================================================
Here *a,b,c,d* are the headers in my source file.

Expected output:

||file 1||
|a|b
|10|20

||file 2||
|c|d
|30|40


Actual output:

||file 1||
|c|d
|10|20

||file 2||
|c|d
|30|40


> Schema issue while storing multiple pig outputs using CSVExcelStorage
> ---------------------------------------------------------------------
>
>                 Key: PIG-4943
>                 URL: https://issues.apache.org/jira/browse/PIG-4943
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.14.0
>            Reporter: sarath
>            Priority: Minor
>
> I have a script that stores two relations with different schemas using CSVExcelStorage.
> The problem is that the schema of the last store statement is applied to every store statement
> in the script, overriding the schemas of the earlier ones.
> My sample script looks like this:
> =============================================================
> masterInput = load 'hbase://xyz' using org.apache.pig.backend.hadoop.hbase.HBaseStorage(
>                     'f:a,f:b,f:c,f:d')
>           as (a,b,c,d);
> input2 = foreach masterInput
>                   generate
>                         a,b;
> input3 = foreach masterInput
>                   generate
>                       c,d;
> store input2 into '/dir/ab'
> using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
> store input3 into '/dir/cd'
> using org.apache.pig.piggybank.storage.CSVExcelStorage('\t','YES_MULTILINE', 'UNIX', 'WRITE_OUTPUT_HEADER');
> =============================================================
> Here *a,b,c,d* are the headers in my source file.
> Expected output:
> ||file 1||
> |a|b
> |10|20
> ||file 2||
> |c|d
> |30|40
> Actual output:
> ||file 1||
> |c|d
> |10|20
> ||file 2||
> |c|d
> |30|40
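
A toy model of the symptom reported above may help when reasoning about it. The sketch
below is an assumption, not Pig or piggybank code: it supposes that each store function
records its schema in a property store shared across the script at planning time, and reads
it back later when writing the header row. Every name in it (SharedContext, ToyHeaderStorer,
plan_and_write, key_by_signature) is hypothetical. If the shared store is keyed only by the
storer's class, the last schema recorded wins for every store, which matches the actual
output in the report; keying by a per-store signature keeps the headers separate.

```python
# Toy model only: none of these names are Pig or piggybank APIs.


class SharedContext:
    """Stand-in for a property store shared by all store functions in a script."""

    def __init__(self):
        self._props = {}

    def properties_for(self, key):
        # Return the property bag for this key, creating it on first use.
        return self._props.setdefault(key, {})


class ToyHeaderStorer:
    """Hypothetical storer: records its schema while planning, reads it back when writing."""

    def __init__(self, context, signature, key_by_signature):
        self.context = context
        self.signature = signature
        self.key_by_signature = key_by_signature

    def _key(self):
        # Keyed by class only: every instance shares one property bag.
        # Keyed by (class, signature): each store statement gets its own bag.
        name = type(self).__name__
        return (name, self.signature) if self.key_by_signature else name

    def check_schema(self, field_names):
        self.context.properties_for(self._key())["schema"] = list(field_names)

    def header(self):
        return self.context.properties_for(self._key())["schema"]


def plan_and_write(key_by_signature):
    context = SharedContext()
    store_ab = ToyHeaderStorer(context, "store_ab", key_by_signature)
    store_cd = ToyHeaderStorer(context, "store_cd", key_by_signature)
    # Planning: every store records its schema before any data is written.
    store_ab.check_schema(["a", "b"])
    store_cd.check_schema(["c", "d"])
    # Writing: each store looks up the schema to use for its header row.
    return store_ab.header(), store_cd.header()


if __name__ == "__main__":
    print(plan_and_write(key_by_signature=False))  # both headers come out as c, d
    print(plan_and_write(key_by_signature=True))   # headers stay a, b and c, d
```

Whether CSVExcelStorage actually behaves this way is not established in this message; the
model only shows one mechanism that would produce the same headers as the actual output.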



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
