hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage??
Date Sat, 02 May 2009 04:03:31 GMT

     [ https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-798:
---------------------------

    Description: 
In the following script I have a tab-separated text file, which I load using PigStorage()
and store using BinStorage().
{code}
A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);

B = group A by name;

store B into '/user/viraj/binstoragecreateop' using BinStorage();

dump B;
{code}

I later load file 'binstoragecreateop' in the following way.
{code}

A = load '/user/viraj/binstoragecreateop' using BinStorage();

B = foreach A generate $0 as name:chararray;

dump B;
{code}
Result
=======================================================================
(Amy)
(Fred)
=======================================================================
The above code works properly and returns the right results. If I use PigStorage() to achieve
the same, I get the following error.
{code}
A = load '/user/viraj/visits.txt' using PigStorage();

B = foreach A generate $0 as name:chararray;

dump B;

{code}
=======================================================================
{code}
2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1022: Type mismatch
merging schema prefix. Field Schema: bytearray. Other Field Schema: name: chararray
Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
{code}
=======================================================================
So why should the semantics of BinStorage(), where it is OK not to specify a schema, differ
from those of PigStorage()? Should the behavior not be consistent across both loaders?
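
For comparison, here is a sketch of two ways the PigStorage() case might be written so that the field is typed before it is aliased. Both are assumptions and were not run against the affected version: the first reuses the schema declaration from the first script, and the second relies on an explicit cast from bytearray to chararray being accepted in place of the type annotation in the "as" clause.
{code}
-- Sketch 1: declare the schema at load time, as in the first script,
-- so that $0 is already a chararray named "name".
A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, url:chararray, time:chararray);
B = foreach A generate name;
dump B;

-- Sketch 2 (assumption): load without a schema and cast the untyped
-- bytearray field explicitly instead of annotating the alias with a type.
A = load '/user/viraj/visits.txt' using PigStorage();
B = foreach A generate (chararray)$0 as name;
dump B;
{code}
Neither sketch answers the consistency question above; they only show where the type information would have to come from if PigStorage() keeps treating untyped fields as bytearray.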


> Schema errors when using PigStorage and none when using BinStorage??
> --------------------------------------------------------------------
>
>                 Key: PIG-798
>                 URL: https://issues.apache.org/jira/browse/PIG-798
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Viraj Bhat
>             Fix For: 0.2.0

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

