hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Pig and Hive on the same data?
Date Wed, 30 Sep 2009 19:20:55 GMT
On Wed, Sep 30, 2009 at 11:45 AM, Ashutosh Chauhan
<ashutosh.chauhan@gmail.com> wrote:
> Hi Chris,
>
> Pig doesn't mandate Ctrl-A or any other character as the field
> delimiter. You can tell Pig which delimiter to use. For example, you can
> specify Ctrl-A as the field delimiter as follows:
>
> A = load 'mydata' using PigStorage('\u0001');
>
> If you don't specify a delimiter (e.g. A = load 'mydata';), tab is assumed
> to be the delimiter.
>
> Also, if you have more questions about Pig, please post on the pig-user list
> to get a faster response.
>
> Thanks,
> Ashutosh
>
> On Wed, Sep 30, 2009 at 10:55, dumbfounder <chris@searchles.com> wrote:
>
>>
>> We would like to use the same data for both Pig and Hive queries for
>> flexibility. Has anyone done this without keeping two copies of the data?
>> Hive seems to want to work only with CTRL-A delimited data, and I don't see
>> a way to specify CTRL-A as a delimiter for Pig. Is there another efficient
>> regex that people have used for Pig, or has anyone figured out a way to use
>> delimiters other than CTRL-A for Hive? Or are there any other
>> outside-the-box ideas?
>> --
>> View this message in context:
>> http://www.nabble.com/Pig-and-Hive-on-the-same-data--tp25682735p25682735.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
>
Hive does allow you to set delimiters; CTRL-A (\001) is only the default field delimiter.

row_format
  : DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char]
        [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL
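As a minimal sketch of how this lets one copy of the data serve both tools, assume tab-delimited text files already sit in a hypothetical HDFS directory /data/mydata. A Hive external table can be declared over that directory with a tab field delimiter. The table name, column names, and path are illustrative assumptions, not from the thread.

```sql
-- Hypothetical table name, columns, and location; adjust to your data.
-- EXTERNAL: Hive reads the files in place, and DROP TABLE removes only the
-- table metadata, not the files, so Pig can keep using the same directory.
-- FIELDS TERMINATED BY '\t' matches the tab delimiter PigStorage assumes by default.
CREATE EXTERNAL TABLE mydata (
  user_id STRING,
  query   STRING,
  ts      BIGINT
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/data/mydata';
```

Pig could then read the same files with its default loader, e.g. A = load '/data/mydata'; (tab assumed), or with an explicit PigStorage('\t'). The reverse should also work: leave the Hive table at its CTRL-A default and load its files in Pig with PigStorage('\u0001'), as in Ashutosh's example.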
