hadoop-mapreduce-user mailing list archives

From Mapred Learn <mapred.le...@gmail.com>
Subject Re: Delimiter selection for Sequence Files
Date Wed, 15 Jun 2011 18:07:31 GMT
If I use the hex value of a character as the delimiter, e.g. \x01 for Ctrl-A,
can I also use it as the delimiter in Hive and in Unix cut commands?
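
For context, \x01 is simply the single byte Ctrl-A, and Hive's default field
delimiter for text tables is that same character (written '\001' in DDL).
In bash, cut can usually be given it as `cut -d $'\x01'`. The sketch below
only illustrates, in Python, that the hex escape behaves like any other
one-character delimiter. The function names and the sample values are
hypothetical and not from this thread.

```python
# Ctrl-A is the single byte 0x01; "\x01" (hex) and "\001" (octal)
# are two spellings of the same character.
CTRL_A = "\x01"


def join_fields(fields):
    """Join field values into one Ctrl-A delimited record."""
    return CTRL_A.join(fields)


def split_fields(record):
    """Split a Ctrl-A delimited record back into its fields."""
    return record.split(CTRL_A)


# Hypothetical sample record, for illustration only.
record = join_fields(["1001", "alice", "NY"])
fields = split_fields(record)
```

This only works cleanly if the data itself never contains Ctrl-A, which is
the concern raised at the start of the thread.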



On Tue, Jun 14, 2011 at 7:10 AM, Mapred Learn <mapred.learn@gmail.com> wrote:

> Thanks Joe for the reply!
> "@@##@@" looks like a long value for a delimiter.
> I will also choose something like a hex value so that it does not appear
> in the data.
>
> Sent from my iPhone
>
> On Jun 13, 2011, at 5:33 PM, Joe Stein <joe.stein@medialets.com> wrote:
>
> I have had quite a few data sets where I had no idea whether my delimiter
> was in there, so what I did was replace my delimiter with a string I knew
> would not be in there during the map, and then in the reducer replace it
> back again.
>
> e.g.
>
> replace("^","@@##@@") for each line
>
> then use ^ as your delimiter
>
> and in the reducer replace("@@##@@","^") for each line
>
> and in your reducer output qualify things appropriately for how you
> want/need to deal with the output
>
> now if your problem is splitting each line during your map and not knowing
> what to split on... well, that depends very much on your context
>
> you could JOIN map side a list of all possible characters with your data
> set and then reduce output only characters not found and use that as your
> delimiter.... who knows maybe you will find out that ~ is not in your
> data...
>
> On Mon, Jun 13, 2011 at 8:18 PM, Mapred Learn <mapred.learn@gmail.com> wrote:
>
>> Hi,
>> I was thinking of using CTRL A as the delimiter, but the data that I am
>> loading into Hadoop already has CTRL A in it. What other good choices of
>> delimiter has anybody used in this kind of scenario, considering that I
>> also want to query this data using Hive?
>>
>> Thanks in advance
>> -JJ
>>
>
>
>
> --
> /*
> Joe Stein, 973-944-0094
> http://www.medialets.com
> Twitter: @allthingshadoop
> */
>
>
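
The replace-then-restore suggestion quoted above might look roughly like the
following in Python. This is a minimal sketch, not the code from the thread:
the function names are hypothetical, and it assumes, as stated in the quoted
message, that the placeholder string "@@##@@" never occurs in the data.

```python
DELIM = "^"             # delimiter used between fields
PLACEHOLDER = "@@##@@"  # assumed never to occur in the real data


def encode_record(fields):
    """Map side: hide literal delimiters inside each field, then join."""
    return DELIM.join(f.replace(DELIM, PLACEHOLDER) for f in fields)


def decode_record(line):
    """Reduce side: split on the delimiter, then restore the hidden ones."""
    return [f.replace(PLACEHOLDER, DELIM) for f in line.split(DELIM)]


# Hypothetical record whose first field contains the delimiter itself.
original = ["a^b", "c", ""]
encoded = encode_record(original)
decoded = decode_record(encoded)
```

The round trip is only lossless while the placeholder assumption holds; if
"@@##@@" can appear in the input, it would be turned into "^" on decode.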

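
The other idea quoted above, finding a character that never occurs in the
data, can be tried on a sample before writing a real job. The sketch below is
a single-process stand-in for the map-side join described in the thread, not
a MapReduce implementation; the function name and the default candidate set
(ASCII punctuation) are assumptions made for illustration.

```python
import string


def unused_delimiters(lines, candidates=string.punctuation):
    """Return the candidate characters that appear in none of the lines."""
    seen = set()
    for line in lines:
        seen.update(line)  # record every character present in this line
    return [c for c in candidates if c not in seen]


# Hypothetical sample data, for illustration only.
sample = ["a,b|c", "x^y"]
free = unused_delimiters(sample)
```

A character that is unused in a sample is not guaranteed to be unused in the
full data set, so the check would need to run over everything being loaded.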