hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Reopened: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function
Date Tue, 27 May 2008 22:16:59 GMT

     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Olga Natkovich reopened PIG-85:

Pi, I think the patch still has an issue. With these changes, there is the potential to use much
more memory than needed. This is caused by the changes to the parsing code in the tuple. It looks
like when we allow the array list to grow dynamically instead of specifying a fixed size, it causes
large memory overhead. (Reading the Java documentation, I did not see what the reallocation
algorithm is, but if it is like STL, doubling every time, this can get expensive.)

After I applied this patch, a group all query that used to run now fails.

I made a quick fix - just for testing - of reusing the size of the previous tuple, since most
of the time tuples have the same number of fields, and that solved the issue for this particular query.

That might be a reasonable approach, but I am open to other suggestions as well.
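
For illustration, the quick fix described above could look roughly like the sketch below. It is not the code from the attached patches: the class name TupleSizeHint, the method parseLine, and the lastFieldCount field are all hypothetical, and a real tuple would hold typed data rather than strings. The only library behavior relied on is that the ArrayList(int) constructor sets the initial capacity, so a list sized from the previous tuple does not need to regrow when the field count repeats.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: remember how many fields the previous tuple had
// and use that number as the initial capacity for the next one.
final class TupleSizeHint {
    // Field count of the most recently parsed line (0 before the first line).
    private int lastFieldCount = 0;

    List<String> parseLine(String line, char delimiter) {
        // ArrayList(int) sets the initial capacity; a capacity of 0 is legal.
        List<String> fields = new ArrayList<>(lastFieldCount);
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        // Everything after the last delimiter is the final field.
        fields.add(line.substring(start));
        // Remember the size as the hint for the next tuple.
        lastFieldCount = fields.size();
        return fields;
    }

    public static void main(String[] args) {
        TupleSizeHint parser = new TupleSizeHint();
        System.out.println(parser.parseLine("a,b,c", ','));
        System.out.println(parser.parseLine("d,e,f", ','));
    }
}
```

If a later tuple has more fields than the hint, the list still grows as usual, so the hint only affects memory use and not correctness.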

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch,
> A PIG command like -
> store abc into 'abc' using PigStorage('\x01');
> does not recognize that the user is requesting the data to be ^A separated. Instead, the
> data that is stored is literally separated by the string '\x01'.
> Neither punching in ^A directly through the editor nor any other strings like
> \u0001 help.
> Using a ^A directly through the editor complains about it being an invalid XML character
> and bails out.
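
As a rough illustration of the behavior requested in the description, a delimiter argument such as '\x01' or '\u0001' could be interpreted as a single character along the following lines. This is only a sketch under assumptions: DelimiterParser and parseDelimiter are hypothetical names, not part of PigStorage or of the attached patches, and the set of accepted escapes here (\xNN, \uNNNN, \t) is chosen for the example.

```java
// Hypothetical helper: turn a delimiter spec as typed by the user
// (for example the four characters \x01) into the single char it denotes.
final class DelimiterParser {
    static char parseDelimiter(String spec) {
        // A single character is used literally, e.g. "," or an actual tab.
        if (spec.length() == 1) {
            return spec.charAt(0);
        }
        // The two-character escape \t means a tab.
        if (spec.equals("\\t")) {
            return '\t';
        }
        // \xNN and \uNNNN are read as hexadecimal character codes.
        if (spec.length() > 2 && spec.charAt(0) == '\\'
                && (spec.charAt(1) == 'x' || spec.charAt(1) == 'u')) {
            // Integer.parseInt throws NumberFormatException on bad hex digits.
            int code = Integer.parseInt(spec.substring(2), 16);
            if (code >= 0 && code <= Character.MAX_VALUE) {
                return (char) code;
            }
        }
        throw new IllegalArgumentException("Unsupported delimiter: " + spec);
    }

    public static void main(String[] args) {
        // Prints the numeric code of the parsed delimiter.
        System.out.println((int) parseDelimiter("\\x01"));
    }
}
```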

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
