hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Santhosh Srinivasan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-544) Utf8StorageConverter.java does not always produce NULLs when data is malformed
Date Wed, 04 Mar 2009 23:33:56 GMT

    [ https://issues.apache.org/jira/browse/PIG-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678990#action_12678990

Santhosh Srinivasan commented on PIG-544:

Review comments:

Index: test/org/apache/pig/test/TestEvalPipeline2.java

1. Its not clear how the constant 1634952294 is arrived at. A comment alluding to this fact
will help. The same comment applies to other constants that are used in this test case.
+    @Test
+    public void testBinStorageByteArrayCastsSimple() throws IOException {
+        assertTrue( (Integer)tup.get(0) == 1634952294);

Index: src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java

1. Do we need mLog? POProject extends ExpressionOperator which in turn extends PhysicalOperator.
PhysicalOperator has a private member variable called log. Its better to use log instead of
creating another member variable.
+    protected final Log mLog = LogFactory.getLog(getClass());

2. I am not sure if processInputBag calls parseFromBytes(). Can you clarify?

+        Result res;
+        try{
+            res = processInputBag();
+        }
+        catch (Exception e){
+            Result r = new Result();
+            r.returnStatus = POStatus.STATUS_OK;
+            // can happen if parseFromBytes identifies it as being of different type

Index: src/org/apache/pig/builtin/BinStorage.java

1. Can we make the log private static ?
+    protected final Log mLog = LogFactory.getLog(getClass());

2. Minor comment. Use bytearray instead of byres array to be consistent with the bytearray
type. Also, be consistent with use of upper case for types, I see bag and then Long, Map,
+            LogUtils.warn(this, "Unable to convert bytes array to bag, " +

3. In bytesToTuple, the message should be tuple and not bag

+    public Tuple bytesToTuple(byte[] b) {
+            LogUtils.warn(this, "Unable to convert bytes array to bag, " +

4. Documentation. Now that error code 2110 has been removed,  http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification
has to be updated to reflect the removal of this error code.

-            int errCode = 2110;
-            String msg = "Could not convert bytearray to map.";
-            throw new ExecException(msg, errCode, PigException.BUG);

5. The bytes to (Integer/Long/Float/Double) methods are interpreting the first 4 to 8 bytes
(depending on the type) and returning the value as the value for the converted type. This
is a semantics question. Should we return null or should we do a best effort conversion of
bytes to basic types? For example in the bytesToCharArray routine, the first couple of bytes
are expected to be representative of the length of the string. The same argument holds good
for the tuple too. However for the numeric types, a fixed number of bytes is interpreted as
the value.

> Utf8StorageConverter.java does not always produce NULLs when data is malformed
> ------------------------------------------------------------------------------
>                 Key: PIG-544
>                 URL: https://issues.apache.org/jira/browse/PIG-544
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Thejas M Nair
>             Fix For: types_branch
>         Attachments: PIG-544.txt
> It does so for scalar types but not for complext types and not for the fields inside
of the complext types.
> This is because it uses different code to parse scalar types by themselves and scalar
types inside of a complex type. It should really use the same (its own code to do so.)
> The code it is currently uses, is inside of TextDataParser.jjt and is also used to parse
constants so we need to be careful if we want to make changes to it.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message