hadoop-pig-dev mailing list archives

From "Ankur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-958) Splitting output data on key field
Date Wed, 04 Nov 2009 06:31:33 GMT

    [ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773389#action_12773389 ]

Ankur commented on PIG-958:
---------------------------

> Can you explain this a little bit more - ......
In the earlier patch (958.v3.patch), after moving the results out of the task's current working
directory, I was manually deleting that directory. This was to ensure that empty part files
don't get moved to the final output directory. But doing so causes Hadoop to complain that
it can no longer write to the task's output dir, and the task fails.
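
To illustrate the general idea of keeping empty part files out of the final output without
deleting the working directory, here is a minimal sketch. It is not the code from either
patch. It assumes a plain local filesystem via java.nio.file instead of the Hadoop FileSystem
API, and the class and method names (PartFileMover, moveNonEmpty) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Pig or Hadoop).
final class PartFileMover {

    // Moves every non-empty regular file from workDir into finalDir and
    // returns the sorted names of the files that were moved.
    // workDir itself is left in place, so whoever owns it can keep using it.
    static List<String> moveNonEmpty(Path workDir, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);

        // Collect candidates first so the directory is not modified while it is being listed.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> listing = Files.newDirectoryStream(workDir)) {
            for (Path file : listing) {
                if (Files.isRegularFile(file) && Files.size(file) > 0) {
                    candidates.add(file);
                }
            }
        }

        List<String> moved = new ArrayList<>();
        for (Path file : candidates) {
            Files.move(file, finalDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            moved.add(file.getFileName().toString());
        }
        Collections.sort(moved);
        return moved;
    }
}
```

Empty files simply stay behind in the working directory, which is left for the framework to
clean up.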

> I saw compile errors while trying to run unit test: ...
Did you compile pig.jar and run the core tests first? This creates the necessary classes
and jar files on the local machine that the contrib tests require.

On my local machine:
gankur@grainflydivide-dr:pig_trunk$ ant 
...
buildJar:
     [echo] svnString 830456
      [jar] Building jar: /home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev-core.jar
      [jar] Building jar: /home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev.jar
     [copy] Copying 1 file to /home/gankur/eclipse/workspace/pig_trunk

gankur@grainflydivide-dr:pig_trunk$ ant test
...
test-core:
   [delete] Deleting directory /home/gankur/eclipse/workspace/pig_trunk/build/test/logs
    [mkdir] Created dir: /home/gankur/eclipse/workspace/pig_trunk/build/test/logs
    [junit] Running org.apache.pig.test.TestAdd
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.024 sec
    [junit] Running org.apache.pig.test.TestAlgebraicEval
...
gankur@grainflydivide-dr:pig_trunk$ cd contrib/piggybank/java/
gankur@grainflydivide-dr:java$ ant test
...
test:
     [echo]  *** Running UDF tests ***
   [delete] Deleting directory /home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
    [mkdir] Created dir: /home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
    [junit] Running org.apache.pig.piggybank.test.evaluation.TestEvalString
    [junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.15 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.TestMathUDF
    [junit] Tests run: 35, Failures: 0, Errors: 0, Time elapsed: 0.123 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.TestStat
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.114 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.datetime.TestDiffDate
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.105 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.decode.TestDecode
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestHashFNV
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.094 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestLookupInFiles
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.163 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.string.TestRegex
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.092 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.util.TestSearchQuery
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.093 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.util.TestTop
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.099 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestDateExtractor
    [junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.087 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestHostExtractor
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.083 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchEngineExtractor
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.091 sec
    [junit] Running org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchTermExtractor
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.1 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestCombinedLogLoader
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.535 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestCommonLogLoader
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.54 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestHelper
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.014 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestMultiStorage
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.964 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestMyRegExLoader
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestRegExLoader
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.302 sec
    [junit] Running org.apache.pig.piggybank.test.storage.TestSequenceFileLoader
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.883 sec

BUILD SUCCESSFUL
Total time: 58 seconds



> Splitting output data on key field
> ----------------------------------
>
>                 Key: PIG-958
>                 URL: https://issues.apache.org/jira/browse/PIG-958
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.4.0
>            Reporter: Ankur
>         Attachments: 958.v3.patch, 958.v4.patch
>
>
> Pig users often face the need to split the output records into a bunch of files and directories
depending on the type of record. Pig's SPLIT operator is useful when record types are few
and known in advance. In cases where type is not directly known but is derived dynamically
from values of a key field in the output tuple, a custom store function is a better solution.
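
The idea of routing each record by the value of its key field can be sketched as follows.
This is only an illustration under stated assumptions, not the store function from the
attached patches: it writes to the local filesystem with java.nio.file rather than through
Pig's store function interface, it treats each record as a delimited string, and the names
KeySplitWriter, write and the part-00000 file name are hypothetical.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only (not part of Pig or piggybank).
final class KeySplitWriter {

    // Writes each record to <outputRoot>/<key>/part-00000, where <key> is the
    // value of the field at keyIndex. Returns the number of records per key.
    static Map<String, Integer> write(List<String> records, int keyIndex,
                                      String delimiter, Path outputRoot) throws IOException {
        Map<String, BufferedWriter> writers = new HashMap<>();
        Map<String, Integer> counts = new TreeMap<>();
        try {
            for (String record : records) {
                // Quote the delimiter so it is matched literally; -1 keeps trailing empty fields.
                String[] fields = record.split(Pattern.quote(delimiter), -1);
                String key = fields[keyIndex];

                // Open one writer per distinct key, the first time that key is seen.
                BufferedWriter writer = writers.get(key);
                if (writer == null) {
                    Path dir = Files.createDirectories(outputRoot.resolve(key));
                    writer = Files.newBufferedWriter(dir.resolve("part-00000"),
                            StandardCharsets.UTF_8);
                    writers.put(key, writer);
                }
                writer.write(record);
                writer.newLine();
                counts.merge(key, 1, Integer::sum);
            }
        } finally {
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
        return counts;
    }
}
```

Unlike SPLIT, the set of output directories here is not fixed in advance. It grows as new key
values show up in the data. A real implementation would also need to sanitize key values
before using them as directory names and to bound the number of open writers.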

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

