incubator-hcatalog-dev mailing list archives

From "Daniel Dai (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HCATALOG-121) TextStorageOutputDriver for Pig
Date Fri, 21 Oct 2011 18:24:32 GMT

    [ https://issues.apache.org/jira/browse/HCATALOG-121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132895#comment-13132895 ]

Daniel Dai commented on HCATALOG-121:
-------------------------------------

bq. CreateTableHook.java: You are using IgnoreKeyTextOutputFormat. Can't we use mapred.TextOutputFormat?
Hive requires the OutputFormat to either extend HiveOutputFormat or be IgnoreKeyTextOutputFormat/SequenceFileOutputFormat.

bq. HCatLoader/Storer.java: Changes to the signatures are very subtle. Is there an easy way
to have tests for these? I guess it will show up if there is more than one loader or storer
in a test case. It would be good to have tests for them to prevent regressions.
Yes, we should. I will add these.

bq. LoadFuncBasedInputFormat.java: Why do you need to do an instanceof? AFAIK PigStorage
always returns DataByteArray; actual typing in Pig happens later on.
A LoadFunc other than PigStorage might return a non-DataByteArray. Currently, LoadFuncBasedInputFormat
either converts a DataByteArray to the actual type or does no type conversion. An improvement
would be to convert arbitrary input data into the type specified in the table metadata.
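
A minimal sketch of the behavior described above, written without any Pig dependency so it runs on its own. The names here are assumptions for illustration only: a plain byte[] stands in for Pig's DataByteArray, and ColumnType and convert() are hypothetical, not the code in the patch.

```java
import java.nio.charset.StandardCharsets;

class ByteArrayConversionSketch {
    /** Hypothetical column types, standing in for the types in table metadata. */
    enum ColumnType { INT, LONG, DOUBLE, STRING }

    /**
     * Converts raw bytes to the declared column type.
     * Values that are not raw bytes are passed through unchanged,
     * mirroring the "convert DataByteArray or do nothing" behavior.
     */
    static Object convert(Object value, ColumnType type) {
        if (!(value instanceof byte[])) {
            return value; // already typed by the loader: no conversion
        }
        String text = new String((byte[]) value, StandardCharsets.UTF_8);
        switch (type) {
            case INT:    return Integer.valueOf(text);
            case LONG:   return Long.valueOf(text);
            case DOUBLE: return Double.valueOf(text);
            case STRING: return text;
            default:     throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "42".getBytes(StandardCharsets.UTF_8);
        System.out.println(convert(raw, ColumnType.INT)); // raw bytes are parsed
        System.out.println(convert(7L, ColumnType.LONG)); // typed value is passed through
    }
}
```

The instanceof check is what lets one code path serve both loaders that return raw bytes and loaders that return typed values.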

bq. StoreFuncBasedOututDriver.java: Spelling mistake in name of class. In convertValue(),
instead of instantiating DefaultTupleFactory for every call, create the factory from TupleFactory
and save its instance. Also, newTupleNoCopy() is better than append() to avoid memcpy.
Nice catch on the name. Yes, I can reuse the factory. Will change.

bq. TestPigStorageDriver.java: Avoid hard coding the file location.
I followed the convention of other tests in the same suite. But sure, I can change it.

bq. PigStorageOutputDriver/InputDriver.java: It will be cool to instantiate the underlying loadfunc
reflectively. We can store the loadfunc names in the metastore. This will reduce the cost of adding
new store func based drivers from trivial to zero, since a new StoreFuncBasedDriver can then
be added without any code change.
That's possible; we need to add a key/value pair for the Loader/Storer in the table metadata.
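
A rough sketch of how such reflective instantiation could look, assuming the class name is stored as a table property. Everything named here is hypothetical: the Loader interface stands in for Pig's LoadFunc, TextLoader stands in for a concrete loader, and the property key "hcat.pig.loader" is invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ReflectiveLoaderSketch {
    /** Hypothetical stand-in for Pig's LoadFunc, so the sketch compiles without Pig. */
    public interface Loader {
        String describe();
    }

    /** Hypothetical loader implementation with a public no-argument constructor. */
    public static class TextLoader implements Loader {
        @Override
        public String describe() {
            return "text loader";
        }
    }

    /** Hypothetical table property key holding the loader class name. */
    static final String LOADER_KEY = "hcat.pig.loader";

    /** Looks up the loader class name in the table properties and instantiates it reflectively. */
    static Loader instantiateLoader(Map<String, String> tableProperties)
            throws ReflectiveOperationException {
        String className = tableProperties.get(LOADER_KEY);
        if (className == null) {
            throw new IllegalArgumentException("No loader configured under " + LOADER_KEY);
        }
        // asSubclass fails fast with ClassCastException if the class is not a Loader.
        Class<? extends Loader> cls = Class.forName(className).asSubclass(Loader.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = new HashMap<>();
        props.put(LOADER_KEY, TextLoader.class.getName());
        System.out.println(instantiateLoader(props).describe());
    }
}
```

With this shape, supporting a new loader only requires setting the property on the table; the driver code itself does not change.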
                
> TextStorageOutputDriver for Pig
> -------------------------------
>
>                 Key: HCATALOG-121
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-121
>             Project: HCatalog
>          Issue Type: New Feature
>          Components: storage handlers
>    Affects Versions: 0.3
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.3
>
>         Attachments: HCATALOG-121-1.patch, HCATALOG-121-2.patch, HCATALOG-121-3.patch
>
>
> HCatalog needs a plain-text-based StorageOutputDriver that wraps PigStorage for Pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
