tajo-dev mailing list archives

From "Jinho Kim (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TAJO-200) RCFile compatible to apache hive
Date Wed, 20 Nov 2013 11:15:35 GMT

     [ https://issues.apache.org/jira/browse/TAJO-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinho Kim updated TAJO-200:
---------------------------

    Attachment: TAJO-200.patch

{code:title=Text Serialize/Deserialize|borderStyle=solid}
// Tajo
CREATE TABLE tablename (col1 type, col2 type)
USING RCFILE WITH ('rcfile.serde'='org.apache.tajo.storage.TextSerializeDeserialize')

// Hive <= 0.11
CREATE TABLE tablename (col1 type, col2 type)
STORED AS RCFILE 

{code}

{code:title=Binary Serialize/Deserialize|borderStyle=solid}
// Tajo
CREATE TABLE tablename (col1 type, col2 type)
USING RCFILE

// Hive
CREATE TABLE tablename (col1 type, col2 type)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' 
STORED AS RCFILE 
{code}
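
The two pairs of statements above imply a correspondence between the Tajo 'rcfile.serde' value and the Hive SerDe that reads the same file. A minimal Java sketch of that lookup is given here. The class SerdeCompat and the method hiveSerdeFor are hypothetical names and are not part of the attached patch. The Hive text SerDe name (ColumnarSerDe) is an assumption about what a plain STORED AS RCFILE uses on Hive 0.11; the message itself does not name it.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of TAJO-200.patch.
final class SerdeCompat {
    // Tajo serde class names as given in the issue description.
    static final String TAJO_TEXT = "org.apache.tajo.storage.TextSerializeDeserialize";
    static final String TAJO_BINARY = "org.apache.tajo.storage.BinarySerializeDeserialize";

    // Assumption: ColumnarSerDe is the text SerDe behind a plain STORED AS RCFILE on Hive 0.11.
    static final String HIVE_TEXT = "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe";
    // Named explicitly in the Hive example above.
    static final String HIVE_BINARY = "org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe";

    private static final Map<String, String> TAJO_TO_HIVE = Map.of(
            TAJO_TEXT, HIVE_TEXT,
            TAJO_BINARY, HIVE_BINARY);

    // Returns the Hive SerDe class name that matches a Tajo 'rcfile.serde' value.
    static String hiveSerdeFor(String tajoSerde) {
        if (tajoSerde == null || !TAJO_TO_HIVE.containsKey(tajoSerde)) {
            throw new IllegalArgumentException("Unknown rcfile.serde: " + tajoSerde);
        }
        return TAJO_TO_HIVE.get(tajoSerde);
    }

    public static void main(String[] args) {
        System.out.println(hiveSerdeFor(TAJO_TEXT));
        System.out.println(hiveSerdeFor(TAJO_BINARY));
    }
}
```

Rejecting unknown values instead of falling back silently is a design choice of this sketch; it makes a mistyped serde name visible before a Hive table is declared over the file.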


{code:title=Lineitem example|borderStyle=solid}
CREATE TABLE lineitem (L_ORDERKEY bigint, 
L_PARTKEY bigint, 
L_SUPPKEY bigint, 
L_LINENUMBER bigint, 
L_QUANTITY double, 
L_EXTENDEDPRICE double, 
L_DISCOUNT double, 
L_TAX double, 
L_RETURNFLAG text, 
L_LINESTATUS text, 
L_SHIPDATE text, 
L_COMMITDATE text, 
L_RECEIPTDATE text, 
L_SHIPINSTRUCT text, 
L_SHIPMODE text, 
L_COMMENT text) 
USING RCFILE WITH ('rcfile.serde'='org.apache.tajo.storage.TextSerializeDeserialize', 
'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec',
'rcfile.null'='\\N')
{code}

> RCFile compatible to apache hive
> --------------------------------
>
>                 Key: TAJO-200
>                 URL: https://issues.apache.org/jira/browse/TAJO-200
>             Project: Tajo
>          Issue Type: New Feature
>          Components: storage
>            Reporter: Jinho Kim
>            Assignee: Jinho Kim
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-200.patch
>
>
> * Support both text and binary serialization/deserialization.
> ** default: org.apache.tajo.storage.BinarySerializeDeserialize
> * Use SequenceFile.metadata to store the serde class.
> ** key: rcfile.serde
> ** value: org.apache.tajo.storage.BinarySerializeDeserialize or org.apache.tajo.storage.TextSerializeDeserialize
> * Improve memory efficiency.
> * Support Tajo projection pushdown.
> * Support compression.
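
The serde selection described in the list above can be sketched in Java as follows. This is an illustration under stated assumptions, not the code in the patch: RcFileSerdeResolver and resolve are hypothetical names, a plain Map stands in for the SequenceFile metadata, and a missing 'rcfile.serde' key is assumed to mean the binary default.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of TAJO-200.patch.
// A plain Map<String, String> stands in for the SequenceFile metadata.
final class RcFileSerdeResolver {
    static final String SERDE_KEY = "rcfile.serde";
    static final String BINARY_SERDE = "org.apache.tajo.storage.BinarySerializeDeserialize";
    static final String TEXT_SERDE = "org.apache.tajo.storage.TextSerializeDeserialize";

    // Returns the serde class name recorded in the metadata.
    // Assumption: an absent key means the binary default applies.
    static String resolve(Map<String, String> metadata) {
        String serde = metadata.get(SERDE_KEY);
        if (serde == null) {
            return BINARY_SERDE;
        }
        if (serde.equals(BINARY_SERDE) || serde.equals(TEXT_SERDE)) {
            return serde;
        }
        throw new IllegalArgumentException("Unsupported " + SERDE_KEY + ": " + serde);
    }

    public static void main(String[] args) {
        // No key recorded: falls back to the binary serde.
        System.out.println(resolve(Map.of()));
        // Key recorded by a table created with the text serde.
        System.out.println(resolve(Map.of(SERDE_KEY, TEXT_SERDE)));
    }
}
```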



--
This message was sent by Atlassian JIRA
(v6.1#6144)
