hadoop-hive-dev mailing list archives

From "Alex Kozlov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()
Date Tue, 08 Jun 2010 01:16:11 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876509#action_12876509 ]

Alex Kozlov commented on HIVE-1369:
-----------------------------------

Here is a more complete description of how to use the new functionality.

Let's say you have a Writable object stored as the value in a SequenceFile.  Say it is an
implementation of a Session class that contains an array of events, where each Event object
has a type, a timestamp, and a Map<String,String>.

You can define the following table in Hive:

CREATE EXTERNAL TABLE session (
  uid STRING,
  events ARRAY < STRUCT < type : INT, ts : BIGINT, map : MAP < STRING, STRING > > >
)
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
 STORED AS SEQUENCEFILE
LOCATION 'location_of_your_sequence_file_with_your_writable_as_value'
;

Instead of implementing a fully functional SerDe for this class (even though that is probably
a good exercise in the long run), with HIVE-1369 one can just write a toString(byte[]) method
for the above Writable:

// Requires: import java.util.Map;
// sep[0] separates columns, sep[1] array elements, sep[2] struct fields,
// sep[3] map entries, sep[4] a map key from its value.
public String toString(byte[] sep) {
  StringBuilder sb = new StringBuilder();
  sb.append(getUId());
  sb.append((char) sep[0]);
  boolean firstEvent = true;
  for (Event event : getEvents()) {
    if (firstEvent) {
      firstEvent = false;
    } else {
      sb.append((char) sep[1]);
    }
    sb.append(event.getType());
    sb.append((char) sep[2]);
    sb.append(event.getTimestamp());
    sb.append((char) sep[2]);
    Map<String,String> map = event.getMap();
    if (map != null && !map.isEmpty()) {
      boolean firstKey = true;
      for (String key : map.keySet()) {
        if (firstKey) {
          firstKey = false;
        } else {
          sb.append((char) sep[3]);
        }
        sb.append(key);
        sb.append((char) sep[4]);
        sb.append(map.get(key));
      }
    } else {
      // "\N" is the null marker for a missing or empty map
      sb.append("\\N");
    }
  }
  return sb.toString();
}
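
To check the separator nesting outside Hive, the same logic can be run against small stand-in classes. The sketch below is only an illustration: Session, Event, sample() and the printable separator bytes are hypothetical, and a real Session would also implement Writable. In Hive the separator bytes come from the table's SerDe settings and are normally control characters such as \001.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Session and Event classes described above.
class SessionToStringDemo {

  static class Event {
    final int type;
    final long timestamp;
    final Map<String, String> map;

    Event(int type, long timestamp, Map<String, String> map) {
      this.type = type;
      this.timestamp = timestamp;
      this.map = map;
    }
  }

  static class Session {
    final String uid;
    final List<Event> events = new ArrayList<>();

    Session(String uid) {
      this.uid = uid;
    }

    // sep[0] columns, sep[1] array elements, sep[2] struct fields,
    // sep[3] map entries, sep[4] map key/value.
    public String toString(byte[] sep) {
      StringBuilder sb = new StringBuilder();
      sb.append(uid).append((char) sep[0]);
      boolean firstEvent = true;
      for (Event e : events) {
        if (!firstEvent) {
          sb.append((char) sep[1]);
        }
        firstEvent = false;
        sb.append(e.type).append((char) sep[2]);
        sb.append(e.timestamp).append((char) sep[2]);
        if (e.map == null || e.map.isEmpty()) {
          sb.append("\\N"); // null marker for a missing or empty map
          continue;
        }
        boolean firstKey = true;
        for (Map.Entry<String, String> kv : e.map.entrySet()) {
          if (!firstKey) {
            sb.append((char) sep[3]);
          }
          firstKey = false;
          sb.append(kv.getKey()).append((char) sep[4]).append(kv.getValue());
        }
      }
      return sb.toString();
    }
  }

  // Builds a two-event session; the second event has no map.
  static Session sample() {
    Session s = new Session("u1");
    Map<String, String> attrs = new LinkedHashMap<>();
    attrs.put("page", "home");
    attrs.put("ref", "mail");
    s.events.add(new Event(1, 100L, attrs));
    s.events.add(new Event(2, 200L, null));
    return s;
  }

  public static void main(String[] args) {
    // Printable separators chosen only to make the output readable.
    byte[] sep = {'|', ',', ':', ';', '='};
    System.out.println(sample().toString(sep));
    // prints: u1|1:100:page=home;ref=mail,2:200:\N
  }
}
```

Each nesting level of the table definition consumes the next separator, which is why the map inside the struct inside the array needs sep[3] and sep[4].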

This will obviously be less efficient at run time than implementing a full SerDe, but it is
much more flexible and faster to implement.

Java's default Object.toString() takes no parameters, so the toString(byte[]) overload does
not conflict with it.  I was thinking about adding some other parameters, such as a null
string or an escape char, but decided to keep it simple.  There is an option to use JSON
serialization as well (probably slower).
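
A small sketch of why the two methods coexist: toString(byte[]) is an ordinary overload, so a caller can look it up and fall back to the plain toString() when a class does not declare it. The Pair class and the render() helper below are hypothetical, and the reflection lookup is only one possible way to make that choice; this message does not show how the HIVE-1369 patch itself does it.

```java
import java.lang.reflect.Method;

class OverloadDemo {

  // Hypothetical value class offering both forms of toString.
  static class Pair {
    final String left = "a";
    final String right = "b";

    @Override
    public String toString() {            // the usual Object contract
      return "Pair(" + left + ", " + right + ")";
    }

    public String toString(byte[] sep) {  // separate overload, no clash
      return left + (char) sep[0] + right;
    }
  }

  // Uses toString(byte[]) when the class has a public one; on any
  // reflective failure falls back to the plain toString().
  static String render(Object value, byte[] sep) {
    try {
      Method m = value.getClass().getMethod("toString", byte[].class);
      return (String) m.invoke(value, (Object) sep);
    } catch (ReflectiveOperationException e) {
      return value.toString();
    }
  }

  public static void main(String[] args) {
    byte[] sep = {'|'};
    System.out.println(render(new Pair(), sep));         // prints: a|b
    System.out.println(render(Integer.valueOf(7), sep)); // prints: 7
  }
}
```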

Alex K


> LazySimpleSerDe should be able to read classes that support some form of toString()
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-1369
>                 URL: https://issues.apache.org/jira/browse/HIVE-1369
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Alex Kozlov
>            Assignee: Alex Kozlov
>            Priority: Minor
>         Attachments: HIVE-1369.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects.
> It should be pretty easy to extend the class to read any object that implements a
> toString() method.
> Ideas or concerns?
> Alex K

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

