hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Commented] (HIVE-4044) Add URL type
Date Sat, 23 Feb 2013 17:58:12 GMT


Ashutosh Chauhan commented on HIVE-4044:

URL is an unusual type to add to a query processing engine. Can you spell out the motivation
for adding this type? (You can always use the string type for URLs.) I am assuming from your
description above that it might improve storage efficiency through a better encoding of
URLs. But I see the following comment in LazyBinaryURL:
 * The serialization of LazyBinaryURL is the same as the binary representation
 * of the underlying string
and URLWritable also has:
  public void write(DataOutput out) throws IOException {
    if (url != null) {
      byte[] bytes = url.toString().getBytes();
      WritableUtils.writeVInt(out, bytes.length);
    } else {
      WritableUtils.writeVInt(out, 0);

So it seems that you are storing URLs as strings anyway, both for the intermediate data of MR
jobs and for the output of the query. I therefore don't see how this results in better storage efficiency.
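
To make the point concrete, here is a minimal sketch, not the code from the patch. It assumes
that the write() method quoted above goes on to emit the bytes after the length prefix (the
quote is cut off before that point). The class and method names are hypothetical, java.net.URI
stands in for whatever type the patch wraps, and a fixed 4-byte length prefix from
DataOutputStream stands in for Hadoop's WritableUtils.writeVInt so that the example runs
without Hadoop on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: compares a "URL" serialization that writes url.toString()
// with the serialization of the same value held as a plain string.
class UrlSerializationSketch {

    // Stand-in for the quoted URLWritable.write(): length prefix, then the
    // bytes of url.toString(). A null value is written as length 0.
    static byte[] writeAsUrl(URI url) {
        byte[] payload = (url == null)
                ? new byte[0]
                : url.toString().getBytes(StandardCharsets.UTF_8);
        return writeLengthPrefixed(payload);
    }

    // The same layout applied to an ordinary string column value.
    static byte[] writeAsString(String s) {
        byte[] payload = (s == null)
                ? new byte[0]
                : s.getBytes(StandardCharsets.UTF_8);
        return writeLengthPrefixed(payload);
    }

    private static byte[] writeLengthPrefixed(byte[] payload) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // Assumption: fixed-width int here; the patch uses a variable-length int.
            out.writeInt(payload.length);
            out.write(payload);
        } catch (IOException e) {
            // ByteArrayOutputStream does not fail in practice; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String text = "http://hive.apache.org/docs/index.html?lang=en";
        byte[] asUrl = writeAsUrl(URI.create(text));
        byte[] asString = writeAsString(text);
        System.out.println("url bytes:    " + asUrl.length);
        System.out.println("string bytes: " + asString.length);
        System.out.println("identical:    " + Arrays.equals(asUrl, asString));
    }
}
```

Under these assumptions the two byte arrays come out identical, so any storage gain would have
to come from an encoding that actually splits the URL into components instead of writing
toString().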

> Add URL type
> ------------
>                 Key: HIVE-4044
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
> Having a separate type for URLs would enable improvements in storage efficiency based
> on breaking up a URL into its components. The new type will be named "URL" and made a non-reserved
> keyword (see HIVE-701).
