hadoop-common-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4085) internationalization support and sort order (ascending/descending) support in create table
Date Sat, 06 Sep 2008 07:02:46 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628843#action_12628843 ]

Ashish Thusoo commented on HADOOP-4085:
---------------------------------------

Comments are below. The most important one is about how we treat the character set name in
the grammar. Ideally this should be an identifier instead of a token (similar to table
name identifiers). With that approach we could support any character set very easily.

Inline Comments:
cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java:85: nitpick - Can we follow the convention
of putting the opening brace on the same line as the code?
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g:781: Instead of having a fixed token per
character set in the grammar, we should define a character-set identifier and pass it through
to the Java calls. That is much more scalable and would let us seamlessly support any
character set supported by the Java runtime.

 http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html 

has information on the rules for valid character set names (which could serve as the grammar
rules for the identifier) and on how new character sets can be added to the JVM through a
CharsetProvider (java.nio.charset.spi.CharsetProvider).
So the rule for the character set literal could look something like

 charSetStringLiteral : charSetIdentifier StringLiteral ;

where charSetIdentifier can be defined in terms of the naming rules described in the link above.
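A minimal sketch of the identifier-based approach, assuming the parser passes the raw introducer text (e.g. `_utf8`) and the hex digits of a `_utf8 0x<HexValue>` literal through to Java. `CharsetLiterals`, `resolve`, and `decodeHex` are hypothetical names, not Hive code; the only library calls are `java.nio.charset.Charset.forName` and the `String(byte[], Charset)` constructor. Because `forName` accepts aliases, on a standard JDK `utf8` and `latin1` resolve without any per-charset code.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper (not Hive code): resolves a MySQL-style charset
// introducer such as "_utf8" through the JVM instead of a fixed grammar token.
class CharsetLiterals {

    // "_utf8" -> whatever Charset the JVM registers under the name or alias "utf8".
    static Charset resolve(String introducer) {
        if (!introducer.startsWith("_") || introducer.length() < 2) {
            throw new IllegalArgumentException("expected _<charset>, got " + introducer);
        }
        String name = introducer.substring(1);
        try {
            // Lookup is case-insensitive and also consults installed CharsetProviders.
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            throw new IllegalArgumentException("unknown character set: " + name, e);
        }
    }

    // Decodes the digits of a literal like _utf8 0xC3A9 (pass "C3A9") in that charset.
    static String decodeHex(String introducer, String hexDigits) {
        if (hexDigits.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hexDigits);
        }
        byte[] bytes = new byte[hexDigits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hexDigits.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, resolve(introducer));
    }

    public static void main(String[] args) {
        System.out.println(resolve("_utf8").name());   // UTF-8
        System.out.println(resolve("_latin1").name()); // ISO-8859-1
        // Both literals decode to U+00E9 (e with acute accent), code point 233.
        System.out.println((int) decodeHex("_utf8", "C3A9").charAt(0)); // 233
        System.out.println((int) decodeHex("_latin1", "E9").charAt(0)); // 233
    }
}
```

In this sketch an unknown name surfaces as a single IllegalArgumentException, so semantic analysis could report one error for any unsupported character set rather than needing a grammar change per charset.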

ql/src/test/queries/clientpositive/inputddl4.q:0: Let's put a brief comment in this file describing
what it actually tests.
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:157: nitpick - maybe we should call
this PREFIX and not SAME
ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:143: Should this not check across
all sort columns instead of bucket columns? Is this a bug?
ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java:384: This function hardcodes the
terminating character and the field delimiters, whereas the current code parameterizes them.
Parameterization is better because later we want to drive them through session-level properties.
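A minimal sketch of the parameterized alternative, assuming delimiters are read from a `java.util.Properties` object that could be filled from session-level settings. `RowFormatter` and the keys `field.delim`/`line.delim` are hypothetical; the Ctrl-A (`\001`) field default mirrors Hive's usual default delimiter but is an assumption here, not the code in Utilities.java.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical sketch (not Utilities.java): row serialization whose delimiters
// come from properties, e.g. populated from session-level settings.
class RowFormatter {
    // Assumed defaults: Ctrl-A between fields, newline after each row.
    static final String DEFAULT_FIELD_DELIM = "\001";
    static final String DEFAULT_LINE_DELIM = "\n";

    private final String fieldDelim;
    private final String lineDelim;

    RowFormatter(Properties props) {
        // "field.delim" and "line.delim" are illustrative key names.
        this.fieldDelim = props.getProperty("field.delim", DEFAULT_FIELD_DELIM);
        this.lineDelim = props.getProperty("line.delim", DEFAULT_LINE_DELIM);
    }

    // Joins the fields with the field delimiter and appends the line terminator.
    String format(List<String> fields) {
        return String.join(fieldDelim, fields) + lineDelim;
    }

    public static void main(String[] args) {
        Properties session = new Properties();
        session.setProperty("field.delim", ",");
        System.out.print(new RowFormatter(session).format(List.of("a", "b", "c"))); // a,b,c
        // With no properties set, the defaults apply (Ctrl-A shown here as ^A).
        System.out.print(new RowFormatter(new Properties())
            .format(List.of("x", "y")).replace("\001", "^A")); // x^Ay
    }
}
```

Keeping the defaults in one place and overriding them per session means callers never hardcode delimiters, which is the property the comment asks to preserve.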

> internationalization support and sort order (ascending/descending) support in create table
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4085
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4085
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/hive
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>         Attachments: patch1
>
>
> Users cannot specify utf8 strings in a query, either for selection or for filtering. MySQL
> syntax should be followed:
> select _utf8 'string' from <TableName>
> select <selectExpr> from <TableName> where col = _utf8 0x<HexValue>
> To start with, utf8 strings should be supported. Support for other character sets can
> be added in the future on demand.
> The identifiers (table name/column name etc.) cannot be utf8 strings; this applies only to
> the data values.
> Although create table lets the user specify sorted columns, it does not let the user
> specify whether they are ascending or descending.
> Create Table syntax should be enhanced to support that.
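To make the requested ASC/DESC option concrete, here is a hypothetical sketch of what a sort specification like `SORTED BY (ds DESC, userid ASC)` (syntax shown for illustration) would mean for row ordering. `SortSpec`, `Column`, `Direction`, and the `Map<String, String>` row model are illustrative assumptions, not Hive's metastore classes: column order gives precedence, and each column's direction decides whether its comparison is reversed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-column sort directions turned into a row comparator.
// Assumes every row has a non-null value for every sort column.
class SortSpec {
    enum Direction { ASC, DESC }

    static final class Column {
        final String name;
        final Direction direction;

        Column(String name, Direction direction) {
            this.name = name;
            this.direction = direction;
        }
    }

    // Earlier columns take precedence; later columns only break ties.
    static Comparator<Map<String, String>> comparator(List<Column> columns) {
        Comparator<Map<String, String>> result = (a, b) -> 0;
        for (Column c : columns) {
            Comparator<Map<String, String>> byColumn =
                Comparator.comparing((Map<String, String> row) -> row.get(c.name));
            if (c.direction == Direction.DESC) {
                byColumn = byColumn.reversed(); // reverse this column only
            }
            result = result.thenComparing(byColumn);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>(List.of(
            Map.of("ds", "2008-09-05", "userid", "b"),
            Map.of("ds", "2008-09-06", "userid", "c"),
            Map.of("ds", "2008-09-06", "userid", "a")));
        // Equivalent of SORTED BY (ds DESC, userid ASC)
        rows.sort(comparator(List.of(
            new Column("ds", Direction.DESC),
            new Column("userid", Direction.ASC))));
        for (Map<String, String> r : rows) {
            System.out.println(r.get("ds") + " " + r.get("userid"));
        }
        // 2008-09-06 a
        // 2008-09-06 c
        // 2008-09-05 b
    }
}
```

Reversing each column's comparator individually, rather than the whole comparator, is what allows mixed directions such as newest partition first but user IDs ascending within it.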

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

