hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Phabricator (JIRA)" <>
Subject [jira] [Commented] (HIVE-1719) Move RegexSerDe out of hive-contrib and over to hive-serde
Date Fri, 11 May 2012 18:16:53 GMT


Phabricator commented on HIVE-1719:

cwsteinbach has commented on the revision "HIVE-1719 [jira] Move RegexSerDe out of hive-contrib
and over to hive-serde".

  contrib/src/test/queries/clientnegative/serde_regex.q:24 Since the first CREATE TABLE statement
is expected to fail, we should probably remove the rest of this code.
  contrib/src/test/queries/clientpositive/serde_regex.q:3 This isn't necessary. QTestUtil.clearTestSideEffects()
automatically drops any tables/dbs that aren't defined in QTestUtil.srcTables. (Also, take
a look at QTestUtil.createSources() ... I remember you asked about this yesterday).
  contrib/src/test/queries/clientpositive/serde_regex.q:43 Please remove.
  contrib/src/test/queries/clientnegative/serde_regex.q:41 Please refer to a system variable
(set in one of the Ant build files) instead of using a relative path, e.g. "${}/files/apache.access.log"
  contrib/src/test/queries/clientpositive/serde_regex.q:38 Replace relative paths.
  ql/src/test/queries/clientnegative/serde_regex.q:2 Please remove.
  ql/src/test/queries/clientnegative/serde_regex.q:23 Remove the rest of this code if we're
never going to hit it.
  ql/src/test/queries/clientnegative/serde_regex2.q:2 Please remove.
  ql/src/test/queries/clientnegative/serde_regex2.q:3 Same test as clientnegative/serde_regex.q?
  ql/src/test/queries/clientnegative/serde_regex2.q:20 Remove dead code.
  ql/src/test/queries/clientnegative/serde_regex3.q:1 Remove. Replace with 'USE default;'
if you're running into problems placing a comment on the first line of the file.
  ql/src/test/queries/clientpositive/serde_regex.q:1 Remove.
  ql/src/test/queries/clientpositive/serde_regex.q:4 I think we should either roll the clientpositive
tests into a single file (my preference), or modify the qfile file names so that they give
some hint as to what is being tested. Please take a look at clientpositive/database.q for
a good example of the former approach.
  ql/src/test/queries/clientpositive/serde_regex2.q:27 Please add a newline. You might want
to configure your editor to add these automatically.
  ql/src/test/queries/clientpositive/serde_regex.q:39 It's a little dangerous using 'SELECT
*' in tests because it's not always possible to easily determine from the output which column
contains which field. In addition to 'SELECT *', please also include a query which selects
a single column in the middle in of the table, and a query which selects a subset of two or
more columns from the table.
  ql/src/test/queries/clientpositive/serde_regex2.q:4 I thought about this some  more and
think we should probably throw a runtime exception in the deserialize() method if this condition
is detected. Otherwise, we're basically signing up to support this behavior in the future,
which is not what we want to do.

  Please file a JIRA for this issue (i.e. that we want to catch this condition at compile-time
instead of runtime), and then move this test case over to the clientnegative directory and
cite the JIRA in a comment. Thanks.

  ql/src/test/queries/clientpositive/serde_regex3.q:3 This comment implies that output.format.string
actually did something in the past, which isn't true. Also, the warning message doesn't show
up in the q.out file, so I think we should probably just skip referencing this property in
the tests.
  serde/src/java/org/apache/hadoop/hive/serde2/ Need to update this javadoc.
RegexSerDe only provides deserialization capabilities.
  serde/src/java/org/apache/hadoop/hive/serde2/ Formatting: Braces always
go on their own line. This project basically uses the Sun Java Coding standard (
with a 100 char line limit and 2 character indent.
  serde/src/java/org/apache/hadoop/hive/serde2/ "Not support yet" is inaccurate
since we never plan to support this feature. Please change this to "RegexSerDe does not support
the serialize() method"
  serde/src/java/org/apache/hadoop/hive/serde2/ It's possible that this
will get triggered for every row in the input file, which will quickly overflow the logs.
We should log this at most once. Same thing goes for the log message below.

  Currently SerDes don't really have a good way of logging error information like this. We
should talk offline about ways of improving this.


To: JIRA, cwsteinbach
Cc: shreepadma

> Move RegexSerDe out of hive-contrib and over to hive-serde
> ----------------------------------------------------------
>                 Key: HIVE-1719
>                 URL:
>             Project: Hive
>          Issue Type: Task
>          Components: Serializers/Deserializers
>            Reporter: Carl Steinbach
>            Assignee: Shreepadma Venugopalan
>         Attachments: HIVE-1719.3.patch.txt, HIVE-1719.D3051.1.patch, HIVE-1719.D3051.2.patch,
HIVE-1719.D3141.1.patch, HIVE-1719.patch
> RegexSerDe is as much a part of the standard Hive distribution as the other SerDes
> currently in hive-serde. I think we should move it over to the hive-serde module so that
> users don't have to go to the added effort of manually registering the contrib jar before
> using it.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:!default.jspa
For more information on JIRA, see:


View raw message