hive-dev mailing list archives

From "David Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7286) Parameterize HCatMapReduceTest for testing against all Hive storage formats
Date Wed, 09 Jul 2014 23:22:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056894#comment-14056894 ]

David Chen commented on HIVE-7286:
----------------------------------

Hi [~szehon], thanks for taking the time to review this patch and for your feedback and advice.

I have made some progress on finishing HIVE-5976 and fixing the remaining test failures. While
working on that patch, however, I realized that it only covers the current set of "native"
SerDes, i.e. SequenceFile, text, Parquet, ORC, and RCFile, but not Avro or any of the other
SerDes found throughout the Hive codebase. I do not think that this test should be
limited to only those storage formats or only the ones in SERDESUSINGMETASTOREFORSCHEMA.
It should cover all SerDes in the Hive codebase, especially since it is very likely that
the other SerDes are actually being used; we use Avro almost exclusively here at LinkedIn.

After further thought, Avro is a special case because it requires an Avro schema
to be set in the SerDe or table properties, and as a result, the test code must provide the
TypeInfo to Avro Schema converter. This is a requirement that other SerDes do not have. At
the same time, the TypeInfo to Avro Schema converter has good test coverage and will become
useful when we make the AvroSerDe a native Hive storage format and remove the requirement
for specifying an Avro schema, which should definitely be done in the future.

To add a new SerDe, developers would only be required to add an entry to the table in the
test with the SerDe class and nulls in the other fields. This would indicate that HCatalog
is not yet being tested against the new storage format.
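
As a rough illustration of the table described above, here is a minimal sketch. It is not
the code in the attached patch: the class name StorageFormatTable, the row layout, and the
com.example.NewSerDe entry are hypothetical, and the Hive class names appear only as strings
so the sketch compiles without Hive on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

public class StorageFormatTable {
    // One row per SerDe: {serdeClass, inputFormatClass, outputFormatClass}.
    // Null input/output formats mark a SerDe that is listed but not yet
    // tested against HCatalog. (Hypothetical layout, for illustration only.)
    public static final String[][] FORMATS = {
        {"org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",
         "org.apache.hadoop.hive.ql.io.RCFileInputFormat",
         "org.apache.hadoop.hive.ql.io.RCFileOutputFormat"},
        {"org.apache.hadoop.hive.ql.io.orc.OrcSerde",
         "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
         "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"},
        // A newly added SerDe (made-up name): class only, nulls elsewhere.
        {"com.example.NewSerDe", null, null},
    };

    /** Rows with both formats set; these would become test parameters. */
    public static List<String[]> testedFormats() {
        List<String[]> tested = new ArrayList<>();
        for (String[] row : FORMATS) {
            if (row[1] != null && row[2] != null) {
                tested.add(row);
            }
        }
        return tested;
    }

    /** SerDe class names that are listed but not exercised by the tests. */
    public static List<String> untestedSerDes() {
        List<String> untested = new ArrayList<>();
        for (String[] row : FORMATS) {
            if (row[1] == null || row[2] == null) {
                untested.add(row[0]);
            }
        }
        return untested;
    }
}
```

With a layout like this, the rows returned by testedFormats() could feed a parameterized
test runner, while untestedSerDes() keeps the gaps in coverage visible.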

I am currently blocked on HIVE-5976 because there seem to be some issues with the pre-commit
tests; even so, I think I will need to spend some more time to finish that patch. Once
HIVE-5976 is committed, I think we will still want to keep most of the code
in this patch and just modify the test to make exceptions using the enumeration of StorageFormatDescriptor
in place of the TestStorageFormat classes (which are nearly identical to StorageFormatDescriptor).

Since this patch is ready and expands the coverage of the HCatMapReduceTest tests to run against
RCFile, ORC, and SequenceFile, and since HIVE-5976 will take more time to complete, I think
we should go ahead and commit this patch and open a new ticket to make the necessary changes
to these tests once HIVE-5976 is done. I am also working on adding a similar fixture to the
HCatalog Pig Adapter tests, which also requires this patch.

> Parameterize HCatMapReduceTest for testing against all Hive storage formats
> ---------------------------------------------------------------------------
>
>                 Key: HIVE-7286
>                 URL: https://issues.apache.org/jira/browse/HIVE-7286
>             Project: Hive
>          Issue Type: Test
>          Components: HCatalog
>            Reporter: David Chen
>            Assignee: David Chen
>         Attachments: HIVE-7286.1.patch
>
>
> Currently, HCatMapReduceTest is extended by the following test suites:
>  * TestHCatDynamicPartitioned
>  * TestHCatNonPartitioned
>  * TestHCatPartitioned
>  * TestHCatExternalDynamicPartitioned
>  * TestHCatExternalNonPartitioned
>  * TestHCatExternalPartitioned
>  * TestHCatMutableDynamicPartitioned
>  * TestHCatMutableNonPartitioned
>  * TestHCatMutablePartitioned
> These tests run against RCFile. Currently, only TestHCatDynamicPartitioned is run against
> any other storage format (ORC).
> Ideally, HCatalog should be tested against all storage formats supported by Hive. The
> easiest way to accomplish this is to turn HCatMapReduceTest into a parameterized test fixture
> that enumerates all Hive storage formats. Until HIVE-5976 is implemented, we would need to
> manually create the mapping of SerDe to InputFormat and OutputFormat. This way, we can explicitly
> keep track of which storage formats currently work with HCatalog and which ones are untested
> or have test failures. The test fixture should also use reflection to find all classes on
> the classpath that implement the SerDe interface and raise a failure if any of them are not
> enumerated.
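
The coverage check mentioned at the end of the issue description could look roughly like the
sketch below. It assumes the candidate classes have already been discovered (classpath
scanning is left out), and it uses a stand-in SerDe interface so that it compiles without
Hive. SerDeCoverageCheck, findUnenumerated, and the nested example classes are all
hypothetical names.

```java
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SerDeCoverageCheck {
    /** Stand-in for Hive's SerDe interface (assumption, for illustration). */
    public interface SerDe {}

    // Hypothetical example classes used to exercise the check.
    public static class TextSerDe implements SerDe {}
    public static class FancySerDe implements SerDe {}
    public abstract static class AbstractBaseSerDe implements SerDe {}
    public static class NotASerDe {}

    /**
     * Returns the names of concrete classes that implement SerDe but are
     * missing from the enumerated set. A test would fail if this is non-empty.
     */
    public static Set<String> findUnenumerated(List<Class<?>> candidates,
                                               Set<String> enumerated) {
        Set<String> missing = new TreeSet<>();
        for (Class<?> c : candidates) {
            // Skip interfaces and abstract classes: they cannot be instantiated.
            boolean concrete = !c.isInterface()
                    && !Modifier.isAbstract(c.getModifiers());
            if (concrete
                    && SerDe.class.isAssignableFrom(c)
                    && !enumerated.contains(c.getName())) {
                missing.add(c.getName());
            }
        }
        return missing;
    }
}
```

A test fixture could call findUnenumerated once during setup and fail with the returned
names, so that a newly added SerDe cannot silently escape the enumeration.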



--
This message was sent by Atlassian JIRA
(v6.2#6252)
