hive-user mailing list archives

From "peter.marron@baesystems.com" <peter.mar...@baesystems.com>
Subject RE: Stored By
Date Tue, 16 Feb 2016 13:27:37 GMT
Hi Gabriel,

Yep, that's a good suggestion.
That is what I ended up doing and it seemed to work fine.
Many thanks for replying.
Apologies for not responding earlier.

Z

From: Gabriel Balan [mailto:gabriel.balan@oracle.com]
Sent: 28 January 2016 23:27
To: user@hive.apache.org
Subject: Re: Stored By

Hi

Why not write your own storage handler that extends AccumuloStorageHandler and overrides getInputFormatClass()
to return your HiveAccumuloTableInputFormat subclass?
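As a rough sketch, assuming such a subclass has been compiled and its jar is on Hive's classpath, the table from your original question could then be declared against it with STORED BY exactly as before. The class name com.example.hive.MyAccumuloStorageHandler below is hypothetical; the SERDEPROPERTIES are copied unchanged from your original DDL.

```sql
-- Sketch only: com.example.hive.MyAccumuloStorageHandler is a hypothetical
-- subclass of AccumuloStorageHandler whose getInputFormatClass() returns
-- your own HiveAccumuloTableInputFormat subclass.
CREATE EXTERNAL TABLE test_text3 (rowid STRING, testint INT, testbig BIGINT, testfloat FLOAT,
testdouble DOUBLE, teststring STRING, testbool BOOLEAN)
STORED BY 'com.example.hive.MyAccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.table.name'='test_table_text','accumulo.columns.mapping' =
':rowid,testint:v,testbig:v,testfloat:v,testdouble:v,teststring:v,testbool:v');
```

Since only getInputFormatClass() is overridden, the SerDe and output format are still the ones inherited from AccumuloStorageHandler; only the input format changes.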

hth
Gabriel Balan
On 1/21/2016 10:46 AM, peter.marron@baesystems.com wrote:
Hi,

So I am using the AccumuloStorageHandler to allow me to access Accumulo tables from Hive.
This works fine. So typically I would use something like this:

CREATE EXTERNAL TABLE test_text (rowid STRING, testint INT, testbig BIGINT, testfloat FLOAT,
testdouble DOUBLE, teststring STRING, testbool BOOLEAN)
STORED BY 'org.apache.hadoop.hive.accumulo.AccumuloStorageHandler'
WITH SERDEPROPERTIES('accumulo.table.name'='test_table_text','accumulo.columns.mapping' =
':rowid,testint:v,testbig:v,testfloat:v,testdouble:v,teststring:v,testbool:v');

For a number of reasons I am planning to write my own InputFormat.
I don't want to start from scratch, so I plan to have my class derive from the existing
HiveAccumuloTableInputFormat class and pick up a lot of functionality for free.

It was my understanding that "STORED BY" is essentially a shorthand that saves the user from
having to specify the input format, output format and so on explicitly.
Since I eventually want to use my own input format class, in the short term I just want
to make sure that I can create a Hive table that uses Accumulo but specifies the input format
explicitly.
I've looked at the source of AccumuloStorageHandler and I can see which input format and output format
it returns.
So my best guess at creating the same table as above, but without using "STORED BY", is as
follows:

CREATE EXTERNAL TABLE test_text2 (rowid STRING, testint INT, testbig BIGINT, testfloat FLOAT,
testdouble DOUBLE, teststring STRING, testbool BOOLEAN)
ROW FORMAT SERDE 'org.apache.hadoop.hive.accumulo.serde.AccumuloSerDe'
WITH SERDEPROPERTIES('accumulo.table.name'='test_table_text','accumulo.columns.mapping' =
':rowid,testint:v,testbig:v,testfloat:v,testdouble:v,teststring:v,testbool:v')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat';

This fails with:

FAILED: SemanticException [Error 10055]: Output Format must implement HiveOutputFormat, otherwise
it should be either IgnoreKeyTextOutputFormat or SequenceFileOutputFormat

That seems plausible, because 'org.apache.hadoop.hive.accumulo.mr.HiveAccumuloTableOutputFormat'
really doesn't appear to implement HiveOutputFormat.
However, this raises the question: how can the storage handler get away with it if I can't?

So, before I go off and implement my own storage handler class as well as my own input format
class, can anyone tell me whether I am doing something silly,
or whether there is some other way around this problem?

Regards,

Z
Please consider the environment before printing this email. This message should be regarded
as confidential. If you have received this email in error please notify the sender and destroy
it immediately. Statements of intent shall only become binding when confirmed in hard copy
by an authorised signatory. The contents of this email may relate to dealings with other companies
under the control of BAE Systems Applied Intelligence Limited, details of which can be found
at http://www.baesystems.com/Businesses/index.htm.



--

The statements and opinions expressed here are my own and do not necessarily represent those
of Oracle Corporation.
