hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "kulkarni.swarnim@gmail.com" <kulkarni.swar...@gmail.com>
Subject Re: Mapping HBase column qualifier with a ':' in it
Date Thu, 20 Feb 2014 02:09:56 GMT
Hi Den,

I think that is a valid solution. If you are using a version of hive >
0.12, you can also select columns from hbase using prefixes (introduced in
[1]). Marginally more efficient than the "select all columns" approach and
little more flexible as now just all columns sharing the given prefix need
to have the same type.

[1] https://issues.apache.org/jira/browse/HIVE-3725


On Wed, Feb 19, 2014 at 5:22 PM, Den <dennis.lipovsky@gmail.com> wrote:

> I've arrived at a workaround that I'm using for this for now. Basically I
> have a map in the Hive table corresponding to the column family then I'm
> able to select from it. The downside here is that every qualifier has to
> have the same data type and storage type, which I suppose you could work
> around by having two map columns pointing to the same HBase column family
> with different value types and storage types. Here's the basic idea of it:
>
> CREATE EXTERNAL TABLE t (
>   id string,
>   cf map<string, string>
> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
>
> then it can be queried as:
>
> select cf['q1:q2'] from t
>
>
> On Mon, Feb 17, 2014 at 6:16 PM, Den <dennis.lipovsky@gmail.com> wrote:
>
>> I assume by 'escape' you mean something like:
>>
>> "hbase.columns.mapping" = ":key,cf:q1\:q2"
>>
>> That gave the same error as before. If that's not what you mean could you
>> expand?
>>
>> I also tried to see if I could trick it by doing some binary encoding of
>> the character
>>
>> "hbase.columns.mapping" = ":key,cf:q1\x3Aq2"
>>
>> where \x3A is the ascii hex code for ':'. That also didn't work either,
>> the 'create external table' went through but all the values were NULL.
>>
>> I dug around a bit in the source and found that in HBaseSerDe.java in
>> the parseColumnsMapping(...) function there is the following:
>>
>>       int idxFirst = colInfo.indexOf(":");
>>       int idxLast = colInfo.lastIndexOf(":");
>>
>>       if (idxFirst < 0 || !(idxFirst == idxLast)) {
>>         throw new SerDeException("Error: the HBase columns mapping contains a badly
formed " +
>>             "column family, column qualifier specification.");
>>       }
>>
>> It seems that this will throw this error if there is not exactly 1 colon in the HBase
column to map. So short of tricking it into thinking something else is a colon there might
not be any way to map my columns without renaming them first. Thoughts?
>>
>>
>>
>>
>> On Fri, Feb 14, 2014 at 10:50 AM, kulkarni.swarnim@gmail.com <
>> kulkarni.swarnim@gmail.com> wrote:
>>
>>> Hi Den,
>>>
>>> Have you tried escaping the additional colon in the qualifier name?
>>>
>>>
>>> On Fri, Feb 14, 2014 at 9:47 AM, Den <dennis.lipovsky@gmail.com> wrote:
>>>
>>>> I'm working with an HBase database with a column of the form 'cf:q1:q2'
>>>> where 'cf' is the column family 'q1:q2' is the column qualifier. When
>>>> trying to map this in Hive I'm using a statement like the following:
>>>>
>>>> CREATE EXTERNAL TABLE t (
>>>>   id string
>>>>   q1_q2 string
>>>> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:q1:q2")
>>>>
>>>> I get an error saying
>>>>
>>>> Error: the HBase columns mapping contains a badly formed column family,
>>>> column qualifier specification.
>>>>
>>>> This seems to be due to the colon in the column qualifier. It seems to
>>>> demand that there be exactly on colon in the field name and it has to be
>>>> the one separating the column family from the column qualifier.
>>>>
>>>> Is there a reason that is the case? Is there any way around it so I can
>>>> map the columns from the HBase DB to Hive?
>>>>
>>>
>>>
>>>
>>> --
>>> Swarnim
>>>
>>
>>
>


-- 
Swarnim

Mime
View raw message