hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "kulkarni.swarnim@gmail.com" <kulkarni.swar...@gmail.com>
Subject Re: Mapping HBase column qualifier with a ':' in it
Date Thu, 20 Feb 2014 02:12:07 GMT
The syntax for this is very similar to the original one.

CREATE EXTERNAL TABLE t (
  id string,
  cf map<string, string>
) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:*my_prefix.**")

and then it can be queried similar to how you were querying before.


On Wed, Feb 19, 2014 at 8:09 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

> Hi Den,
>
> I think that is a valid solution. If you are using a version of hive >
> 0.12, you can also select columns from hbase using prefixes (introduced in
> [1]). Marginally more efficient than the "select all columns" approach and
> little more flexible as now just all columns sharing the given prefix need
> to have the same type.
>
> [1] https://issues.apache.org/jira/browse/HIVE-3725
>
>
> On Wed, Feb 19, 2014 at 5:22 PM, Den <dennis.lipovsky@gmail.com> wrote:
>
>> I've arrived at a workaround that I'm using for this for now. Basically I
>> have a map in the Hive table corresponding to the column family then I'm
>> able to select from it. The downside here is that every qualifier has to
>> have the same data type and storage type, which I suppose you could work
>> around by having two map columns pointing to the same HBase column family
>> with different value types and storage types. Here's the basic idea of it:
>>
>> CREATE EXTERNAL TABLE t (
>>   id string,
>>   cf map<string, string>
>> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
>>
>> then it can be queried as:
>>
>> select cf['q1:q2'] from t
>>
>>
>> On Mon, Feb 17, 2014 at 6:16 PM, Den <dennis.lipovsky@gmail.com> wrote:
>>
>>> I assume by 'escape' you mean something like:
>>>
>>> "hbase.columns.mapping" = ":key,cf:q1\:q2"
>>>
>>> That gave the same error as before. If that's not what you mean could
>>> you expand?
>>>
>>> I also tried to see if I could trick it by doing some binary encoding of
>>> the character
>>>
>>> "hbase.columns.mapping" = ":key,cf:q1\x3Aq2"
>>>
>>> where \x3A is the ascii hex code for ':'. That also didn't work either,
>>> the 'create external table' went through but all the values were NULL.
>>>
>>> I dug around a bit in the source and found that in HBaseSerDe.java in
>>> the parseColumnsMapping(...) function there is the following:
>>>
>>>       int idxFirst = colInfo.indexOf(":");
>>>       int idxLast = colInfo.lastIndexOf(":");
>>>
>>>       if (idxFirst < 0 || !(idxFirst == idxLast)) {
>>>         throw new SerDeException("Error: the HBase columns mapping contains a
badly formed " +
>>>             "column family, column qualifier specification.");
>>>       }
>>>
>>> It seems that this will throw this error if there is not exactly 1 colon in the
HBase column to map. So short of tricking it into thinking something else is a colon there
might not be any way to map my columns without renaming them first. Thoughts?
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Feb 14, 2014 at 10:50 AM, kulkarni.swarnim@gmail.com <
>>> kulkarni.swarnim@gmail.com> wrote:
>>>
>>>> Hi Den,
>>>>
>>>> Have you tried escaping the additional colon in the qualifier name?
>>>>
>>>>
>>>> On Fri, Feb 14, 2014 at 9:47 AM, Den <dennis.lipovsky@gmail.com> wrote:
>>>>
>>>>> I'm working with an HBase database with a column of the form
>>>>> 'cf:q1:q2' where 'cf' is the column family 'q1:q2' is the column qualifier.
>>>>> When trying to map this in Hive I'm using a statement like the following:
>>>>>
>>>>> CREATE EXTERNAL TABLE t (
>>>>>   id string
>>>>>   q1_q2 string
>>>>> ) WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:q1:q2")
>>>>>
>>>>> I get an error saying
>>>>>
>>>>> Error: the HBase columns mapping contains a badly formed column
>>>>> family, column qualifier specification.
>>>>>
>>>>> This seems to be due to the colon in the column qualifier. It seems to
>>>>> demand that there be exactly on colon in the field name and it has to
be
>>>>> the one separating the column family from the column qualifier.
>>>>>
>>>>> Is there a reason that is the case? Is there any way around it so I
>>>>> can map the columns from the HBase DB to Hive?
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Swarnim
>>>>
>>>
>>>
>>
>
>
> --
> Swarnim
>



-- 
Swarnim

Mime
View raw message