hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Nested data structures examples for HBase
Date Wed, 10 Sep 2014 04:26:39 GMT

Are you just kicking the tires or do you want to roll up your sleeves and do some work? 

You have options. 
Secondary Indexes. 

I don’t mean an inverted table, but things like Solr, Lucene, Elasticsearch… 

The only downside is that, depending on what you index, you can see an explosion in the data
being stored in HBase.
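To make the "explosion" concrete, here is a minimal sketch. It assumes a row is held in memory as nested `Map`/`List` values and uses no HBase, Solr, Lucene or Elasticsearch client API. It walks the structure and emits one hypothetical `IndexEntry` per leaf value, keyed by a dotted path and pointing back to the HBase row key. Each such entry is roughly what you would hand to an external index as a field or document. The names `IndexEntry` and `flatten`, and the path format, are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class NestedIndexSketch {

    // Hypothetical index entry (not a Solr/Elasticsearch type): one per leaf value,
    // recording the HBase row key the value came from.
    record IndexEntry(String rowKey, String path, String value) {}

    // Recursively walks nested Map/List values and appends one entry per leaf.
    static void flatten(String rowKey, String path, Object node, List<IndexEntry> out) {
        if (node instanceof Map<?, ?> map) {
            for (Map.Entry<?, ?> e : map.entrySet()) {
                String key = String.valueOf(e.getKey());
                String child = path.isEmpty() ? key : path + "." + key;
                flatten(rowKey, child, e.getValue(), out);
            }
        } else if (node instanceof List<?> list) {
            for (int i = 0; i < list.size(); i++) {
                flatten(rowKey, path + "[" + i + "]", list.get(i), out);
            }
        } else {
            // Leaf value: one index entry pointing back to the source row.
            out.add(new IndexEntry(rowKey, path, String.valueOf(node)));
        }
    }

    public static void main(String[] args) {
        // Toy row with two top-level keys and a little nesting underneath.
        Map<String, Object> row = Map.of(
            "a", List.of(Map.of("x", 1, "y", 2), Map.of("x", 3)),
            "b", List.of("p", "q", "r"));
        List<IndexEntry> entries = new ArrayList<>();
        flatten("row-001", "", row, entries);
        // One stored row becomes one index entry per leaf value.
        System.out.println(entries.size());
    }
}
```

Here a single toy row with six leaf values yields six index entries (the program prints `6`). A row shaped like the one described later in the thread would multiply that count level by level, which is where the growth in stored data comes from.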

But that may be beyond you. It's a non-trivial task and, to be honest… a bit of ‘rocket science’.

It's still doable…

On Sep 9, 2014, at 10:20 PM, Stephen Boesch <javadba@gmail.com> wrote:

> Thanks Michael. Yes, cells are byte[]; therefore, storing JSON or other
> document structures is always possible. Our use cases include querying
> individual elements in the structure, so that would require reconstituting
> the documents and then parsing them for every row. We are probably not
> headed in the direction of HBase for those use cases, but we are trying to
> make that determination after carefully considering the extent of the
> mismatch.
> 2014-09-09 13:37 GMT-07:00 Michael Segel <michael_segel@hotmail.com>:
>> You do realize that everything you store in HBase is a byte array, right?
>> That is, each cell is a blob.
>> So you have the ability to create nested structures like… JSON records? ;-)
>> So, to your point: you can have a column A which represents a set of values.
>> This is one reason why you shouldn’t think of HBase in terms of being
>> relational. In fact, for Hadoop, you really don’t want to think in terms of
>> relational structures.
>> Think more in terms of hierarchical structures.
>> So yes, you can do what you want to do…
>> HTH
>> -Mike
>> On Sep 8, 2014, at 10:06 PM, Stephen Boesch <javadba@gmail.com> wrote:
>>> While I am aware that HBase does not have native support for nested
>>> structures, surely some of you have thought through this use case
>>> carefully.
>>> Our particular use case is likely to have a single-digit number of nested
>>> layers, with tens to hundreds of items in the lists at each level.
>>> An example would be:
>>>   top level:     300 items
>>>   middle level:  1 to 100 items ("1 value" may indicate a single value as
>>>                  opposed to a list)
>>>   third level:   1 to 50 items
>>>   fourth level:  1 to 20 items
>>> The column names are likely known ahead of time, which may or may not
>>> matter for HBase. We could model the above structure in a Parquet file or
>>> in Hive (with nested structs), but we would like to consider whether
>>> HBase might also be an option.
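The nested shape in the question can be sized with simple arithmetic. This is a minimal sketch under an assumption the message does not state: every item at one level can carry a full list at the next level. Under that assumption the leaf count is the product of the per-level item counts. That gives 300 × 100 × 50 × 20 = 30,000,000 leaves at the upper end and 300 × 1 × 1 × 1 = 300 at the lower end. If each leaf became its own HBase cell (for example under a hypothetical `top.middle.third.fourth` column-qualifier path), the upper bound is the cell count of a single row. Stored as one JSON blob, the same data is a single cell that must be fetched and parsed whole to read any element, which is the trade-off raised in the thread. The helper name `leafCount` is illustrative.

```java
public class NestedShapeSizing {

    // Product of per-level fan-outs = number of leaves, ASSUMING each item
    // at one level owns a full list at the next level.
    static long leafCount(int... fanOuts) {
        long total = 1;
        for (int f : fanOuts) {
            total *= f;
        }
        return total;
    }

    public static void main(String[] args) {
        // Per-level ranges from the question: top 300, then 1-100, 1-50, 1-20.
        long min = leafCount(300, 1, 1, 1);
        long max = leafCount(300, 100, 50, 20);
        System.out.println("leaves per row: " + min + " to " + max);
    }
}
```

Running it prints `leaves per row: 300 to 30000000`. Real rows may sit far below the upper bound, but the product grows quickly with depth. A single HBase row is never split across regions, so very wide rows concentrate both storage and load.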

View raw message