hbase-user mailing list archives

From Jeremy Smith <xpe...@gmail.com>
Subject Help with HBase table design needed
Date Sun, 07 Aug 2011 18:17:16 GMT

I plan on using HTable to store my data and then querying it with Elasticsearch. The problem
is that I'm new to both technologies, so it would be great to have some guidance on how to
set up my data model.

The primary table that will be queried will have potentially hundreds of millions of rows,
with each user having a variable number of rows that may run into the millions. Each row
will hold maybe 30 key/value fields that represent different states, plus hundreds of
boolean fields.

Most of the querying will be ad hoc, real-time queries where I need the boolean fields
aggregated into percentages after filtering by date, state conditions, and some arbitrary set
of conditions on the booleans. The other common type of query would filter by date and state
conditions only, again with the booleans aggregated into percentages.
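
To make the query shape concrete, here is a minimal sketch in plain Python. It is not HBase
or Elasticsearch code; the row layout, the field names ("date", "state", "flags") and the
function flag_percentages are all hypothetical and only illustrate the filter-then-aggregate
step described above.

```python
from datetime import date

# Hypothetical rows: every field name and value here is made up for illustration.
rows = [
    {"date": date(2011, 8, 1), "state": {"status": "active"},
     "flags": {"a": True, "b": False, "c": True}},
    {"date": date(2011, 8, 2), "state": {"status": "active"},
     "flags": {"a": True, "b": True, "c": False}},
    {"date": date(2011, 8, 3), "state": {"status": "closed"},
     "flags": {"a": False, "b": True, "c": True}},
    {"date": date(2011, 7, 1), "state": {"status": "active"},
     "flags": {"a": True, "b": True, "c": True}},
]


def flag_percentages(rows, start, end, state_conditions, required_flags=()):
    """Filter by date range, state conditions and required-true flags, then
    return, per flag, the percentage of matching rows where it is true."""
    matched = [
        r for r in rows
        if start <= r["date"] <= end
        and all(r["state"].get(k) == v for k, v in state_conditions.items())
        and all(r["flags"].get(f, False) for f in required_flags)
    ]
    if not matched:
        return {}
    names = sorted({name for r in matched for name in r["flags"]})
    return {
        name: 100.0 * sum(1 for r in matched if r["flags"].get(name, False)) / len(matched)
        for name in names
    }


# Second query type: date and state conditions only.
by_state = flag_percentages(rows, date(2011, 8, 1), date(2011, 8, 31), {"status": "active"})

# First query type: the same filter plus a condition on one of the booleans.
with_flag = flag_percentages(
    rows, date(2011, 8, 1), date(2011, 8, 31), {"status": "active"}, required_flags=("b",)
)
```

The open question is how to get the equivalent of this out of the real storage and query
layers at hundreds of millions of rows.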

So my basic question is what to do with the boolean fields. On a given row, only 20-50 of
the hundreds of fields are likely to be set to true. But I don't understand the query
language yet, so I don't know whether I can just have a single "booleans" column holding an
array of all the true booleans and query against that.

If I do have to create a column for each boolean field, does it make sense to put these
columns in their own column family?
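
The two layouts being asked about can be sketched side by side in plain Python. This is only
an illustration of the data shapes, not HBase client code: the "family:qualifier" key style
mirrors how HBase addresses columns, but the family names "s" and "b", the flag names, and
the helpers to_per_column and true_flags are assumptions made up for the example.

```python
# Layout A: a single "booleans" column holding the names of the flags that are true.
# The family names "s" (state) and "b" (booleans) are hypothetical.
row_a = {"s:status": "active", "b:booleans": ["f003", "f017", "f042"]}


def to_per_column(row, family="b", list_column="b:booleans"):
    """Layout B: expand the list into one column per true flag under one family.
    Flags that are false are simply absent from the row."""
    out = {k: v for k, v in row.items() if k != list_column}
    for name in row.get(list_column, []):
        out[f"{family}:{name}"] = True
    return out


def true_flags(row, family="b"):
    """Recover the names of the true flags from a Layout B row."""
    prefix = family + ":"
    return sorted(
        key[len(prefix):]
        for key, value in row.items()
        if key.startswith(prefix) and value is True
    )


row_b = to_per_column(row_a)
```

Both shapes carry the same information, so the choice mostly depends on which one the query
layer can filter and aggregate on.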
