phoenix-dev mailing list archives

From Pariksheet Barapatre <pbarapa...@gmail.com>
Subject Data Model Suggestion
Date Tue, 23 Jun 2015 13:19:06 GMT
Hello All,

This is more of an HBase question, but since I am planning to use Phoenix as
an access layer, I hope Phoenix users can help me.

I would like to store time-series data to get on-the-fly analytics.

This use case is for adTech.

Report: what are the hourly, daily, and weekly impression counts at country
level for a given advertisement ID (ADID)?

I am doing hourly aggregation and loading into a Phoenix table.

Primary Key - *ADID | COUNTRY | HOUR ID*

-------------------------------------------
ADID  | COUNTRY  | HOUR ID     | CF.IMP
-------------------------------------------
1     | US       | 2015062301  | 3000
1     | US       | 2015062302  | 3421
1     | UK       | 2015062302  | 1212
-------------------------------------------
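
To make the reporting requirement concrete, here is a minimal sketch in plain Python (not Phoenix or HBase code) of how daily counts could be rolled up from rows keyed by ADID, COUNTRY and HOUR ID. The dictionary layout and the `daily_totals` helper are assumptions made for illustration only; the values are the sample rows from the table above.

```python
from collections import defaultdict

# Sample rows keyed like the proposed primary key: (ADID, COUNTRY, HOUR_ID).
hourly_rows = {
    (1, "US", "2015062301"): 3000,
    (1, "US", "2015062302"): 3421,
    (1, "UK", "2015062302"): 1212,
}


def daily_totals(rows, adid):
    """Sum impressions per (country, day) for one ad (hypothetical helper)."""
    totals = defaultdict(int)
    for (row_adid, country, hour_id), imp in rows.items():
        if row_adid == adid:
            # The first 8 characters of the hour ID (YYYYMMDD) identify the day.
            totals[(country, hour_id[:8])] += imp
    return dict(totals)


print(daily_totals(hourly_rows, 1))
```

With the hourly key, a daily or weekly report is a range over consecutive HOUR ID values for a fixed ADID and COUNTRY, followed by a sum.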

Is this a good schema design, or should I create an alternate schema like the one below?
Primary Key - *ADID | COUNTRY | DAY ID*
----------------------------------------------------
ADID  | COUNTRY  | DAY ID    | CF.IMP01  | CF.IMP02
----------------------------------------------------
1     | US       | 20150623  | 3000      | 3421
1     | UK       | 20150623  | NULL      | 1212
----------------------------------------------------
Here, I have taken the hour part of the hour ID and created 24 columns, one per hour.
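
The pivot from hourly rows to one wide row per day can be sketched as follows. This is an illustrative Python sketch, not Phoenix code: the `pivot_to_daily` helper and the list-of-24-slots layout are assumptions, and it assumes hours are numbered 00 to 23 so that the hour value can be used directly as the slot index.

```python
# Sample rows keyed by (ADID, COUNTRY, HOUR_ID), as in the hourly schema.
hourly_rows = {
    (1, "US", "2015062301"): 3000,
    (1, "US", "2015062302"): 3421,
    (1, "UK", "2015062302"): 1212,
}


def pivot_to_daily(rows):
    """Build one wide row per (ADID, COUNTRY, DAY_ID) with 24 hour slots."""
    wide = {}
    for (adid, country, hour_id), imp in rows.items():
        day_id = hour_id[:8]      # YYYYMMDD
        hour = int(hour_id[8:])   # HH, assumed to be in 00..23
        # Missing hours stay None, matching the NULL cells in the table.
        slots = wide.setdefault((adid, country, day_id), [None] * 24)
        slots[hour] = imp
    return wide


wide_rows = pivot_to_daily(hourly_rows)
for key, slots in wide_rows.items():
    print(key, slots[1], slots[2])
```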

I have gone through many time-series NoSQL blog posts, and most authors suggest
going with wider rows as above. This should reduce the number of rows scanned,
but I don't see much difference between the two data models in terms of scan
latency.
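
A rough way to compare the two models is to count what a report has to read for one ADID and COUNTRY. The sketch below is back-of-the-envelope arithmetic only; it assumes every hour has a value and that each non-null impression count is stored as its own cell, and the `scan_cost` helper is hypothetical.

```python
HOURS_PER_DAY = 24


def scan_cost(days, wide):
    """Return (rows, cells) read for one (ADID, COUNTRY) over `days` days.

    Assumes every hour has a value, so the number of cells is the same
    in both models; only the number of rows differs.
    """
    rows = days if wide else days * HOURS_PER_DAY
    cells = days * HOURS_PER_DAY
    return rows, cells


for label, days in [("daily", 1), ("weekly", 7)]:
    print(label,
          "hourly-key model:", scan_cost(days, wide=False),
          "daily-key model:", scan_cost(days, wide=True))
```

Under these assumptions the wide model reads fewer rows but the same number of cells, which would be consistent with seeing little difference in scan latency.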

Can anybody please suggest a good approach for my use case?


Cheers,
Pari
