hbase-user mailing list archives

From Sujee Maniyam...@sujee.net>
Subject table design suggestions...
Date Tue, 29 Sep 2009 22:10:50 GMT
Hi all,
I am in the process of migrating a relational table to HBase.

Current table (records user access logs):
    id  : PK
    userId
    url
    timestamp
    refer_url
    ip_address
    cc   : country code of ip address

My likely queries would be:
    - grab all pages visited by a user
    - generate a report of country : number of page views

I want to understand the implications of different HBase schema
designs and how they would affect these queries.

A) A similar case study
(http://www.slideshare.net/hmisty/20090713-hbase-schema-design-case-studies)
recommends using the timestamp as the row key (actually timestamp + counter).

timestamp + counter  =>  {
   user => {
        userId
   }

   url => {
        url:
   }

   from => {
    ip_address
    referer_url
    cc
   }
}
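
To make the trade-off in (A) concrete, here is a minimal sketch in plain Python. It is not the HBase client API; it only models a table as rows kept sorted by key, which is the property that matters here. The helper names (`make_key_a`, `scan_time_range`, `pages_for_user`, `views_by_country`) and the sample log entries are made up for illustration.

```python
from bisect import bisect_left

# Toy model of design A: rows sorted by a "timestamp + counter" key.
# NOT the HBase API; it only mimics lexicographic row-key ordering.

def make_key_a(timestamp, counter):
    # Zero-pad both parts so that string order matches numeric order.
    return f"{timestamp:010d}-{counter:04d}"

# Hypothetical sample entries: (timestamp, counter, userId, url, cc)
logs = [
    (1254262000, 0, "u1", "/home", "US"),
    (1254262000, 1, "u2", "/home", "IN"),
    (1254262050, 0, "u1", "/about", "US"),
    (1254262100, 0, "u3", "/home", "US"),
]

rows = sorted(
    ((make_key_a(ts, c), {"userId": u, "url": url, "cc": cc})
     for ts, c, u, url, cc in logs),
    key=lambda row: row[0],
)
keys = [k for k, _ in rows]

def scan_time_range(start_ts, stop_ts):
    # Time-range queries are a contiguous slice of the sorted keys:
    # only rows in [start_ts, stop_ts) are touched.
    lo = bisect_left(keys, make_key_a(start_ts, 0))
    hi = bisect_left(keys, make_key_a(stop_ts, 0))
    return rows[lo:hi]

def pages_for_user(user_id):
    # userId is not part of the key, so every row has to be examined.
    return [r["url"] for _, r in rows if r["userId"] == user_id]

def views_by_country():
    # The country report is also a full pass over the table.
    counts = {}
    for _, r in rows:
        counts[r["cc"]] = counts.get(r["cc"], 0) + 1
    return counts
```

Under these assumptions, a time-range scan is cheap, but both of the queries listed above ("all pages visited by a user" and the per-country report) end up reading the whole table.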



B) I am wondering if I can use 'user' as the key, but since there are
going to be multiple log entries per user, one possibility might be
    'userid + timestamp' as the key

I have seen this :
http://www.nabble.com/Advice-on-table-design-td21110283.html#a21110283

userid + timestamp => {
   url => {
        url:
   }

   from => {
    ip_address
    referer_url
    cc
   }
}
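
As a rough illustration of why (B) suits the per-user query, the sketch below models the table as a sorted list of keys in plain Python (again, not the HBase API). The `|` delimiter, the zero-padded timestamp, and the names `make_key_b` and `pages_for_user` are my own assumptions, not something the design above prescribes.

```python
from bisect import bisect_left

# Toy model of design B: rows sorted by a "userid + timestamp" key.
# NOT the HBase API; the "|" delimiter and padding are assumptions.

def make_key_b(user_id, timestamp):
    # The delimiter keeps "u1" from matching "u10" in a prefix scan;
    # zero-padding keeps each user's rows in chronological order.
    return f"{user_id}|{timestamp:010d}"

# Hypothetical sample entries: (userId, timestamp, url)
logs = [
    ("u1", 1254262000, "/home"),
    ("u10", 1254262010, "/home"),
    ("u2", 1254262020, "/faq"),
    ("u1", 1254262050, "/about"),
]

rows = sorted(
    ((make_key_b(u, ts), {"url": url}) for u, ts, url in logs),
    key=lambda row: row[0],
)
keys = [k for k, _ in rows]

def pages_for_user(user_id):
    # Prefix scan: start at "<user>|" and stop at the smallest string
    # that sorts after every key with that prefix.
    start = f"{user_id}|"
    stop = start[:-1] + chr(ord(start[-1]) + 1)
    lo = bisect_left(keys, start)
    hi = bisect_left(keys, stop)
    return [r["url"] for _, r in rows[lo:hi]]
```

With this layout, all rows for one user are adjacent, so "all pages visited by a user" becomes a short range scan. Two caveats under these assumptions: two hits by the same user in the same second would collide on one key (a finer timestamp or a counter as in (A) would avoid that), and the per-country report still has to read every row.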


C) I am also wondering whether, since cells are versioned with
timestamps, I can use the versions to represent multiple requests from
the same user to the same url:
   userid => {
        url => {
        }

        from => {
            ip_address
            ...
        }
   }
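
One thing to check with (C) is that the number of versions kept per cell is bounded by a per-column-family setting, so older requests can drop out once the limit is reached. The sketch below is a toy in-memory model in plain Python, not the HBase API; the class `VersionedTable`, its `max_versions` parameter, and the `url:/home` column name are hypothetical, and the model prunes eagerly on every put purely for simplicity.

```python
class VersionedTable:
    """Toy model of timestamp-versioned cells (not the HBase API)."""

    def __init__(self, max_versions):
        # Assumed per-table limit; stands in for a column-family setting.
        self.max_versions = max_versions
        # (row_key, column) -> list of (timestamp, value), newest first
        self.cells = {}

    def put(self, row_key, column, timestamp, value):
        versions = self.cells.setdefault((row_key, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0], reverse=True)
        # Keep only the newest max_versions entries.
        del versions[self.max_versions:]

    def get_versions(self, row_key, column):
        return list(self.cells.get((row_key, column), []))


# Four requests by the same user to the same url, limit of three versions.
table = VersionedTable(max_versions=3)
for ts in (100, 200, 300, 400):
    table.put("u1", "url:/home", ts, "10.0.0.1")

kept = [ts for ts, _ in table.get_versions("u1", "url:/home")]
```

In this model the request at timestamp 100 is no longer retrievable after the fourth put. If every request has to be retained, the version limit would need to be set high enough for the busiest user and url, or the timestamp would need to go into the row key as in (B).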


Any suggestions are most appreciated.

thanks
SM
