hbase-user mailing list archives

From "Ryan LeCompte" <lecom...@gmail.com>
Subject Advice on table design
Date Sat, 20 Dec 2008 23:34:31 GMT
Hello all,

I'd like a little advice on the best way to design a table in HBase.
Basically, I want to store Apache access log requests in HBase so that
I can query them efficiently. The problem is that each request may
have hundreds of parameters, and many requests can come in for the
same user/IP address.

So, I was thinking of the following:

One table called "requests" with a single column family called "request".

Each row would have a key representing the user's IP address/unique
identifier. Each column would be named by the timestamp of when a
request occurred, and the cell value would be a serializable Java
object representing all the URL parameters of the Apache web server
log request at that specific time.
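
To make the layout concrete, here is a rough sketch in plain Java. It is
only an in-memory stand-in, not HBase client code: the class name
RequestTableSketch, the helpers columnName and put, and the use of a
String in place of the serialized parameter object are all assumptions
made for illustration. The only HBase convention it relies on is the
"family:qualifier" form of a column name.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the proposed "requests" table.
// Row key -> (column name -> cell value). No HBase calls are made.
public class RequestTableSketch {
    // The single column family proposed for the table.
    static final String FAMILY = "request";

    // Stand-in for the table; TreeMap keeps rows and columns sorted by name.
    static final Map<String, TreeMap<String, String>> table = new TreeMap<>();

    // Build a column name of the form "request:<timestamp>".
    static String columnName(long requestTimeMillis) {
        return FAMILY + ":" + requestTimeMillis;
    }

    // Store one log request under the user's row, one column per request time.
    // The String value is a placeholder for the serialized parameter object.
    static void put(String rowKey, long requestTimeMillis, String params) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>())
             .put(columnName(requestTimeMillis), params);
    }

    public static void main(String[] args) {
        put("10.0.0.1", 1229816071000L, "page=home&ref=search");
        put("10.0.0.1", 1229816072000L, "page=cart&item=42");
        put("10.0.0.2", 1229816073000L, "page=home");

        // Each row grows by one column per request from that user.
        for (Map.Entry<String, TreeMap<String, String>> row : table.entrySet()) {
            System.out.println(row.getKey() + " has "
                + row.getValue().size() + " column(s)");
        }
    }
}
```

In this model every request from the same identifier adds another column
to that identifier's row, which is how a single row would end up with
thousands of columns.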

Possible problems:

1) There may be thousands of requests that belong to a single unique
identifier (so that row would have thousands of columns)

Any suggestions on how to represent this best? Is anyone doing
anything similar?

FYI: I'm using Hadoop 0.19 and HBase-TRUNK.

