hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alan Chaney <a...@mechnicality.com>
Subject Re: HBase first steps: Design a table
Date Tue, 12 Jun 2012 22:14:01 GMT
On 6/12/2012 2:42 PM, Jean-Marc Spaggiari wrote:
> Here is what the table looks like:
> +--------------+--------------+------+-----+---------+-------+
> | Field        | Type         | Null | Key | Default | Extra |
> +--------------+--------------+------+-----+---------+-------+
> | IDLow        | bigint(20)   | NO   | PRI | NULL    |       |
> | IDHigh       | bigint(20)   | NO   | PRI | NULL    |       |
> | IDAux        | bigint(20)   | NO   | PRI | NULL    |       |
> | Value        | varchar(512) | NO   |     | NULL    |       |
> | lastUpdate   | bigint(20)   | NO   |     |       0 |       |
> | crc          | bigint(20)   | YES  |     | NULL    |       |
> | language     | int(11)      | YES  |     | NULL    |       |
> | status       | int(11)      | YES  |     | NULL    |       |
> | size         | int(11)      | YES  |     | NULL    |       |
> +--------------+--------------+------+-----+---------+-------+
>
> Now, I'm trying to design an HBase table to do the same thing.
> IDLow, IDHigh and IDAux are 64 bits numbers. So I will just convert
> them to a 24 bytes array. That's fine. They are uniq, and they are the
> main key for the inserts.
>
> Based on what I read on the documentation, I need to keep as few
> Column Familly as possible. And the Columns names as short as
> possible. So let's try to have just one, named "a".
>
> That will give me something like:
> create 'mytable', 'a'
>
> Then for the cells, I can insert that way:
> put 'mytable', 'row1', 'a:a', 'ID'
> Where value1 is my 24 bytes ID.
> put 'mytable', 'row1', 'a:b', 'VALUE'
> Which is my "value" parameter from my mysql table.
> put 'mytable', 'row1', 'a:c', 'CRC'
> Which will be my CRC
>
> And so on.
Why not use ID as the row key? and put all the values at once as 
separate columns in the same row.

Put put = new put(byte [] row)
     .add(Bytes.toBytes("a"), Bytes.toBytes("CRC"), Bytes.toBytes(<crc>));
     .add(Bytes.toBytes("a"), Bytes.toBytes("VALUE"), 
Bytes.toBytes(<value>));

  // where 'row' = your 24 bytes ID as above
etc.

then send the complete row as one atomic operation.

where <crc> is the value of the CRC column and so on. I'm showing the 
Java bindings, but the principle applies to whichever client api you use.

>
> Now, I have few questions.
>
> 1) I was not able to find any reference of the primary key or primary
> index or similar in the HBase documentation. Is it automatically done
> of the first cell ("a" in the example)? Or on the row name ("row1" in
> the example)?
> 2) I will need to be able to parse the rows filtering on "status"
> field. I searched for the way to add a secondary index but I was not
> able to find it. Any place I can look for that?
> 3) Since all my entries will have unique id, can I simply use that as
> the row title instead of "row1"? What's pros what's cons?
> 4) If I don't have an index on a field will I still be able to filter
> on that field? Like select * from mytable where status = 0... With 50
> millions lines, it might take some time... But is that still doable
> without reaching memory limitation or something like that?
> 5) I would like to try my architecture on small computers before I go
> for bigger. What's the minimum memory I should have on each
> RegionServer (Or DataNode) to start the application and load few
> hundred thousands lines?
>
>
> I just order "HBase: The Definitive Guide" from amazon. I hope it will
> help me, but in the meantime, if you have some responses for me, it
> will be welcome.
>
> Thanks for your help.
>
> JM


Mime
View raw message