hbase-user mailing list archives

From jthie...@ina.fr
Subject Re : Re: Re : Re: Table design question
Date Fri, 27 Feb 2009 16:02:06 GMT
Hi,

Following the discussion with Stack, I have changed the way I insert data into HBase.

I now insert data into an HTable using url@date as the row key, like this:

Case 3:
BatchUpdate update = new BatchUpdate("www.google.com@20090218");
update.put("content:",
"1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f");
update.put("type:", "text/html");
table.commit(update);

I want to access these rows with inexact keys. If I have inserted these rows:

www.google.com@200801
www.google.com@200901
www.google.com@201001

and make this request:

www.google.com@200902

I would like to find the row for the specified URL whose date is closest to
200902 (www.google.com@200901 in my case).

So I thought I could use the method HTable.getClosestRowBefore(byte[] row, byte[] column)
to find a row whose key is less than the requested one, and then scan to pinpoint
the right row.
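
To make the expected behaviour concrete, here is a minimal in-memory sketch in plain Java. It does not use the HBase API: a TreeMap stands in for the sorted table, and the class and method names (ClosestRowSketch, closestAtOrBefore) are hypothetical. It assumes ASCII row keys, so that String ordering matches byte ordering, and date strings of a fixed width.

```java
import java.util.TreeMap;

public class ClosestRowSketch {

    // Returns the greatest row key <= url@date that still belongs to the
    // same url, or null when no such row exists.
    static String closestAtOrBefore(TreeMap<String, String> rows,
                                    String url, String date) {
        String floor = rows.floorKey(url + "@" + date);
        // The floor key may belong to a different url that sorts earlier,
        // so check the prefix before accepting it.
        if (floor == null || !floor.startsWith(url + "@")) {
            return null;
        }
        return floor;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the table: row key -> value of 'type:'.
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("www.google.com@200801", "text/html");
        rows.put("www.google.com@200901", "text/html");
        rows.put("www.google.com@201001", "text/html");

        // Prints www.google.com@200901
        System.out.println(closestAtOrBefore(rows, "www.google.com", "200902"));
    }
}
```

The prefix check matters because a key for another URL can sort immediately before the requested one, in which case there is no earlier row for the requested URL at all.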


In fact, this method always returns the row with the null key whenever I request a row that
does not exactly match an inserted one.

Is there really a way to make this kind of request in HBase?

Jérôme Thièvre





----- Original message -----
From: stack <stack@duboce.net>
Date: Wednesday, February 18, 2009 10:48 pm
Subject: Re: Re : Re: Table design question

> On Wed, Feb 18, 2009 at 10:29 AM, <jthievre@ina.fr> wrote:
> 
> > >
> > > Currently we can only return records at an explicit date or older, not
> > > newer.
> > >
> > >
> > > Each record is made of 10 columns, and each insert is of the type;
> > > >
> > > > insertRecord(url, date, record);
> > > >
> > > > There are several possible designs for my record table :
> > > >
> > > > 1. RowKey=url and all columns are labelled with the same date.
> > > >
> > > > 2. RowKey=url and we use timestamp and version support of hbase,
> > > > and column names are columnFamily names (no label).
> > > >
> > > > 3. RowKey=url+date, and column names are columnFamily names (no
> > > > label).
> > >
> > > Examples please (I've only had one cup of coffee so far this
> > > morning).
> > >
> >
> >
> > Suppose the column families are: {'content:', 'type:'}
> > I want to insert a new record with url www.google.com at date 20090218:
> >
> > Case 1:
> > BatchUpdate update = new BatchUpdate("www.google.com");
> > update.put("content:20090218",
> > "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f");
> > update.put("type:20090218", "text/html");
> > table.commit(update);
> >
> > Case 2: Implies using hbase versioning
> > BatchUpdate update = new BatchUpdate("www.google.com",
> > toTimestamp(20090218));
> > update.put("content:",
> > "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f");
> > update.put("type:", "text/html");
> > table.commit(update);
> 
> 
> 
> I like this schema best.
> 
> But both case 1 and case 2 will have issues in current hbase if there are
> thousands of versions (to be fixed in 0.20.0).  Just a heads up.
> 
> 
> >
> > Case 3:
> > BatchUpdate update = new BatchUpdate("www.google.com@20090218");
> > update.put("content:",
> > "1ffe36e5b13f28e69c2886f40fd3fcea2ce05d030b508c11d714dead5d69000f");
> > update.put("type:", "text/html");
> > table.commit(update);
> >
> 
> 
> This will work fine in current hbase, even if thousands of versions.
> 
> 
> > Is it possible (or will it be) to load column names without loading cell
> > content? Same question for the timestamp?
> >
> 
> Cell has to have something in it.
> 
> Or do you mean query hbase to find list of columns in a row without
> returning data?  If the latter is your question, no, there is no way to get
> listing without getting the payload too.
> 
> St.Ack
> 
