hbase-user mailing list archives

From "Jim Twensky" <jim.twen...@gmail.com>
Subject Accessing rows with number indexes
Date Sat, 10 Jan 2009 04:56:34 GMT
Hello,

I have an HBase table that contains sentences as row keys and a few numeric
values as columns. A simple abstract model of the table looks like the
following:

-------------------------------------------------------------------------------
Sentence    | frequency:value | probability:value-0 | probability:value-2
-------------------------------------------------------------------------------
Hello World | 5               | 0.000545321         | 0.002368204
.           | .               | .                   | .
.           | .               | .                   | .
.           | .               | .                   | .
-------------------------------------------------------------------------------


I create the table and load it using Hadoop and there are hundreds of
billions of entries in it. I use this table to solve an optimization problem
using a hill climbing/simulated annealing method. Basically, I need to
change the likelihood values randomly. For example, I need to change, say,
the 5 rows starting at the 112th row, do some calculations, and so on.

Now the problem is, I can't see an easy way to access the n'th row
directly. If I were using a traditional RDBMS, I'd add another column and
auto-increment it each time I added a new row, but this is not possible since
I load the table using Hadoop and there are parallel insertions taking
place simultaneously. A quick and dirty way to do this might be adding a new
index column after I load and initialize the table, but the table is huge and
a second full pass over it doesn't seem right to me. Another bad approach
would be to use a scanner starting from the first row and call
Scanner.next() n times inside a for loop to reach the n'th row, which also
seems very slow. Any ideas on how I could do it more efficiently?
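
To make the cost of the scanner approach concrete, here is a minimal
sketch in Python. It does not use the HBase API: SortedTable is a
hypothetical in-memory stand-in for a table whose rows are sorted by key,
and every function name is made up for illustration. It compares calling
next() n times from the first row with one possible alternative, a sparse
list of "checkpoint" keys (every step-th row key) built in a single pass.
The alternative assumes the set of row keys stays fixed after loading and
only the column values change.

```python
from bisect import bisect_left
from itertools import islice


class SortedTable:
    """Hypothetical in-memory stand-in for a key-sorted table (not the HBase API)."""

    def __init__(self, rows):
        self._rows = dict(rows)          # row key -> value
        self._keys = sorted(self._rows)  # keys in sorted (scan) order

    def scan(self, start_key=None):
        """Yield (key, value) pairs in key order, starting at start_key if given."""
        start = 0 if start_key is None else bisect_left(self._keys, start_key)
        for key in self._keys[start:]:
            yield key, self._rows[key]


def nth_rows_naive(table, n, count):
    """Rows n .. n+count-1 (0-based), scanning from the first row: n skipped rows."""
    return list(islice(table.scan(), n, n + count))


def build_checkpoints(table, step):
    """One full pass that remembers the key of every step-th row."""
    return [key for i, (key, _) in enumerate(table.scan()) if i % step == 0]


def nth_rows_indexed(table, checkpoints, step, n, count):
    """Start at the checkpoint at or before row n, then skip at most step-1 rows."""
    start_key = checkpoints[n // step]
    skip = n % step
    return list(islice(table.scan(start_key), skip, skip + count))


if __name__ == "__main__":
    table = SortedTable({f"sentence {i:04d}": i for i in range(1000)})
    checkpoints = build_checkpoints(table, step=100)
    # Both calls return the 5 rows starting at row 112.
    print(nth_rows_naive(table, 112, 5))
    print(nth_rows_indexed(table, checkpoints, 100, 112, 5))
```

The checkpoint list trades one extra pass over the table for lookups that
skip at most step-1 rows. Whether that trade is acceptable at this table's
scale is not something the sketch can answer.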

Thanks in advance,
Jim
