hbase-user mailing list archives

From Eric Yang <eric...@gmail.com>
Subject Re: stargate retrieve multiple version of a cell
Date Sun, 04 Jul 2010 04:30:22 GMT
I used the shell to create the table, which explains why it only
stored 3 versions.  I will switch to the Java API to create the
tables.  Another question: I am currently sinking all data into the
same table for my prototype.  Is there any heavy cost to creating a
new instance of HTable?

My code may look like this:

for (String tableName : tableList) {
  List<Put> list = ...;
  // A new HTable (and a new configuration) is created for every table name.
  HTable hbase = new HTable(new HBaseConfiguration(), tableName);
  hbase.put(list);
}

Or should I keep the HTable instances in a hash map and reuse them later?

regards,
Eric
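
On the reuse question above, here is a minimal sketch of the "keep
instances in a hash" option, written in plain Java so it runs without
HBase on the classpath.  TableCache and its factory argument are
hypothetical names introduced for illustration; in real use the type
parameter would be HTable and the factory would call the HTable
constructor shown in the loop above.  It says nothing about how
expensive that constructor actually is, and it assumes a single
thread, since HTable is not safe for concurrent updates.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Caches one handle per table name so repeated batches reuse it.
 * T stands in for HTable; the factory stands in for its constructor
 * (both are assumptions made for this sketch).
 */
class TableCache<T> {
    private final Map<String, T> handles = new HashMap<>();
    private final Function<String, T> factory;

    TableCache(Function<String, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached handle, creating it on first use only. */
    T get(String tableName) {
        return handles.computeIfAbsent(tableName, factory);
    }

    /** Number of distinct handles created so far. */
    int size() {
        return handles.size();
    }
}
```

With this in place the loop body would become a lookup such as
cache.get(tableName) followed by the put, so each table name triggers
the factory once no matter how many batches are written to it.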

On Sat, Jul 3, 2010 at 5:43 PM, Jonathan Gray <jgray@facebook.com> wrote:
> Have you looked at Scan.setMaxVersions(int)?  Is that what you're looking for?
>
> Also, when you created the table, it has a default max of three versions.  Did you use
> the java API or the shell to create your table?
>
> HColumnDescriptor.setMaxVersions(int) is what you want to set when you create the table
> initially.  To keep all versions, use setMaxVersions(Integer.MAX_VALUE).
>
> JG
>
>> -----Original Message-----
>> From: Eric Yang [mailto:eric818@gmail.com]
>> Sent: Saturday, July 03, 2010 4:19 PM
>> To: user@hbase.apache.org
>> Subject: Re: stargate retrieve multiple version of a cell
>>
>> Hi Jonathan,
>>
>> I am trying to store large time series data.  I am using a row as a
>> group for one hour's data.  My row contains 60 timestamps, and each
>> timestamp has various cell values.  I am hoping this will produce rows
>> that are not too thick and a table that is slightly shorter.  I am fine
>> with unordered versioning, as long as I get the timestamp when data is
>> retrieved for the timestamp range.  When I scan for the cell, I only
>> get the most recent three versions of the cell.
>>
>> This was tested on hbase 0.20.5, and hadoop 0.20.2.
>>
>> regards,
>> Eric
>>
>>
>>
>> On Sat, Jul 3, 2010 at 2:34 PM, Jonathan Gray <jgray@facebook.com>
>> wrote:
>> > What exactly are you trying to do with the timestamp?  Currently even
>> > duplicates are retained and returned, but the order is not guaranteed
>> > (though we are working on this).
>> >
>> > The behavior is related only to time/order of operations, no
>> > difference if using different clients (not including behavior from
>> > write buffering).
>> >
>> > JG
>> >
>> >> -----Original Message-----
>> >> From: Eric Yang [mailto:eric818@gmail.com]
>> >> Sent: Saturday, July 03, 2010 2:32 PM
>> >> To: user@hbase.apache.org
>> >> Subject: Re: stargate retrieve multiple version of a cell
>> >>
>> >> I think I just found the answer to my own question.  It was not
>> >> stargate's problem.  The data was not stored in hbase as I expected
>> >> it to be.  This raised a more basic question:
>> >>
>> >> I am storing data like this:
>> >>
>> >> Put row1, cf1:c1: 0, timestamp: 10
>> >> Put row1, cf1:c2: 10, timestamp: 10
>> >> Put row1, cf1:c2: 15, timestamp: 20
>> >> Put row1, cf1:c1: 1, timestamp: 20
>> >>
>> >> I am updating individual columns by timestamp, and repeat this
>> >> 60 times for each of the columns.  This is all executed by the same
>> >> client.  When I scan for "row1, c2", would I get 60 different values,
>> >> one for each of the timestamps?
>> >>
>> >> What would happen if this kind of update is applied by different
>> >> hbase clients?
>> >>
>> >> regards,
>> >> Eric
>> >>
>> >> On Sat, Jul 3, 2010 at 1:56 PM, Eric Yang <eric818@gmail.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > I am trying to use stargate to get multiple versions of the cell,
>> >> > and my query looks like this:
>> >> >
>> >> > http://localhost:9090/chukwa/1278180000000-Eric-Yangs-iMac.local/Hadoop_dfs_namenode:CreateFileOps/1278183540000/1278189900000
>> >> >
>> >> > table name: chukwa
>> >> > row: 1278187200000-Eric-Yangs-iMac.local
>> >> > column: Hadoop_dfs_namenode:CreateFileOps
>> >> > start-timestamp: 1278183540000
>> >> > end-timestamp: 1278189900000
>> >> >
>> >> > It only shows me the most recent 3 versions, but not all the
>> >> > versions in this time range.  Is this the right syntax?  What am I
>> >> > doing wrong?
>> >> > Thanks
>> >> >
>> >> > regards,
>> >> > Eric
>> >> >
>> >
>
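
To make the "only three versions" behaviour in the thread above
concrete, here is a toy model in plain Java of a single cell that
retains at most a fixed number of timestamped versions.  VersionedCell
and its methods are hypothetical names for illustration only, not
HBase classes; the real limit is the one set through
HColumnDescriptor.setMaxVersions(int) as described by Jonathan, and
real HBase may enforce it at a different point than this model, which
simply trims on every put.  Unlike the duplicate handling mentioned in
the thread, this sketch overwrites a value written twice at the same
timestamp.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

/** Toy model of one cell that keeps at most maxVersions timestamped values. */
class VersionedCell {
    private final int maxVersions;
    // Sorted newest timestamp first; a repeated timestamp overwrites (simplification).
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    /** Stores a value, then drops the oldest versions beyond the limit. */
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last entry is the oldest timestamp
        }
    }

    /** Retained timestamps in [minInclusive, maxExclusive), newest first. */
    List<Long> timestampsInRange(long minInclusive, long maxExclusive) {
        List<Long> out = new ArrayList<>();
        for (Long ts : versions.keySet()) {
            if (ts >= minInclusive && ts < maxExclusive) {
                out.add(ts);
            }
        }
        return out;
    }
}
```

Writing timestamps 1 through 60 into new VersionedCell(3) and asking
for the range 0 to 100 yields only 60, 59 and 58, which matches the
symptom reported for the shell-created table.  The same writes into
new VersionedCell(Integer.MAX_VALUE) keep all 60.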
