hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: stargate retrieve multiple version of a cell
Date Sun, 04 Jul 2010 04:34:10 GMT
You should reuse HTable instances, but they are not thread-safe, so use one per thread.
Check out the HTablePool class.
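
A minimal sketch of the one-instance-per-thread idea, written against a
hypothetical StubTable stand-in rather than the real HTable so that it runs
without HBase on the classpath.  The names PerThreadTables, StubTable, and
table() are assumptions made for illustration only; in real code the cached
object would be an HTable, or you would let HTablePool manage the instances:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadTables {
    // Hypothetical stand-in for HTable (NOT an HBase class); it only counts
    // how many instances get constructed and records what was "put".
    static final class StubTable {
        static final AtomicInteger CREATED = new AtomicInteger();
        final String name;
        final List<String> puts = new ArrayList<>();

        StubTable(String name) {
            this.name = name;
            CREATED.incrementAndGet();
        }

        void put(List<String> batch) {
            puts.addAll(batch);
        }
    }

    // One cache per thread: instances are reused, but never shared across threads.
    private static final ThreadLocal<Map<String, StubTable>> TABLES =
            ThreadLocal.withInitial(HashMap::new);

    // Returns the calling thread's instance for this table, creating it on first use.
    static StubTable table(String name) {
        return TABLES.get().computeIfAbsent(name, StubTable::new);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable writer = () -> {
            for (int i = 0; i < 3; i++) {
                // Looked up on every pass, constructed at most once per thread.
                table("chukwa").put(Collections.singletonList("row" + i));
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("instances created: " + StubTable.CREATED.get());
    }
}
```

Each thread that calls table("chukwa") gets its own cached instance, so
nothing is constructed inside the write loop and no instance is ever used
from two threads.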

> -----Original Message-----
> From: Eric Yang [mailto:eric818@gmail.com]
> Sent: Saturday, July 03, 2010 9:30 PM
> To: user@hbase.apache.org
> Subject: Re: stargate retrieve multiple version of a cell
> 
> I used the shell to create the table.  This explained why it only
> stored 3 versions.  I will switch to use java API to create the
> tables.  Another question, I am currently sinking all data into the
> same table for my prototype.  Is there any heavy cost for creating new
> instance of HTable?
> 
> My code may look like this:
> 
> for(String tableName : tableList) {
>   List<Put> list = ...;
>   hbase = new HTable(new HBaseConfiguration(), tableName);
>   hbase.put(list);
> }
> 
> Or should I keep HTable instances in hash and reuse them later?
> 
> regards,
> Eric
> 
> On Sat, Jul 3, 2010 at 5:43 PM, Jonathan Gray <jgray@facebook.com>
> wrote:
> > Have you looked at Scan.setMaxVersions(int)?  Is that what you're
> > looking for?
> >
> > Also, when you created the table, it has a default max of three
> > versions.  Did you use the java API or the shell to create your table?
> >
> > HColumnDescriptor.setMaxVersions(int) is what you want to set when
> > you create the table initially.  To keep all versions, use
> > setMaxVersions(Integer.MAX_VALUE).
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Eric Yang [mailto:eric818@gmail.com]
> >> Sent: Saturday, July 03, 2010 4:19 PM
> >> To: user@hbase.apache.org
> >> Subject: Re: stargate retrieve multiple version of a cell
> >>
> >> Hi Jonathan,
> >>
> >> I am trying to store large time series data.  I am using a row as a
> >> group for one hour's data.  My row contains 60 timestamps, and each
> >> timestamp has various cell values.  I am hoping this will produce a
> >> row that is not too thick and a table that is slightly shorter.  I am
> >> fine with unordered versioning, as long as I get the timestamp when
> >> data is retrieved for the timestamp range.  When I scan for the cell,
> >> I only get the most recent three versions of the cell.
> >>
> >> This was tested on hbase 0.20.5, and hadoop 0.20.2.
> >>
> >> regards,
> >> Eric
> >>
> >>
> >>
> >> On Sat, Jul 3, 2010 at 2:34 PM, Jonathan Gray <jgray@facebook.com>
> >> wrote:
> >> > What exactly are you trying to do with the timestamp?  Currently
> >> > even duplicates are retained and returned, but the order is not
> >> > guaranteed (though we are working on this).
> >> >
> >> > The behavior is related only to time/order of operations, no
> >> > difference if using different clients (not including behavior from
> >> > write buffering).
> >> >
> >> > JG
> >> >
> >> >> -----Original Message-----
> >> >> From: Eric Yang [mailto:eric818@gmail.com]
> >> >> Sent: Saturday, July 03, 2010 2:32 PM
> >> >> To: user@hbase.apache.org
> >> >> Subject: Re: stargate retrieve multiple version of a cell
> >> >>
> >> >> I think I just found the answer to my own question.  It was not
> >> >> stargate's problem.  The data was not stored in hbase as I
> >> >> expected it to be.  This raised a more basic question:
> >> >>
> >> >> I am storing data like this:
> >> >>
> >> >> Put row1, cf1:c1: 0, timestamp: 10
> >> >> Put row1, cf1:c2: 10, timestamp: 10
> >> >> Put row1, cf1:c2: 15, timestamp: 20
> >> >> Put row1, cf1:c1: 1, timestamp: 20
> >> >>
> >> >> I am updating individual columns by timestamp, and repeat this
> >> >> 60 times for each of the columns.  This is all executed by the
> >> >> same client.  When I scan for "row1, c2", would I get 60 different
> >> >> values, one for each timestamp?
> >> >>
> >> >> What would happen if this kind of update is applied by a
> >> >> different hbase client?
> >> >>
> >> >> regards,
> >> >> Eric
> >> >>
> >> >> On Sat, Jul 3, 2010 at 1:56 PM, Eric Yang <eric818@gmail.com> wrote:
> >> >> > Hi all,
> >> >> >
> >> >> > I am trying to use stargate to get multiple versions of the
> >> >> > cell, and my query looks like this:
> >> >> >
> >> >> > http://localhost:9090/chukwa/1278180000000-Eric-Yangs-iMac.local/Hadoop_dfs_namenode:CreateFileOps/1278183540000/1278189900000
> >> >> >
> >> >> > table name: chukwa
> >> >> > row: 1278187200000-Eric-Yangs-iMac.local
> >> >> > column: Hadoop_dfs_namenode:CreateFileOps
> >> >> > start-timestamp: 1278183540000
> >> >> > end-timestamp: 1278189900000
> >> >> >
> >> >> > It only shows me the most recent 3 versions, but not all the
> >> >> > versions in this time range.  Is this the right syntax?  What am
> >> >> > I doing wrong?
> >> >> > Thanks
> >> >> >
> >> >> > regards,
> >> >> > Eric
> >> >> >
> >> >
> >
