hbase-user mailing list archives

From takeshi <takeshi.m...@gmail.com>
Subject Re: How to understand the TS of each data version?
Date Tue, 01 Oct 2013 04:09:56 GMT
Hi, yonghu

I am not sure whether the following timestamp information is useful to you,
but I am posting it anyway.

> So, I wonder
> 1. If there are any predefined semantics of TS in HBase or the semantics of
> TS is application-specific?
>
As far as I know, the timestamp is mainly used for
  1. fetch order: versions are returned from newest to oldest (largest long
to smallest long)
  2. versioning: if you have values at t1, t2, t3, and t4, and the HBase
default of 3 versions applies, then you can fetch only t4, t3, and t2
  3. time-to-live (TTL): predicate deletion. A threshold is applied to the
timestamp of each value, and internal housekeeping automatically checks
whether a value has exceeded its TTL.
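
To make the three uses above concrete, here is a minimal Java sketch. It is a
toy in-memory model of a single cell, not the HBase client API: the class
CellVersionsSketch, its put/get methods, and the rule that a version counts as
expired once (now - ts) exceeds the TTL are all assumptions made for
illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one cell's versions. Hypothetical; NOT the HBase API. */
final class CellVersionsSketch {
    // Versions keyed by timestamp, sorted descending so the newest comes first.
    private final TreeMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions; // assumed per-column-family setting
    private final long ttl;        // assumed TTL, same unit as the timestamps

    CellVersionsSketch(int maxVersions, long ttl) {
        this.maxVersions = maxVersions;
        this.ttl = ttl;
    }

    /** Stores a value under an explicit timestamp. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    /** Returns the visible values, newest first, as seen at time 'now'. */
    List<String> get(long now) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (now - e.getKey() > ttl) {
                break; // expired; everything after it is older, so also expired
            }
            if (out.size() == maxVersions) {
                break; // version limit reached
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Versioning only: 4 versions written, at most 3 are returned.
        CellVersionsSketch noTtl = new CellVersionsSketch(3, Long.MAX_VALUE);
        // TTL of 2 time units: at now=5, the versions at t1 and t2 are expired.
        CellVersionsSketch withTtl = new CellVersionsSketch(3, 2);
        for (long t = 1; t <= 4; t++) {
            noTtl.put(t, "v" + t);
            withTtl.put(t, "v" + t);
        }
        System.out.println(noTtl.get(5));   // prints [v4, v3, v2]
        System.out.println(withTtl.get(5)); // prints [v4, v3]
    }
}
```

In the sketch, the timestamp alone decides ordering, how many versions survive,
and when a value ages out, so choosing it in the application changes all three.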

For more details, please refer to http://hbase.apache.org/book/versions.html

> 2. Can anyone give any rules of how to assign TS for data versions which
> belong to the same row?
>

I think you can refer to Facebook's Inbox search case,
http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase

FYI~



Best regards

takeshi


2013/9/28 yonghu <yongyong313@gmail.com>

> Hi, Ted
>
> Thanks for your response. This is also the way I use to avoid the problem.
>
> regards!
>
> Yong
>
>
> On Sat, Sep 28, 2013 at 4:31 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Can you make NetworkSpeed as column family ?
> >
> > This way you can treat individual suppliers as columns within the column
> > family.
> > So for "user Tom has a new supplier d instead of supplier c and its speed
> > is 15K":
> >
> > rk       NetworkSpeed
> >           c            d
> > Tom   {10K:1}
> > Tom                 {15K:2}
> >
> > In the example above, the numbers after colon are TS. If the speed is
> > unknown, you can store a special marker in the Cell.
> > I used two rows, but as you said, the two Cells can be written using one
> > RPC call.
> >
> > This way, NetworkSupplier column is not needed.
> >
> > Cheers
> >
> >
> > On Fri, Sep 27, 2013 at 3:04 PM, yonghu <yongyong313@gmail.com> wrote:
> >
> > > To Ted,
> > >
> > > --"Can you tell me why readings corresponding to different timestamps
> > > would appear in the same row ?"
> > >
> > > Does that mean the data versions which belong to the same row should at
> > > least have the same timestamps?
> > >
> > > For adding a row into HBase, I can use a single Put instance, for
> > > example, Put put = new Put("tom") and
> > > put.addColumn("Network:Supplier","c"),
> > > put.addColumn("Network:Supplier","d"). And hence the data versions will
> > > have the same TS.
> > >
> > > However, I can also use multiple Put instances, each Put instance for a
> > > single data version. For example, Put put1 = new Put("tom"),
> > > put1.addColumn("Network:Supplier","c"). Put put2 = new Put("tom"),
> > > put2.addColumn("Network:Supplier","d"). In this situation, each data
> > > version which belongs to the same row will have different TSs even if
> > > logically they should have the same TSs. This situation can happen when
> > > I first know the name of the network supplier and later get the speed of
> > > the supplier.
> > >
> > > To lars,
> > >
> > > --"You have a single row with two columns?"
> > >
> > > This is just an example for discussion. I had a heavy discussion with
> > > the other person about how to understand the right data representation
> > > and the semantics of TS in HBase. Your explanation is one possible
> > > scenario which means "user Tom has a new supplier d instead of supplier
> > > c and its speed is 15K".
> > > However, it is possible that "user Tom has both suppliers c and d and
> > > 15K may belong to supplier c, as the speed of supplier d is not tested
> > > yet."
> > > The second understanding is very tricky and if it happened, we need to
> > > redesign the schema of database.
> > >
> > > So, I wonder
> > > 1. If there are any predefined semantics of TS in HBase or the
> > > semantics of TS is application-specific?
> > > 2. Can anyone give any rules of how to assign TS for data versions
> > > which belong to the same row?
> > >
> > > regards!
> > >
> > > Yong
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Sep 27, 2013 at 7:02 PM, lars hofhansl <larsh@apache.org> wrote:
> > >
> > > > Not sure I follow.
> > > > You have a single row with two columns?
> > > > In your scenario you'd see that supplier c has 15k iff you query the
> > > > latest data, which seems to be what you want.
> > > > Note that you could also query as of TS 4 (c:20k), TS3 (d:20k), TS2
> > > > (d:10k)
> > > >
> > > >
> > > > -- Lars
> > > >
> > > >
> > > >
> > > > ________________________________
> > > >  From: yonghu <yongyong313@gmail.com>
> > > > To: user@hbase.apache.org
> > > > Sent: Friday, September 27, 2013 7:24 AM
> > > > Subject: How to understand the TS of each data version?
> > > >
> > > >
> > > > Hello,
> > > >
> > > > In my understanding, the timestamp of each data version is generated
> > > > by Put command. The value of TS is either indicated by user or
> > > > assigned by HBase itself. If the TS is generated by HBase, it only
> > > > records when (the time point) that data version is generated (Have no
> > > > meaning to the application).
> > > > However, if TS is indicated by user, it may have a specific meaning to
> > > > applications. The reason why I want to ask this question is: How can I
> > > > correctly understand the meaning of following data? Suppose I have a
> > > > table which is used to record the internet speed of different
> > > > suppliers for specific users.
> > > > For example,
> > > >
> > > > rk       Network:Supplier   Network:speed
> > > >
> > > > Tom   {d:1, c:4}                 {10K:1, 20K:3, 15K:5}
> > > >
> > > > Then I can have following different data information representations:
> > > >
> > > > 1. Supplier d have speeds 10K and 20K. Supplier c have 15K.
> > > > 2. Supplier d have speeds 10K, 20K and 15K. We only insert the
> > > > supplier c but has not inserted any speed information.
> > > >
> > > > which one is the right understanding? Anyone knows whether there are
> > > > any predefined semantics of TS in HBase?
> > > >
> > > > regards!
> > > >
> > > > Yong
> > > >
> > >
> >
>
