hbase-user mailing list archives

From yonghu <yongyong...@gmail.com>
Subject Re: How to understand the TS of each data version?
Date Fri, 27 Sep 2013 22:04:34 GMT
To Ted,

--"Can you tell me why readings corresponding to different timestamps would
appear in the same row ?"

Does that mean that the data versions which belong to the same row should
all have the same timestamp?

To add a row to HBase, I can use a single Put instance, for example
Put put = new Put("tom"), then put.addColumn("Network:Supplier", "c") and
put.addColumn("Network:Supplier", "d"). In that case the data versions will
have the same TS.

However, I can also use multiple Put instances, one Put instance per
data version. For example, Put put1 = new Put("tom") with
put1.addColumn("Network:Supplier", "c"), and Put put2 = new Put("tom") with
put2.addColumn("Network:Supplier", "d"). In this situation, the data
versions which belong to the same row will have different TSs even if
logically they should have the same TS. This can happen when I first learn
the name of the network supplier and only later get the speed of that
supplier.
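
To make the difference concrete, here is a minimal sketch in plain Java. It is a toy in-memory model of one row, not the HBase client: the class ToyRow, its logical clock and its put methods are all made up for illustration. It assumes that a write without a timestamp gets one assigned by the store, and that a caller can supply its own timestamp instead, which the real client also allows when building a Put.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory stand-in for a single row; NOT the HBase client API.
class ToyRow {
    // column name -> (timestamp -> value)
    private final Map<String, TreeMap<Long, String>> cells = new LinkedHashMap<>();
    // Hypothetical logical clock standing in for the server's wall-clock time.
    private long clock = 0;

    // One write with a store-assigned timestamp; all columns in it share that timestamp.
    long put(Map<String, String> columns) {
        return put(++clock, columns);
    }

    // One write with a caller-supplied timestamp.
    long put(long ts, Map<String, String> columns) {
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cells.computeIfAbsent(e.getKey(), k -> new TreeMap<>()).put(ts, e.getValue());
        }
        clock = Math.max(clock, ts);
        return ts;
    }

    // Timestamp of the newest version of a column, or -1 if the column has no versions.
    long latestTimestamp(String column) {
        TreeMap<Long, String> versions = cells.get(column);
        return versions == null ? -1L : versions.lastKey();
    }

    public static void main(String[] args) {
        // Supplier is known first, speed arrives later: two auto-stamped writes.
        ToyRow separate = new ToyRow();
        long t1 = separate.put(Collections.singletonMap("Network:Supplier", "c"));
        long t2 = separate.put(Collections.singletonMap("Network:speed", "15K"));
        System.out.println("auto-stamped writes share a TS: " + (t1 == t2));

        // Same two writes, but the application supplies the logical timestamp itself.
        ToyRow aligned = new ToyRow();
        long measuredAt = 5L;
        aligned.put(measuredAt, Collections.singletonMap("Network:Supplier", "c"));
        aligned.put(measuredAt, Collections.singletonMap("Network:speed", "15K"));
        System.out.println("explicit writes share a TS: "
                + (aligned.latestTimestamp("Network:Supplier")
                   == aligned.latestTimestamp("Network:speed")));
    }
}
```

In this sketch the two auto-stamped writes end up with different TSs, while the two writes that carry the application's own timestamp line up, which is the behaviour I would want when the supplier and its speed logically belong together.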

To Lars,

--"You have a single row with two columns?"

This is just an example for discussion. I had a long discussion with
another person about the right data representation and the semantics of TS
in HBase. Your explanation is one possible reading, namely "user Tom has a
new supplier c instead of supplier d and its speed is 15K".
However, it is also possible that "user Tom has both suppliers c and d, and
15K may belong to supplier d, as the speed of supplier c has not been tested
yet."
The second reading is tricky, and if it is the intended one, we would need
to redesign the database schema.
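
To check my understanding of the first reading, here is a small sketch in plain Java. Again this is a toy model and not the HBase client: the class AsOfReadSketch and its methods are invented for illustration. It assumes that each column keeps its own versions and that a read "as of" some TS picks, per column and independently, the newest version at or before that TS. The data is the Tom row from my first mail.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model of one row with versioned columns; NOT the HBase client API.
class AsOfReadSketch {
    // column name -> (timestamp -> value)
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    void put(String column, long ts, String value) {
        cells.computeIfAbsent(column, k -> new TreeMap<>()).put(ts, value);
    }

    // Newest value of the column with timestamp <= ts, or null if none exists yet.
    String getAsOf(String column, long ts) {
        TreeMap<Long, String> versions = cells.get(column);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> e = versions.floorEntry(ts);
        return e == null ? null : e.getValue();
    }

    // The example row: Supplier {d:1, c:4}, speed {10K:1, 20K:3, 15K:5}.
    static AsOfReadSketch tomExample() {
        AsOfReadSketch row = new AsOfReadSketch();
        row.put("Network:Supplier", 1, "d");
        row.put("Network:Supplier", 4, "c");
        row.put("Network:speed", 1, "10K");
        row.put("Network:speed", 3, "20K");
        row.put("Network:speed", 5, "15K");
        return row;
    }

    public static void main(String[] args) {
        AsOfReadSketch tom = tomExample();
        for (long ts = 1; ts <= 5; ts++) {
            System.out.println("TS " + ts + ": "
                    + tom.getAsOf("Network:Supplier", ts) + " / "
                    + tom.getAsOf("Network:speed", ts));
        }
    }
}
```

Under these assumptions the reads give d/10K at TS 2, d/20K at TS 3, c/20K at TS 4 and c/15K at TS 5, which matches your explanation. Nothing in this model ties a speed version to a particular supplier version, so the second reading cannot be expressed by timestamps alone.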

So, I wonder:
1. Are there any predefined semantics of TS in HBase, or are the semantics
of TS application-specific?
2. Can anyone give some rules for how to assign TSs to data versions which
belong to the same row?

regards!

Yong





On Fri, Sep 27, 2013 at 7:02 PM, lars hofhansl <larsh@apache.org> wrote:

> Not sure I follow.
> You have a single row with two columns?
> In your scenario you'd see that supplier c has 15k iff you query the
> latest data, which seems to be what you want.
> Note that you could also query as of TS 4 (c:20k), TS3 (d:20k), TS2 (d:10k)
>
>
> -- Lars
>
>
>
> ________________________________
>  From: yonghu <yongyong313@gmail.com>
> To: user@hbase.apache.org
> Sent: Friday, September 27, 2013 7:24 AM
> Subject: How to understand the TS of each data version?
>
>
> Hello,
>
> In my understanding, the timestamp of each data version is generated by the
> Put command. The value of the TS is either specified by the user or assigned
> by HBase itself. If the TS is generated by HBase, it only records the point
> in time at which that data version was written (it has no meaning to the
> application). However, if the TS is specified by the user, it may have a
> specific meaning to the application. The reason I ask this question is: how
> can I correctly understand the meaning of the following data? Suppose I have
> a table which is used to record the internet speed of different suppliers
> for specific users.
> For example,
>
> rk       Network:Supplier   Network:speed
>
> Tom   {d:1, c:4}                 {10K:1, 20K:3, 15K:5}
>
> Then I can have the following different interpretations of the data:
>
> 1. Supplier d has speeds 10K and 20K. Supplier c has 15K.
> 2. Supplier d has speeds 10K, 20K and 15K. We have inserted supplier c
> but have not yet inserted any speed information for it.
>
> Which one is the right understanding? Does anyone know whether there are
> any predefined semantics of TS in HBase?
>
> regards!
>
> Yong
>
