hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: How to understand the TS of each data version?
Date Sat, 28 Sep 2013 14:31:08 GMT
Can you make NetworkSpeed a column family?

This way you can treat individual suppliers as columns within the column
family.
So for "user Tom has a new supplier d instead of supplier c and its speed
is 15K":

rk       NetworkSpeed
          c            d
Tom   {10K:1}
Tom                 {15K:2}

In the example above, the numbers after the colon are timestamps (TS). If the
speed is unknown, you can store a special marker in the Cell.
I used two rows, but as you said, the two Cells can be written using one
RPC call.

This way, NetworkSupplier column is not needed.
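
To make the layout above concrete, here is a minimal sketch in plain Java. It
is a toy in-memory model of one row, not the HBase client API: the class name
NetworkSpeedRow and its put/get methods are made up for illustration. Each
supplier is a column qualifier, and each qualifier keeps its own versions keyed
by TS, so a speed can never be attributed to the wrong supplier.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one row in a "NetworkSpeed" family (hypothetical, not HBase code). */
public class NetworkSpeedRow {
    // supplier (column qualifier) -> timestamp -> speed
    private final Map<String, NavigableMap<Long, String>> cells = new TreeMap<>();

    /** Store a speed for one supplier at an explicit timestamp. */
    public void put(String supplier, long ts, String speed) {
        cells.computeIfAbsent(supplier, k -> new TreeMap<>()).put(ts, speed);
    }

    /** Newest speed for the supplier with timestamp <= asOf, or null if none. */
    public String get(String supplier, long asOf) {
        NavigableMap<Long, String> versions = cells.get(supplier);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> newest = versions.floorEntry(asOf);
        return newest == null ? null : newest.getValue();
    }

    public static void main(String[] args) {
        NetworkSpeedRow tom = new NetworkSpeedRow();
        tom.put("c", 1L, "10K"); // supplier c measured at TS 1
        tom.put("d", 2L, "15K"); // supplier d measured at TS 2
        System.out.println(tom.get("c", Long.MAX_VALUE)); // prints 10K
        System.out.println(tom.get("d", 1L));             // prints null (d not known at TS 1)
    }
}
```

With this shape the TS only orders the versions of a single supplier's speed;
the link between a supplier and its speed comes from the column qualifier.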

Cheers


On Fri, Sep 27, 2013 at 3:04 PM, yonghu <yongyong313@gmail.com> wrote:

> To Ted,
>
> --"Can you tell me why readings corresponding to different timestamps would
> appear in the same row ?"
>
> Does that mean the data versions which belong to the same row should at least
> have the same timestamps?
>
> For adding a row into HBase, I can use a single Put instance, for example,
> Put put = new Put("tom") and put.addColumn("Network:Supplier","c"),
> put.addColumn("Network:Supplier","d"). Hence the data versions will have
> the same TS.
>
> However, I can also use multiple Put instances, one Put instance per
> data version. For example, Put put1 = new Put("tom"),
> put1.addColumn("Network:Supplier","c"). Put put2 = new Put("tom"),
> put2.addColumn("Network:Supplier","d"). In this situation, the data
> versions which belong to the same row will have different TSs even if
> logically they should have the same TS. This situation can happen when I
> first know the name of the network supplier and later get the speed of
> the supplier.
>
> To lars,
>
> --"You have a single row with two columns?"
>
> This is just an example for discussion. I had a heavy discussion with the
> other person about how to understand the right data representation and the
> semantics of TS in HBase. Your explanation is one possible scenario which
> means "user Tom has a new supplier d instead of supplier c and its speed is
> 15K".
> However, it is possible that "user Tom has both suppliers c and d and 15K
> may belong to supplier c, as the speed of supplier d is not tested yet."
> The second understanding is very tricky and if it happened, we need to
> redesign the schema of database.
>
> So, I wonder
> 1. Are there any predefined semantics of TS in HBase, or are the semantics of
> TS application-specific?
> 2. Can anyone give any rules for how to assign TS to data versions which
> belong to the same row?
>
> regards!
>
> Yong
>
>
>
>
>
> On Fri, Sep 27, 2013 at 7:02 PM, lars hofhansl <larsh@apache.org> wrote:
>
> > Not sure I follow.
> > You have a single row with two columns?
> > In your scenario you'd see that supplier c has 15k iff you query the
> > latest data, which seems to be what you want.
> > Note that you could also query as of TS 4 (c:20k), TS 3 (d:20k), TS 2
> > (d:10k)
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: yonghu <yongyong313@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Friday, September 27, 2013 7:24 AM
> > Subject: How to understand the TS of each data version?
> >
> >
> > Hello,
> >
> > In my understanding, the timestamp of each data version is generated by the
> > Put command. The value of TS is either specified by the user or assigned by
> > HBase itself. If the TS is generated by HBase, it only records the point in
> > time at which that data version was written (it has no meaning to the
> > application). However, if the TS is specified by the user, it may have a
> > specific meaning to the application. The reason I ask is: how can I
> > correctly understand the meaning of the following data? Suppose I have a
> > table which is used to record the internet speed of different suppliers for
> > specific users.
> > For example,
> >
> > rk       Network:Supplier   Network:speed
> >
> > Tom   {d:1, c:4}                 {10K:1, 20K:3, 15K:5}
> >
> > Then I can have the following different interpretations of the data:
> >
> > 1. Supplier d has speeds 10K and 20K. Supplier c has 15K.
> > 2. Supplier d has speeds 10K, 20K and 15K. We have only inserted supplier c
> > but have not inserted any speed information for it.
> >
> > Which one is the right understanding? Does anyone know whether there are any
> > predefined semantics of TS in HBase?
> >
> > regards!
> >
> > Yong
> >
>
