hbase-user mailing list archives

From Narayanan K <knarayana...@gmail.com>
Subject Re: HBase Key Design : Doubt
Date Thu, 11 Oct 2012 13:26:48 GMT
Hi,

I have 2 column families A and B in table T1.

put 'T1', 'R1', 'A:qualf1', 100
put 'T1', 'R1', 'B:qualf2', 200

As per my understanding, the above creates one row, with a single version
in each of the 2 column families.

If I then do put 'T1', 'R1', 'A:qualf1', 500, there is another version
for the cell identified by the combination {R1, A, qualf1}.
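
To make that understanding concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the HBase API: the class ToyTable, its methods, and the max_versions default of 3 are all assumptions made for illustration. It only shows the idea that versions are tracked per cell (row key, column family, qualifier) and that a read returns the newest version unless more are requested.

```python
from collections import defaultdict
import itertools


class ToyTable:
    """Toy model of per-cell versioning (hypothetical, not the HBase client API)."""

    def __init__(self, max_versions=3):
        # Assumed retention limit; in HBase this is a column family setting.
        self.max_versions = max_versions
        # (row, family, qualifier) -> list of (timestamp, value), newest first
        self._cells = defaultdict(list)
        # Stand-in for the server clock: each put gets a larger timestamp.
        self._clock = itertools.count(1)

    def put(self, row, column, value):
        family, qualifier = column.split(":", 1)
        versions = self._cells[(row, family, qualifier)]
        versions.insert(0, (next(self._clock), value))
        # Drop versions beyond the retention limit.
        del versions[self.max_versions:]

    def get(self, row, column, versions=1):
        family, qualifier = column.split(":", 1)
        stored = self._cells.get((row, family, qualifier), [])
        return [value for _, value in stored[:versions]]


t1 = ToyTable()
t1.put("R1", "A:qualf1", 100)
t1.put("R1", "B:qualf2", 200)
t1.put("R1", "A:qualf1", 500)

latest = t1.get("R1", "A:qualf1")               # newest version only
history = t1.get("R1", "A:qualf1", versions=2)  # newest first
other = t1.get("R1", "B:qualf2", versions=2)    # untouched by the third put
```

In this model the third put adds a version only to the cell {R1, A, qualf1}; the cell {R1, B, qualf2} still holds a single version.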

Please correct me if I am wrong.

Regards,
Narayanan

On Thu, Oct 11, 2012 at 1:02 AM, Doug Meil <doug.meil@explorysmedical.com> wrote:

>
> Correct.
>
> If you do 2 Puts for row key A-B-C-D on different days, the second Put
> logically replaces the first and the earlier Put becomes a previous
> version.  Unless you specifically want older versions, you won't get them
> in either Gets or Scans.
>
> Definitely want to read this...
>
> http://hbase.apache.org/book.html#datamodel
>
> See this for more information about the internal KeyValue structure.
>
> http://hbase.apache.org/book.html#regions.arch
> 9.7.5.4. KeyValue
>
>
> Older versions are kept around as long as the table descriptor says so
> (e.g., max versions).  See the StoreFile and Compactions entries in the
> RefGuide for more information on the internals.
>
>
>
>
> On 10/10/12 3:24 PM, "Jerry Lam" <chilinglam@gmail.com> wrote:
>
> >Correct me if I'm wrong. The version applies to the individual cell (i.e.,
> >row key, column family and column qualifier), not to (row key, column family).
> >
> >
> >On Wed, Oct 10, 2012 at 3:13 PM, Narayanan K <knarayanan88@gmail.com>
> >wrote:
> >
> >> Hi all,
> >>
> >> I have a use case wherein I need to find the unique occurrences of some
> >> things in HBase across dates.
> >>
> >> Say, on 1st Oct, A-B-C-D appeared, hence I insert a row with rowkey :
> >> A-B-C-D.
> >> On 2nd Oct, I get the same value A-B-C-D and I don't want to redundantly
> >> store the row again with a new rowkey - A-B-C-D for 2nd Oct
> >> i.e., I do not want to have 20121001-A-B-C-D and 20121002-A-B-C-D as 2
> >> rowkeys in the table.
> >>
> >> Eg: If I have 1st Oct and 2nd Oct as 2 column families and the number of
> >> versions is set to 1, only 1 row will be present for both dates,
> >> having rowkey A-B-C-D.
> >> Hence if I need to find the unique number of times A-B-C-D appeared during
> >> Oct 1 and Oct 2, I just need to take the rowcount of the row A-B-C-D by
> >> filtering over the 2 column families.
> >> Similarly, if we have 10 date column families and I need to scan only for
> >> 2 dates, then it scans only those store files having the specified column
> >> families. This will make scanning faster.
> >>
> >> But here the design problem is that I can't add more column families to
> >> the table each day.
> >>
> >> I would need to store data every day, and I read that HBase doesn't work
> >> well with more than 3 column families.
> >>
> >> The other option is to have one single column family and store dates as
> >> qualifiers: date:d1, date:d2.... But here, if there are 30 date qualifiers
> >> under the date column family, scanning a single date qualifier, or maybe a
> >> range of 2-3 dates, will have to scan through the entire data of all d1 to
> >> d30 qualifiers in the date column family, which would be slower compared to
> >> having separate column families for each date.
> >>
> >> Please share your thoughts on this. Also any alternate design suggestions
> >> you might have.
> >>
> >> Regards,
> >> Narayanan
> >>
>
>
>
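
For the single-column-family option raised in the original question (one row per key such as A-B-C-D, one qualifier per date), the counting logic could look roughly like the sketch below. It is a plain Python model, not HBase client code: the family name "date", the helper names record and keys_seen, and the extra sample keys are hypothetical. It says nothing about scan performance; it only shows that a key stored once can still be counted per date range.

```python
from collections import defaultdict

FAMILY = "date"  # hypothetical name for the single column family

# rowkey -> {"date:20121001": 1, ...}; one row per key, one qualifier per date
table = defaultdict(dict)


def record(rowkey, day):
    """Mark that rowkey appeared on the given day (repeat puts are idempotent)."""
    table[rowkey][f"{FAMILY}:{day}"] = 1


def keys_seen(days):
    """Return the sorted row keys that have at least one of the requested date qualifiers."""
    wanted = {f"{FAMILY}:{day}" for day in days}
    return sorted(key for key, cols in table.items() if not wanted.isdisjoint(cols))


record("A-B-C-D", "20121001")
record("A-B-C-D", "20121002")  # same row, second qualifier; no second row key
record("E-F-G-H", "20121002")
record("I-J-K-L", "20121003")

seen = keys_seen(["20121001", "20121002"])
unique_count = len(seen)
```

Here A-B-C-D is counted once for the two-day range even though it appeared on both days, because both appearances live in the same row.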
