hbase-user mailing list archives

From Heng Chen <heng.chen.1...@gmail.com>
Subject Re: Row Versions in Apache Hbase
Date Tue, 01 Dec 2015 10:52:49 GMT
I ran into a similar problem too.

This is what I do:

After the logs are collected, I run one MR job to process them and store the
results in HBase:

RowKey                  Column
date + userId           List of URLs

Because the URL list is very large, I compress it.
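
For illustration only, here is a minimal Python sketch of that layout. The thread does not say how the key is encoded or which codec is used, so the `yyyymmdd` + userId key format, the newline separator, the use of zlib, and all function names below are assumptions.

```python
import zlib
from typing import List


def make_row_key(date: str, user_id: str) -> bytes:
    """Build a 'date + userId' row key (hypothetical yyyymmdd + id format)."""
    return f"{date}{user_id}".encode("utf-8")


def pack_urls(urls: List[str]) -> bytes:
    """Join the URL list with newlines and compress it into one cell value."""
    return zlib.compress("\n".join(urls).encode("utf-8"))


def unpack_urls(blob: bytes) -> List[str]:
    """Reverse pack_urls: decompress and split back into a list of URLs."""
    text = zlib.decompress(blob).decode("utf-8")
    return text.split("\n") if text else []


# Sample data: a long, repetitive URL history for one user on one day.
urls = ["www.something.com", "www.some.com"] * 500
row_key = make_row_key("20151201", "1725")
cell_value = pack_urls(urls)
```

The row key and the compressed value would then be written as one cell per user per day. Repetitive URL lists like the sample above tend to compress well, which is the point of compressing the column.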


So if I need one person's URL history for one day, it is a single GET.

If I need one person's URL history over several days, it is a scan, and
because the number of rows is not large, the scan is fast.
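
The two access patterns can be sketched with a toy in-memory table that keeps rows sorted by key, which is how HBase orders rows. This is not the HBase client API; `SortedTable`, `row_key`, the key format, and the sample values are hypothetical, and user ids are assumed to be fixed width. In this model the multi-day scan covers every user's rows for the date range, so rows for other users are filtered out by key suffix.

```python
from bisect import bisect_left


class SortedTable:
    """Toy stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted list of row keys
        self._rows = {}   # row key -> cell value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = value

    def get(self, key: bytes):
        """Point lookup by exact row key; returns None if the row is absent."""
        return self._rows.get(key)

    def scan(self, start: bytes, stop: bytes):
        """Yield (key, value) for start <= key < stop, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < stop:
            key = self._keys[i]
            yield key, self._rows[key]
            i += 1


def row_key(date: str, user_id: str) -> bytes:
    # Assumed format: yyyymmdd followed by a fixed-width user id.
    return f"{date}{user_id}".encode("utf-8")


table = SortedTable()
for date in ("20151129", "20151130", "20151201"):
    for user in ("1725", "1726"):
        table.put(row_key(date, user), f"urls of {user} on {date}".encode("utf-8"))

# One user, one day: a single GET on the full row key.
one_day = table.get(row_key("20151201", "1725"))

# One user, several days: a range scan over the dates, keeping only that user.
history = [
    value
    for key, value in table.scan(b"20151130", b"20151202")
    if key.endswith(b"1725")
]
```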


Hope this helps.



2015-12-01 18:39 GMT+08:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:

> Hi
>
>    That was a sample use case for my question. This is my actual use case:
>
> Visits by customers to our website are recorded as logs. We usually process
> them with Apache Pig and insert the Pig output directly into an HBase
> table (test) using HBaseStorage. This is done every morning. The data
> consists of the following columns:
>
> Customerid | Name | visitedurl | timestamp | location | companyname
>
> I have only one column family (test_family)
>
> For now I generate a random number for each row and insert it as the row
> key for that table. For example, I have the following data to insert:
>
> 1725|xxx|www.something.com|127987834 | india |zzzz
> 1726|yyy|www.some.com|128389478 | UK | yyyy
>
> In that case I will use 1 as the row key for the first row, 2 for the second,
> and so on.
>
> Note: the same id will be repeated on different days, so I chose a random
> number as the row key.
>
> When I query the table with
>
> scan 'test', {FILTER => "SingleColumnValueFilter('test_family', 'Customerid', =, 'binary:1002')"}
>
> it takes more than 2 minutes to return the results.
>
> Please suggest a way to bring this down to 1 to 2 seconds, since I am using
> it for real-time analytics.
>
> Thanks
>
> On Tue, Dec 1, 2015 at 3:40 PM, Heng Chen <heng.chen.1986@gmail.com>
> wrote:
>
> > So, maybe we can use 1212 + customerId as rowKey.
> > btw, what is 1212 used for?
> >
> > 2015-12-01 17:49 GMT+08:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> >
> > > Hi chen,
> > >
> > > Yes, I have a customerid column that identifies each customer.
> > >
> > >
> > >
> > > On Tue, Dec 1, 2015 at 3:11 PM, Heng Chen <heng.chen.1986@gmail.com>
> > > wrote:
> > >
> > > > Hm.., is there anything unique, like a userId, that represents one person?
> > > >
> > > >
> > > > 2015-12-01 16:33 GMT+08:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > >
> > > > > Is there any other way to store only the id? There may be new rows with
> > > > > the same id, like:
> > > > >
> > > > > 1212  |   xxxx | 20
> > > > > 1212  | yyyy  |  21
> > > > > 1212  | xxxx | 22
> > > > >
> > > > >
> > > > > On Tue, Dec 1, 2015 at 1:59 PM, Heng Chen <heng.chen.1986@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Yeah, if you want to get all records for 1212, just scan rows with
> > > > > > prefix 1212.
> > > > > >
> > > > > > 2015-12-01 16:27 GMT+08:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > >
> > > > > > > So you want me to design the row key by appending the name column
> > > > > > > value to it?
> > > > > > >
> > > > > > > On Tue, Dec 1, 2015 at 1:19 PM, Heng Chen <heng.chen.1986@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > So, why not
> > > > > > > >
> > > > > > > > 1212-xxx    20
> > > > > > > > 1212-yyy    21
> > > > > > > > 1212-zzz    22
> > > > > > > >
> > > > > > > > 2015-12-01 15:33 GMT+08:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > >   I meant something like the following. Is this possible?
> > > > > > > > >
> > > > > > > > > Rowkey | column family
> > > > > > > > >        | Name | Age
> > > > > > > > > 1212   | xxxx | 20
> > > > > > > > > 1212   | yyyy | 21
> > > > > > > > > 1212   | zzzz | 22
> > > > > > > > >
> > > > > > > > > On Tue, Dec 1, 2015 at 12:03 PM, Heng Chen <heng.chen.1986@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > why not
> > > > > > > > > >
> > > > > > > > > > 1212 | 10, 11, 12, 13, 14, 15, 16, 27, 28 ?
> > > > > > > > > >
> > > > > > > > > > 2015-12-01 14:29 GMT+08:00 Rajeshkumar J <rajeshkumarit8292@gmail.com>:
> > > > > > > > > >
> > > > > > > > > > > Hi Ted,
> > > > > > > > > > >
> > > > > > > > > > >   This is my use case. I have to store values like this. Is it possible?
> > > > > > > > > > >
> > > > > > > > > > > RowKey | Values
> > > > > > > > > > >
> > > > > > > > > > > 1212   | 10,11,12
> > > > > > > > > > >
> > > > > > > > > > > 1212  | 13, 14, 15
> > > > > > > > > > >
> > > > > > > > > > > 1212  | 16,27,28
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Nov 30, 2015 at 10:40 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Have you read http://hbase.apache.org/book.html#rowkey.design ?
> > > > > > > > > > > >
> > > > > > > > > > > > bq. we can store more than one row for a row-key value.
> > > > > > > > > > > >
> > > > > > > > > > > > Can you clarify your intention / use case? If the row key is the same,
> > > > > > > > > > > > the key values would be in the same row.
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Nov 30, 2015 at 8:30 AM, Rajeshkumar J <rajeshkumarit8292@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > >   I am new to Apache HBase. I know that when we try to insert a row key
> > > > > > > > > > > > > that is already present in a table, the new value is either discarded or
> > > > > > > > > > > > > updated. I also came across row versions, through which we can store
> > > > > > > > > > > > > different versions of a row key based on timestamp. Can anyone correct me
> > > > > > > > > > > > > if I am wrong? I also need to know whether there is any way to store more
> > > > > > > > > > > > > than one row for a row-key value.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
