hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Pagination with HBase - getting previous page of data
Date Tue, 29 Jan 2013 21:40:41 GMT
Hi Anil,

I think it really depends on how you want to use the pagination.

Do you need to be able to jump to page X? Is it OK if you miss a
line or two? Is your data growing quickly, or slowly? Is it OK if your
page indexes are a day old? Do you need to paginate over 300 columns,
or just one? Do you always need exactly the same number of entries
in each page?

For my use case I need to be able to jump to page X, and I don't
have any content. I have hundreds of millions of lines. Only the rowkey
matters to me, and I'm fine if sometimes I have 50 entries displayed
and sometimes only 45. So I'm thinking about calculating which row is
the first one for each page and storing that separately. Then I just
need to run the MR job daily.
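
To make the idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: a sorted Python list stands in for the HBase table, and `build_page_index` and `read_page` are hypothetical helper names, not HBase APIs. The index holds the first rowkey of every page, as the daily job would compute it. Reading page N returns everything from that key up to the first key of page N+1, so a page may hold slightly more or fewer rows than the nominal size if the table changed since the last run.

```python
from bisect import bisect_left


def build_page_index(sorted_rowkeys, page_size):
    """Return the first rowkey of each page (what the daily job would store)."""
    return [sorted_rowkeys[i] for i in range(0, len(sorted_rowkeys), page_size)]


def read_page(sorted_rowkeys, page_index, page_number):
    """Return the rows in [first key of page N, first key of page N+1)."""
    lo = bisect_left(sorted_rowkeys, page_index[page_number])
    if page_number + 1 < len(page_index):
        hi = bisect_left(sorted_rowkeys, page_index[page_number + 1])
    else:
        hi = len(sorted_rowkeys)  # last page: read to the end of the table
    return sorted_rowkeys[lo:hi]


# Stand-in table: rowkeys 010, 020, ..., 300 with 10 rows per page.
rows = ["%03d" % n for n in range(10, 310, 10)]
index = build_page_index(rows, 10)

# A row inserted after the index was built still shows up on its page.
rows_later = sorted(rows + ["101"])
page0 = read_page(rows_later, index, 0)
page2 = read_page(rows_later, index, 2)
```

With this layout, jumping to page X is one lookup in the index followed by one bounded scan; the price is that page sizes drift until the index is rebuilt.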

It's not a perfect solution, I agree, but it might do the job for me.
I'm totally open to any other ideas that might do the job too.
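
For comparison, the application-level approach discussed in the quoted thread (remember the start key of each page while scrolling forward, and scan up to the first key of the next page when going back) can be sketched as follows. This is a hedged illustration, not HBase code: a sorted list stands in for the table, `scan` imitates a forward-only range scan, and the `Pager` class and its method names are hypothetical.

```python
from bisect import bisect_left, insort


def scan(table, start_key, stop_key=None, limit=None):
    """Stand-in for a forward scan over [start_key, stop_key), up to `limit` rows."""
    lo = bisect_left(table, start_key)
    hi = len(table) if stop_key is None else bisect_left(table, stop_key)
    rows = table[lo:hi]
    return rows if limit is None else rows[:limit]


class Pager:
    def __init__(self, table, page_size):
        self.table = table
        self.page_size = page_size
        self.starts = [""]  # starts[i] = first key of page i; "" sorts before any key
        self.page = 0

    def rows(self):
        start = self.starts[self.page]
        if self.page + 1 < len(self.starts):
            # Next page's start key is known: scan up to it, so rows inserted
            # in between are not lost (the page may exceed page_size).
            return scan(self.table, start, stop_key=self.starts[self.page + 1])
        # Otherwise fetch one extra row; it becomes the next page's start key.
        got = scan(self.table, start, limit=self.page_size + 1)
        if len(got) > self.page_size:
            self.starts.append(got[self.page_size])
        return got[:self.page_size]

    def next_page(self):
        self.rows()  # make sure the next start key has been recorded
        if self.page + 1 < len(self.starts):
            self.page += 1
        return self.rows()

    def previous_page(self):
        if self.page > 0:
            self.page -= 1
        return self.rows()


table = ["%03d" % n for n in range(10, 310, 10)]
pager = Pager(table, 10)
first = pager.rows()           # 010 .. 100
second = pager.next_page()     # 110 .. 200
insort(table, "101")           # someone inserts a row between the two pages
back = pager.previous_page()   # 010 .. 100 plus 101
```

This only allows going back to pages the user has already visited; jumping straight to an arbitrary page still needs something like a precomputed page index.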

JM

2013/1/29, anil gupta <anilgupta84@gmail.com>:
> Yes, your suggested solution only works for rowkey-based pagination. It will
> fail when you start filtering on the basis of columns.
>
> Still, I would say it's comparatively easier to maintain this at the
> application level rather than creating tables for pagination.
>
> What if you have 300 columns in your schema? Will you create 300 tables?
> What about handling pagination when filtering is done on multiple
> columns ("and" and "or" conditions)?
>
> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> jean-marc@spaggiari.org> wrote:
>
>> No, no killer solution here ;)
>>
>> But I'm still thinking about that because I might have to implement
>> some pagination options soon...
>>
>> As you say, it only works on the rowkey, but if you want
>> to do the same thing on non-rowkey columns, you might have to create a
>> secondary index table...
>>
>> JM
>>
>> 2013/1/27, anil gupta <anilgupta84@gmail.com>:
>> > That's alright. I thought that you had come up with a killer solution,
>> > so I got curious to hear your ideas. ;)
>> > It seems that the solution you describe below will not work when
>> > filtering on non-rowkey columns, since you only consider the rowkey
>> > when deciding the page numbers.
>> >
>> > Thanks,
>> > Anil
>> >
>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
>> > jean-marc@spaggiari.org> wrote:
>> >
>> >> Hi Anil,
>> >>
>> >> I don't have a solution. I never thought about that ;) But I was
>> >> thinking about something like this: you create a 2nd table where you
>> >> store the row number (4 bytes) followed by the row key. To go directly
>> >> to a specific page, you query by the number, find the key, and you know
>> >> where to start your scan in the main table.
>> >>
>> >> The issue is probably the numbering of the lines, since with a MR job
>> >> you don't know where you are relative to the beginning of the table.
>> >> But you can build something where you store the line number counted
>> >> from the beginning of the region; then, when all regions are parsed,
>> >> you can reconstruct the total numbering... That should work...
>> >>
>> >> JM
>> >>
>> >> 2013/1/25, anil gupta <anilgupta84@gmail.com>:
>> >> > Inline...
>> >> >
>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
>> >> > jean-marc@spaggiari.org> wrote:
>> >> >
>> >> >> Hi Anil,
>> >> >>
>> >> >> The issue is that the start of every subsequent page should be
>> >> >> moved too...
>> >> >>
>> >> > Yes, this is a possibility. Hence the developer has to take care of
>> >> > this case. It might also be that the pageSize is not a hard limit on
>> >> > the number of results (more like a hint or suggestion on size). I
>> >> > would say it varies by use case.
>> >> >
>> >> >>
>> >> >> so if you want to jump directly to page n, you might be totally
>> >> >> shifted because of all the data inserted in the meantime...
>> >> >>
>> >> >> If you want a real, complete pagination feature, you might want to
>> >> >> have a coprocessor or a MR job updating another table referring to
>> >> >> the pages....
>> >> >>
>> >> > Well, the solution depends on the use case. I will be doing
>> >> > pagination in HBase for a RESTful service, but so far I have been
>> >> > unable to find any reason why this can't be done at the application
>> >> > level.
>> >> > Are you suggesting using MR for paging in HBase? If yes, how?
>> >> > How would you use another table for pagination? What would you store
>> >> > in the extra table?
>> >> >
>> >> >>
>> >> >> JM
>> >> >>
>> >> >> 2013/1/25, anil gupta <anilgupta84@gmail.com>:
>> >> >> > Hi Vijay,
>> >> >> >
>> >> >> > I've done paging in HBase using only a Scan (no pagination
>> >> >> > filter), as Mohammad has explained. However, it was just
>> >> >> > experimental. It works, but Jean raised a very good point.
>> >> >> > Find my answer inline to fix the problem that Jean reported.
>> >> >> >
>> >> >> >
>> >> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
>> >> >> > jean-marc@spaggiari.org> wrote:
>> >> >> >
>> >> >> >> Hi Vijay,
>> >> >> >>
>> >> >> >> If, while the user is scrolling forward, you store the key of
>> >> >> >> each page, then you will be able to go back to a specific page
>> >> >> >> and jump forward again up to where he was.
>> >> >> >>
>> >> >> >> The only issue is that if, while the user is scrolling the
>> >> >> >> table, someone inserts a row between the last row of a page and
>> >> >> >> the first row of the next page, you will never see this row.
>> >> >> >>
>> >> >> >> Let's take this example.
>> >> >> >>
>> >> >> >> You have 10 items per page.
>> >> >> >>
>> >> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
>> >> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
>> >> >> >>
>> >> >> >> Now, if someone inserts 101... It will be just after 100 and
>> >> >> >> before 110.
>> >> >> >>
>> >> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110.
>> >> >> > Then we won't have this problem. In other words, use
>> >> >> > startRow(firstRowKeyofPage(N)) and stopRow(firstRowKeyofPage(N+1)).
>> >> >> > This would fix it. Note that in that case the number of results
>> >> >> > might exceed the pageSize, so you might need to handle that.
>> >> >> >
>> >> >> >>
>> >> >> >> When you display 10 rows starting at 010, you will stop just
>> >> >> >> before 101... And for the next page you will start at 110... And
>> >> >> >> 101 will never be displayed...
>> >> >> >>
>> >> >> >> HTH
>> >> >> >>
>> >> >> >> JM
>> >> >> >>
>> >> >> >> 2013/1/25, Mohammad Tariq <dontariq@gmail.com>:
>> >> >> >> > Hello sir,
>> >> >> >> >
>> >> >> >> >       While paging through, store the start key of the current
>> >> >> >> > page of 25 rows in a separate byte[]. Now, if you want to come
>> >> >> >> > back to this page when you are at the next page, do a range
>> >> >> >> > query where the start key is the rowkey you stored earlier and
>> >> >> >> > the end key is the start rowkey of the current page. You have
>> >> >> >> > to store just one rowkey each time you show a page, and with it
>> >> >> >> > you can come back to this page when you are at the next page.
>> >> >> >> >
>> >> >> >> > However, this approach will fail in the case where your user
>> >> >> >> > would like to go to a particular previous page.
>> >> >> >> >
>> >> >> >> > Warm Regards,
>> >> >> >> > Tariq
>> >> >> >> > https://mtariq.jux.com/
>> >> >> >> > cloudfront.blogspot.com
>> >> >> >> >
>> >> >> >> >
>> >> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan
>> >> >> >> > <vijay@scaligent.com>
>> >> >> >> > wrote:
>> >> >> >> >
>> >> >> >> >> I'm displaying rows of data from an HBase table in a data grid
>> >> >> >> >> UI. The grid shows 25 rows at a time, i.e. it is paginated. The
>> >> >> >> >> user can click on Next/Previous to paginate through the data 25
>> >> >> >> >> rows at a time. I can implement Next easily by setting an HBase
>> >> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow
>> >> >> >> >> on the org.apache.hadoop.hbase.client.Scan to be the row id of
>> >> >> >> >> the next batch's row that is sent to the UI with the previous
>> >> >> >> >> batch. However, I can't seem to be able to do the same with
>> >> >> >> >> Previous. I can set the endRow on the Scan to be the row id of
>> >> >> >> >> the last row of the previous batch, but since HBase Scans are
>> >> >> >> >> always in the forward direction, there is no way to set a
>> >> >> >> >> PageFilter that can get 25 rows ending at a particular row. The
>> >> >> >> >> only option seems to be to get *all* rows up to the end row and
>> >> >> >> >> filter out all but the last 25 in the caller, which seems very
>> >> >> >> >> inefficient. Any ideas on how this can be done efficiently?
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> -Vijay
>> >> >> >> >>
>> >> >> >> >
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Thanks & Regards,
>> >> >> > Anil Gupta
>> >> >> >
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Thanks & Regards,
>> >> > Anil Gupta
>> >> >
>> >>
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>> >
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
