hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Pagination with HBase - getting previous page of data
Date Wed, 30 Jan 2013 08:03:39 GMT
Hi Mohammad,

You are most welcome to join the discussion. I have never used PageFilter,
so I don't really have concrete input.
I had a look at
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/PageFilter.html
but I could not understand why it goes to multiple regionservers in
parallel. Why can it not guarantee results <= page size (my guess: due to
multiple RS scans)? If you have used it, maybe you can explain the
behaviour?
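Here is a minimal in-memory sketch of what the PageFilter javadoc describes: the filter is applied separately on each region server, so the page limit is only enforced locally. This is plain Java, not the HBase client API. Each `TreeMap` is a hypothetical stand-in for one region, and `PageFilterSketch`, `scanWithPerRegionLimit`, `fetchPage` and `nextStartRow` are illustrative names. The sketch assumes the usual workaround: apply the limit again on the client, then start the next page just after the last row key seen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical in-memory model: each TreeMap is one region, a sorted slice of the key space.
public class PageFilterSketch {

    // Two toy regions covering [a..c] and [d..f]. Values are unused.
    static List<TreeMap<String, String>> demoRegions() {
        TreeMap<String, String> r1 = new TreeMap<>();
        TreeMap<String, String> r2 = new TreeMap<>();
        for (String k : new String[] {"a", "b", "c"}) r1.put(k, "v");
        for (String k : new String[] {"d", "e", "f"}) r2.put(k, "v");
        List<TreeMap<String, String>> regions = new ArrayList<>();
        regions.add(r1);
        regions.add(r2);
        return regions;
    }

    // What the client receives when the limit is only enforced inside each region:
    // every region counts from zero, so up to pageSize rows come back per region.
    static List<String> scanWithPerRegionLimit(List<TreeMap<String, String>> regions,
                                               String startRow, int pageSize) {
        List<String> out = new ArrayList<>();
        for (TreeMap<String, String> region : regions) {
            int acceptedInRegion = 0; // per-region counter, reset at each region
            for (String key : region.tailMap(startRow, true).keySet()) {
                if (acceptedInRegion == pageSize) break;
                out.add(key);
                acceptedInRegion++;
            }
        }
        return out;
    }

    // Client-side guard: keep only the first pageSize rows.
    static List<String> fetchPage(List<TreeMap<String, String>> regions,
                                  String startRow, int pageSize) {
        List<String> rows = scanWithPerRegionLimit(regions, startRow, pageSize);
        return new ArrayList<>(rows.subList(0, Math.min(pageSize, rows.size())));
    }

    // Start row for the next page: the smallest key sorting strictly after lastRow,
    // because a scan's start row is inclusive.
    static String nextStartRow(String lastRow) {
        return lastRow + "\0";
    }

    public static void main(String[] args) {
        List<TreeMap<String, String>> regions = demoRegions();
        System.out.println(scanWithPerRegionLimit(regions, "a", 2)); // [a, b, d, e]
        List<String> page1 = fetchPage(regions, "a", 2);
        List<String> page2 = fetchPage(regions, nextStartRow(page1.get(page1.size() - 1)), 2);
        System.out.println(page1 + " " + page2); // [a, b] [c, d]
    }
}
```

With a page size of 2 and two regions, the per-region limit hands back four rows. The client-side trim is what actually enforces the page size.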

Thanks,
Anil

On Tue, Jan 29, 2013 at 7:32 PM, Mohammad Tariq <dontariq@gmail.com> wrote:

> I'm kinda hesitant to put my leg in between the pros ;) But does it sound
> sane to use PageFilter for both rows and columns and have some additional
> logic to handle the 'nth' page? It'll help us with both kinds of paging.
>
> On Wednesday, January 30, 2013, Jean-Marc Spaggiari <jean-marc@spaggiari.org>
> wrote:
> > Hi Anil,
> >
> > I think it really depends on the way you want to use the pagination.
> >
> > Do you need to be able to jump to page X? Are you OK if you miss a
> > line or two? Is your data growing quickly? Or slowly? Is it OK if your
> > page indexes are a day old? Do you need to paginate over 300 columns?
> > Or just 1? Do you need to always have the exact same number of entries
> > in each page?
> >
> > For my use case I need to be able to jump to page X, and I don't
> > have any content: I have hundreds of millions of lines, and only the
> > rowkey matters to me. I'm fine if sometimes I have 50 entries displayed
> > and sometimes only 45. So I'm thinking about calculating which row is
> > the first one for each page and storing that separately. Then I just
> > need to run the MR job daily.
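Here is a rough sketch of the page-index idea above. It assumes a `TreeSet` of row keys stands in for the main table and a `TreeMap<Integer, String>` stands in for the separately stored index that the daily MR job would rebuild. All names are hypothetical. Page X is read as the range from its first key up to, but not including, the next page's first key. That reproduces the trade-off described: any row inserted or deleted since the last rebuild makes a page longer or shorter than the nominal size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: TreeSet = main table's row keys, TreeMap = page-index table.
public class PageIndexSketch {

    // Batch step (the daily job): record the first row key of every page.
    static TreeMap<Integer, String> buildPageIndex(NavigableSet<String> rowKeys, int pageSize) {
        TreeMap<Integer, String> index = new TreeMap<>();
        int i = 0;
        for (String key : rowKeys) {
            if (i % pageSize == 0) index.put(i / pageSize, key);
            i++;
        }
        return index;
    }

    // Jump to a page: scan from its first key up to (excluding) the next page's first key.
    static List<String> readPage(NavigableSet<String> rowKeys, TreeMap<Integer, String> index, int page) {
        String start = index.get(page);
        if (start == null) return new ArrayList<>(); // page not in the index
        String next = index.get(page + 1);
        NavigableSet<String> slice = (next == null)
                ? rowKeys.tailSet(start, true)
                : rowKeys.subSet(start, true, next, false);
        return new ArrayList<>(slice);
    }

    public static void main(String[] args) {
        TreeSet<String> table = new TreeSet<>(List.of("r01", "r02", "r03", "r04", "r05", "r06", "r07"));
        TreeMap<Integer, String> index = buildPageIndex(table, 3); // {0=r01, 1=r04, 2=r07}
        System.out.println(readPage(table, index, 1)); // [r04, r05, r06]
        table.add("r05a"); // inserted after the index was built
        System.out.println(readPage(table, index, 1)); // [r04, r05, r05a, r06]
    }
}
```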
> >
> > It's not a perfect solution, I agree, but this might do the job for me.
> > I'm totally open to any other ideas that might do the job too.
> >
> > JM
> >
> > 2013/1/29, anil gupta <anilgupta84@gmail.com>:
> >> Yes, your suggested solution only works for rowkey-based pagination. It
> >> will fail when you start filtering on the basis of columns.
> >>
> >> Still, I would say it's comparatively easier to maintain this at the
> >> application level rather than creating tables for pagination.
> >>
> >> What if you have 300 columns in your schema? Will you create 300 tables?
> >> What about handling pagination when filtering is done based on multiple
> >> columns ("and" and "or" conditions)?
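One possible reading of "keeping pagination at the application level" is sketched below, under stated assumptions. A `TreeMap` from row key to a column map stands in for the table. The "and"/"or" column conditions are composed with `java.util.function.Predicate`. Each page returns a cursor (the row key where the next page starts) instead of a page number. All names are hypothetical. This handles arbitrary column filters, but it only supports next-page navigation, not jumping to page X.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch: TreeMap<rowKey, columns> stands in for an HBase table.
public class FilteredPagingSketch {

    // One page of matching row keys plus the row key where the next page starts
    // (null when the scan reached the end of the table).
    static final class Page {
        final List<String> rowKeys;
        final String nextStartRow;

        Page(List<String> rowKeys, String nextStartRow) {
            this.rowKeys = rowKeys;
            this.nextStartRow = nextStartRow;
        }
    }

    // A column equals a value; rows without the column never match.
    static Predicate<Map<String, String>> colEquals(String column, String value) {
        return cols -> value.equals(cols.get(column));
    }

    // Scan forward from startRow (inclusive), keep rows that pass the filter,
    // and stop as soon as a match beyond pageSize is found.
    static Page nextPage(TreeMap<String, Map<String, String>> table, String startRow,
                         Predicate<Map<String, String>> filter, int pageSize) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.tailMap(startRow, true).entrySet()) {
            if (!filter.test(row.getValue())) continue;
            if (keys.size() == pageSize) {
                return new Page(keys, row.getKey()); // first match of the following page
            }
            keys.add(row.getKey());
        }
        return new Page(keys, null);
    }

    static TreeMap<String, Map<String, String>> demoTable() {
        TreeMap<String, Map<String, String>> t = new TreeMap<>();
        t.put("row1", Map.of("state", "CA", "type", "a"));
        t.put("row2", Map.of("state", "NY", "type", "b"));
        t.put("row3", Map.of("state", "CA", "type", "b"));
        t.put("row4", Map.of("state", "TX", "type", "a"));
        t.put("row5", Map.of("state", "CA", "type", "a"));
        return t;
    }

    // state = CA AND (type = a OR type = b)
    static Predicate<Map<String, String>> demoFilter() {
        return colEquals("state", "CA").and(colEquals("type", "a").or(colEquals("type", "b")));
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> table = demoTable();
        Page p1 = nextPage(table, "", demoFilter(), 2);
        System.out.println(p1.rowKeys + " next=" + p1.nextStartRow); // [row1, row3] next=row5
        Page p2 = nextPage(table, p1.nextStartRow, demoFilter(), 2);
        System.out.println(p2.rowKeys + " next=" + p2.nextStartRow); // [row5] next=null
    }
}
```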
> >>
> >> On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> >> jean-marc@spaggiari.org> wrote:
> >>
> >>> No, no killer solution here ;)
> >>>
> >>> But I'm still thinking about that because I might have to implement
> >>> some pagination options soon...
> >>>
> >>> As you are saying, it's only working on the rowkey, but if you want
> >>> to do the same thing on non-rowkey columns, you might have to create a
> >>> secondary index table...
> >>>
> >>> JM
> >>>
> >>> 2013/1/27, anil gupta <anilgupta84@gmail.com>:
> >>> > That's alright. I thought that you had come up with a killer solution,
> >>> > so I got curious to hear your ideas. ;)
> >>> > It seems like your solution below will not work when filtering on
> >>> > non-rowkey columns, since you only consider the rowkey when you
> >>> > decide the page numbers.
> >>> >
> >>> > Thanks,
> >>> > Anil
> >>> >
> >>> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >>> > jean-marc@spaggiari.org> wrote:
> >>> >
> >>> >> Hi Anil,
> >>> >>
> >>> >> I don't have a solution. I never thought about that ;) But I was
> >>> >> thinking about something like this: you create a 2nd table where you
> >>> >> place the row number (4 bytes), then the row key. To go directly to a
> >>> >> specific page, you query by the number, find the key, and then you
> >>> >> know where to start your scan in the main table.
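Here is a small sketch of the 4-byte row-number key. It assumes big-endian encoding, which is what `java.nio.ByteBuffer` produces by default, and relies on HBase ordering row keys by unsigned lexicographic byte comparison. With that encoding, non-negative row numbers sort in numeric order. The first row of page n can then be looked up at row number n * pageSize. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helpers for a row-number -> row-key index table.
public class RowNumberKeySketch {

    // Big-endian 4-byte encoding (ByteBuffer's default byte order).
    static byte[] encode(int rowNumber) {
        return ByteBuffer.allocate(4).putInt(rowNumber).array();
    }

    static int decode(byte[] key) {
        return ByteBuffer.wrap(key).getInt();
    }

    // Row-key order: lexicographic comparison of unsigned bytes.
    static int compareKeys(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Key of the index entry holding the first row of a (0-based) page.
    static byte[] pageStartKey(int page, int pageSize) {
        return encode(page * pageSize);
    }

    public static void main(String[] args) {
        System.out.println(compareKeys(encode(255), encode(256)) < 0); // true: same order as the ints
        System.out.println(compareKeys(encode(-1), encode(1)) > 0);    // true: negative numbers sort last
        System.out.println(decode(pageStartKey(3, 50)));               // 150
    }
}
```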
> >>> >>
> >>> >> The issue is probably the number for each line, since with an MR job
> >>> >> you don't know where you are relative to the beginning. But you can
> >>> >> build something where you store the line number from the beginning of
> >>> >> the region; then, when all regions are parsed, you can reconstruct the
> >>> >> total numbering... That should work...
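The renumbering step can be sketched as an exclusive prefix sum. Suppose a first pass records how many rows each region holds, in region order. Each region's offset is then the total of the regions before it, and a row's global number is its region offset plus its local index. The sketch below uses hypothetical names and illustrative counts. It also picks out which local rows start a page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: per-region row counts are assumed to come from a first MR pass.
public class GlobalNumberingSketch {

    // Exclusive prefix sum: offsets[r] = number of rows in all regions before region r.
    static long[] regionOffsets(long[] rowsPerRegion) {
        long[] offsets = new long[rowsPerRegion.length];
        long running = 0;
        for (int r = 0; r < rowsPerRegion.length; r++) {
            offsets[r] = running;
            running += rowsPerRegion[r];
        }
        return offsets;
    }

    // Global row number of the localIndex-th row (0-based) inside a region.
    static long globalRowNumber(long[] offsets, int region, long localIndex) {
        return offsets[region] + localIndex;
    }

    // Local indexes inside one region whose global number is a multiple of pageSize (page starts).
    static List<Long> pageStartsInRegion(long offset, long rowCount, long pageSize) {
        List<Long> starts = new ArrayList<>();
        long first = ((offset + pageSize - 1) / pageSize) * pageSize; // smallest multiple >= offset
        for (long g = first; g < offset + rowCount; g += pageSize) {
            starts.add(g - offset);
        }
        return starts;
    }

    public static void main(String[] args) {
        long[] counts = {5, 3, 4};
        long[] offsets = regionOffsets(counts); // [0, 5, 8]
        for (int r = 0; r < counts.length; r++) {
            // region 0 -> [0, 4], region 1 -> [], region 2 -> [0]
            System.out.println("region " + r + ": " + pageStartsInRegion(offsets[r], counts[r], 4));
        }
    }
}
```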
> >>> >>
> >>> >> JM
> >>> >>
> >>> >> 2013/1/25, anil gupta <anilgupta84@gmail.com>:
> >>> >> > Inline...
> >>> >> >
> >>> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >>> >> > jean-marc@spaggiari.org> wrote:
> >>> >> >
> >>> >> >> Hi Anil,
> >>> >> >>
> >>> >> >> The issue is that the start of every subsequent page would have
> >>> >> >> to be moved too...
> >>> >> >>
> >>> >> > Yes, this is a possibility. Hence the developer has to take care of
> >>> >> > this case. It might also be possible that the pageSize is not a hard
> >>> >> > limit on the number of results (more like a hint or suggestion on
> >>> >> > size). I would say it varies by use case.
> >>> >> >
> >>> >> >>
> >>> >> >> So if you want to jump directly to page n, you might be totally
> >>> >> >> shifted because of all the data inserted in the meantime...
> >>> >> >>
> >>> >> >> If you want a real, complete pagination feature, you might want
> >>> >> >> a coprocessor or an MR job updating another table that refers to
> >>> >> >> the pages...
> >>> >> >>
> >>> >> > Well, the solution depends on the use case. I will be doing
> >>> >> > pagination
> >
>
> --
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>



-- 
Thanks & Regards,
Anil Gupta
