hbase-user mailing list archives

From anil gupta <anilgupt...@gmail.com>
Subject Re: Pagination with HBase - getting previous page of data
Date Wed, 30 Jan 2013 07:49:37 GMT
Hi Jean,

Please find my reply inline.

On Tue, Jan 29, 2013 at 1:40 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Anil,
>
> I think it really depends on the way you want to use the pagination.
>
Absolutely true!

>
> Do you need to be able to jump to page X? Are you OK if you miss a
> line or two? Is your data growing fast, or slowly? Is it OK if your
> page indexes are a day old? Do you need to paginate over 300 columns,
> or just one? Do you need to always have the exact same number of entries
> in each page?
>
No, I don't need to be able to jump to page X.
I don't think that missing lines will be acceptable. I need to filter the
rows on non-rowkey attributes. It won't be OK if my page indexes are a day
old. I need to paginate on the basis of various filters on columns and/or
the rowkey, so the number of combinations is quite large.
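
A minimal sketch of application-level paging with a column filter, written in
Python against an in-memory sorted list rather than the HBase client API. The
names fetch_page, rows and is_open are hypothetical. The idea: scan forward
from a start key, keep only the rows that pass the filter, and remember the
key of the first matching row beyond the page as the start of the next page.

```python
from bisect import bisect_left


def fetch_page(rows, start_key, page_size, predicate):
    """Return (page, next_start_key) for one forward page.

    rows: list of (rowkey, columns) tuples sorted by rowkey, standing in
          for a table that is scanned in rowkey order.
    predicate: hypothetical column filter applied to each row's columns.
    """
    keys = [key for key, _ in rows]
    # Seek to the first row whose key is >= start_key, like a scan start row.
    i = bisect_left(keys, start_key)
    page = []
    for key, cols in rows[i:]:
        if not predicate(cols):
            continue  # filtered out: does not count toward the page size
        if len(page) == page_size:
            # First matching row beyond this page: it starts the next page.
            return page, key
        page.append((key, cols))
    return page, None  # ran out of matching rows


# Ten rows keyed 010..100; every other row has status "open".
rows = [(f"{n:03d}", {"status": "open" if n % 20 == 10 else "closed"})
        for n in range(10, 101, 10)]


def is_open(cols):
    return cols["status"] == "open"


page1, next_key = fetch_page(rows, "", 2, is_open)
```

Because the next start key comes from the filtered results, no matching row is
skipped between pages. The cost is that a very selective filter may have to
read many rows to fill a single page.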

>
> For my use case I need to be able to jump to page X and I don't
> have any content. I have hundreds of millions of lines. Only the rowkey
> matters for me, and I'm fine if sometimes I have 50 entries displayed
> and sometimes only 45. So I'm thinking about calculating which row is
> the first one for each page and storing that separately. Then I just
> need to run the MR daily.
>
Hmm, yeah, it might work for you.
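
A rough sketch of that page-index idea, again in Python with a sorted list of
rowkeys standing in for the table (build_page_index and read_page are
hypothetical names, and the batch job is reduced to a list slice). It reuses
the 010..200 example quoted further down the thread: the index is built once,
a row 101 is inserted afterwards, and each page is read as the range from its
own first key up to the first key of the next page.

```python
from bisect import bisect_left, insort


def build_page_index(sorted_keys, page_size):
    """First rowkey of every page: what the periodic batch job would store."""
    return sorted_keys[::page_size]


def read_page(sorted_keys, page_index, page_no):
    """Rows of page page_no (0-based), read as [start(N), start(N+1))."""
    lo = bisect_left(sorted_keys, page_index[page_no])
    if page_no + 1 < len(page_index):
        hi = bisect_left(sorted_keys, page_index[page_no + 1])
    else:
        hi = len(sorted_keys)  # last page runs to the end of the table
    return sorted_keys[lo:hi]


keys = [f"{n:03d}" for n in range(10, 201, 10)]  # 010, 020, ..., 200
index = build_page_index(keys, 10)               # built before the insert

insort(keys, "101")                              # row added after the build

first_page = read_page(keys, index, 0)
second_page = read_page(keys, index, 1)
```

With a stale index the inserted row still shows up, but the first page now
holds 11 rows instead of 10, so the page size behaves as a hint rather than a
hard limit until the index is rebuilt.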

>
> It's not a perfect solution, I agree, but this might do the job for me.
> I'm totally open to any other ideas that might do the job too.
>
There is no such thing as a "perfect" solution. If the implementation
fulfills your business needs, then go for it.

>
> JM
>
> 2013/1/29, anil gupta <anilgupta84@gmail.com>:
> > Yes, your suggested solution only works for rowkey-based pagination. It
> > will fail when you start filtering on the basis of columns.
> >
> > Still, I would say it's comparatively easier to maintain this at the
> > application level than to create tables for pagination.
> >
> > What if you have 300 columns in your schema? Will you create 300 tables?
> > What about pagination when filtering is done on multiple columns
> > ("and" and "or" conditions)?
> >
> > On Tue, Jan 29, 2013 at 1:08 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> No, no killer solution here ;)
> >>
> >> But I'm still thinking about that because I might have to implement
> >> some pagination options soon...
> >>
> >> As you are saying, it only works on the row key, but if you want
> >> to do the same thing on non-rowkey columns, you might have to create a
> >> secondary index table...
> >>
> >> JM
> >>
> >> 2013/1/27, anil gupta <anilgupta84@gmail.com>:
> >> > That's alright. I thought you had come up with a killer solution, so I
> >> > got curious to hear your ideas. ;)
> >> > It seems your solution below will not work for filtering on non-rowkey
> >> > columns, since you only consider the rowkey when deciding the page
> >> > numbers.
> >> >
> >> > Thanks,
> >> > Anil
> >> >
> >> > On Fri, Jan 25, 2013 at 6:58 PM, Jean-Marc Spaggiari <
> >> > jean-marc@spaggiari.org> wrote:
> >> >
> >> >> Hi Anil,
> >> >>
> >> >> I don't have a solution. I never thought about that ;) But I was
> >> >> thinking about something like this: you create a second table where
> >> >> you place the row number (4 bytes), then the row key. To go directly
> >> >> to a specific page, you query by the number, find the key, and you
> >> >> know where to start your scan in the main table.
> >> >>
> >> >> The issue is probably the numbering of each line, since with an MR job
> >> >> you don't know where you are relative to the beginning of the table.
> >> >> But you can build something where you store the line number counted
> >> >> from the beginning of the region; then, when all regions are parsed,
> >> >> you can reconstruct the total numbering... That should work...
> >> >>
> >> >> JM
> >> >>
> >> >> 2013/1/25, anil gupta <anilgupta84@gmail.com>:
> >> >> > Inline...
> >> >> >
> >> >> > On Fri, Jan 25, 2013 at 9:17 AM, Jean-Marc Spaggiari <
> >> >> > jean-marc@spaggiari.org> wrote:
> >> >> >
> >> >> >> Hi Anil,
> >> >> >>
> >> >> >> The issue is that all the other subsequent page starts should be
> >> >> >> moved too...
> >> >> >>
> >> >> > Yes, this is a possibility. Hence the developer has to take care of
> >> >> > this case. It might also be possible that the pageSize is not a hard
> >> >> > limit on the number of results (more like a hint or suggestion on
> >> >> > size). I would say it varies by use case.
> >> >> >
> >> >> >>
> >> >> >> so if you want to jump directly to page n, you might be totally
> >> >> >> shifted because of all the data inserted in the meantime...
> >> >> >>
> >> >> >> If you want a real, complete pagination feature, you might want to
> >> >> >> have a coprocessor or an MR job updating another table referring to
> >> >> >> the pages...
> >> >> >>
> >> >> > Well, the solution depends on the use case. I will be doing
> >> >> > pagination in HBase for a RESTful service, but so far I am unable to
> >> >> > find any reason why this can't be done at the application level.
> >> >> > Are you suggesting using MR for paging in HBase? If yes, how?
> >> >> > How would you use another table for pagination? What would you store
> >> >> > in the extra table?
> >> >> >
> >> >> >>
> >> >> >> JM
> >> >> >>
> >> >> >> 2013/1/25, anil gupta <anilgupta84@gmail.com>:
> >> >> >> > Hi Vijay,
> >> >> >> >
> >> >> >> > I've done paging in HBase using Scan only (no pagination filter),
> >> >> >> > as Mohammad has explained. However, it was just experimental. It
> >> >> >> > works, but Jean raised a very good point.
> >> >> >> > Find my answer inline to fix the problem that Jean reported.
> >> >> >> >
> >> >> >> >
> >> >> >> > On Fri, Jan 25, 2013 at 4:38 AM, Jean-Marc Spaggiari <
> >> >> >> > jean-marc@spaggiari.org> wrote:
> >> >> >> >
> >> >> >> >> Hi Vijay,
> >> >> >> >>
> >> >> >> >> If, while the user is scrolling forward, you store the key of
> >> >> >> >> each page, then you will be able to go back to a specific page
> >> >> >> >> and jump forward again to where they were.
> >> >> >> >>
> >> >> >> >> The only issue is that if, while the user is scrolling the
> >> >> >> >> table, someone inserts a row between the last row of a page and
> >> >> >> >> the first row of the next page, you will never see this row.
> >> >> >> >>
> >> >> >> >> Let's take this example.
> >> >> >> >>
> >> >> >> >> You have 10 items per page.
> >> >> >> >>
> >> >> >> >> 010 020 030 040 050 060 070 080 090 100 is the first page.
> >> >> >> >> 110 120 130 140 150 160 170 180 190 200 is the second one.
> >> >> >> >>
> >> >> >> >> Now, if someone inserts 101, it will be just after 100 and
> >> >> >> >> before 110.
> >> >> >> >>
> >> >> >> > Anil: Instead of scanning from 010 to 100, scan from 010 to 110.
> >> >> >> > Then we won't have this problem. I mean to say: use
> >> >> >> > startRow(firstRowKeyofPage(N)) and
> >> >> >> > stopRow(firstRowKeyofPage(N+1)).
> >> >> >> > This would fix it. Also, in that case the number of results might
> >> >> >> > exceed the pageSize, so you might need to handle that.
> >> >> >> >
> >> >> >> >>
> >> >> >> >> When you display 10 rows starting at 010, you will stop just
> >> >> >> >> before 101, and for the next page you will start at 110, so
> >> >> >> >> 101 will never be displayed...
> >> >> >> >>
> >> >> >> >> HTH
> >> >> >> >>
> >> >> >> >> JM
> >> >> >> >>
> >> >> >> >> 2013/1/25, Mohammad Tariq <dontariq@gmail.com>:
> >> >> >> >> > Hello sir,
> >> >> >> >> >
> >> >> >> >> > While paging through, store the start key of the current page
> >> >> >> >> > of 25 rows in a separate byte[]. Now, if you want to come back
> >> >> >> >> > to this page when you are at the next page, do a range query
> >> >> >> >> > where the start key is the rowkey you stored earlier and the
> >> >> >> >> > end key is the start rowkey of the current page. You have to
> >> >> >> >> > store just one rowkey each time you show a page, using which you
> >> >> >> >> > can come back to this page when you are at the next page.
> >> >> >> >> >
> >> >> >> >> > However, this approach will fail if your user would like to go
> >> >> >> >> > to a particular previous page.
> >> >> >> >> >
> >> >> >> >> > Warm Regards,
> >> >> >> >> > Tariq
> >> >> >> >> > https://mtariq.jux.com/
> >> >> >> >> > cloudfront.blogspot.com
> >> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > On Fri, Jan 25, 2013 at 10:28 AM, Vijay Ganesan <vijay@scaligent.com>
> >> >> >> >> > wrote:
> >> >> >> >> >
> >> >> >> >> >> I'm displaying rows of data from an HBase table in a data grid
> >> >> >> >> >> UI. The grid shows 25 rows at a time, i.e. it is paginated. The
> >> >> >> >> >> user can click Next/Previous to page through the data 25 rows
> >> >> >> >> >> at a time. I can implement Next easily by setting an HBase
> >> >> >> >> >> org.apache.hadoop.hbase.filter.PageFilter and setting startRow
> >> >> >> >> >> on the org.apache.hadoop.hbase.client.Scan to be the row id of
> >> >> >> >> >> the next batch's row that is sent to the UI with the previous
> >> >> >> >> >> batch. However, I can't seem to be able to do the same with
> >> >> >> >> >> Previous. I can set the endRow on the Scan to be the row id of
> >> >> >> >> >> the last row of the previous batch, but since HBase Scans are
> >> >> >> >> >> always in the forward direction, there is no way to set a
> >> >> >> >> >> PageFilter that can get 25 rows ending at a particular row. The
> >> >> >> >> >> only option seems to be to get *all* rows up to the end row and
> >> >> >> >> >> filter out all but the last 25 in the caller, which seems very
> >> >> >> >> >> inefficient. Any ideas on how this can be done efficiently?
> >> >> >> >> >>
> >> >> >> >> >> --
> >> >> >> >> >> -Vijay
> >> >> >> >> >>
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > --
> >> >> >> > Thanks & Regards,
> >> >> >> > Anil Gupta
> >> >> >> >
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Thanks & Regards,
> >> >> > Anil Gupta
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Anil Gupta
> >> >
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta
