phoenix-dev mailing list archives

From Abhishek Talluri <abhishektall...@cloudera.com.INVALID>
Subject Re: Regarding the Secondary Index write path
Date Wed, 31 Oct 2018 15:12:41 GMT
Thanks for confirming the WALEdit behavior, Vincent.

I did see multiple comments saying
// skip this mutation if we aren't enabling indexing
but I couldn't find a way to make it skip these indexing steps. Or is there
a table-level setting that makes it skip indexing?

IMO, it would be good to check for indexes in the
PhoenixIndexBuilder#isEnabled method rather than taking this entire path
and only then realizing that no mutation needs to be indexed. That way, we
could still load the coprocessor on every table by default and skip these
extra operations.
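A minimal sketch of the early exit proposed above, under a simplified, hypothetical model: `TableInfo`, `IndexBuilderSketch`, and `IndexerSketch` are illustrative names, not Phoenix classes, and the real `PhoenixIndexBuilder#isEnabled` signature is not modeled here. It only shows the shape of returning before any index work when a table has no indexes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the table metadata an index builder could consult.
// Not a Phoenix class.
final class TableInfo {
    final String name;
    final List<String> indexNames;

    TableInfo(String name, List<String> indexNames) {
        this.name = name;
        this.indexNames = Collections.unmodifiableList(new ArrayList<>(indexNames));
    }
}

// Hypothetical builder with an isEnabled-style hook (not PhoenixIndexBuilder).
final class IndexBuilderSketch {
    // Proposed check: indexing counts as "enabled" only when the table has indexes.
    boolean isEnabled(TableInfo table) {
        return !table.indexNames.isEmpty();
    }
}

// Hypothetical observer that consults the check before doing any index work.
final class IndexerSketch {
    private final IndexBuilderSketch builder = new IndexBuilderSketch();
    int indexUpdatesComputed = 0;

    void preBatchMutate(TableInfo table, List<String> rowKeys) {
        if (!builder.isEnabled(table)) {
            return; // no indexes: return before any index work
        }
        // Simplified: one index update per index per mutated row.
        indexUpdatesComputed += table.indexNames.size() * rowKeys.size();
    }
}

final class EarlyExitDemo {
    public static void main(String[] args) {
        IndexerSketch indexer = new IndexerSketch();
        indexer.preBatchMutate(new TableInfo("NO_INDEX", List.of()), List.of("r1", "r2"));
        indexer.preBatchMutate(new TableInfo("WITH_INDEX", List.of("IDX1")), List.of("r1", "r2"));
        System.out.println("index updates computed: " + indexer.indexUpdatesComputed);
    }
}
```

With this shape, the coprocessor could stay attached to every table while tables without indexes pay only for one cheap check per batch.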

On Tue, Oct 30, 2018 at 9:57 PM Vincent Poon <vincent.poon.us@gmail.com>
wrote:

> Looks like you're right, the Indexer is loaded for all base tables.  I got
> confused with the case where the Indexer coproc is not loaded for *index*
> tables.
> I wonder what the overhead of that is. It does seem from the code that
> loading it was intentional, as there are lines like this in
> preBatchMutate:
>
> // skip this mutation if we aren't enabling indexing
>
> However, a lot has changed since then. One change that comes to mind: we
> now do write locking of rows within Phoenix itself. It seems that if a
> table has no indexes, we can skip this locking.
>
> I created PHOENIX-5002 to investigate this.
>
>
> For the steps, the WALEdit in preBatchMutate is the same WALEdit for the
> data table.  The index updates get written alongside the data table updates
> in the WAL.
>
> You can see in preBatchMutate the Indexer is grabbing the WALEdit passed
> down from doMiniBatchMutation:
>
> WALEdit edit = miniBatchOp.getWalEdit(0);
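A toy model of the behavior described above, assuming hypothetical classes (`WalEditSketch`, `MiniBatchSketch`, `WalSketch`) that only stand in for HBase's `WALEdit` and mini-batch. How the coprocessor's edit is merged into the region's edit may differ across HBase versions and is simplified here. The sketch shows only the outcome: index entries added in preBatchMutate and the data cells land in one WAL append covered by one sync.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HBase's WALEdit: an ordered list of cells.
final class WalEditSketch {
    final List<String> cells = new ArrayList<>();
}

// Hypothetical single-mutation batch; getWalEdit(0) mimics the edit a
// coprocessor receives via miniBatchOp.getWalEdit(0).
final class MiniBatchSketch {
    final List<String> dataCells;
    private final WalEditSketch coprocessorEdit = new WalEditSketch();

    MiniBatchSketch(List<String> dataCells) {
        this.dataCells = new ArrayList<>(dataCells);
    }

    WalEditSketch getWalEdit(int index) {
        if (index != 0) throw new IndexOutOfBoundsException("single-mutation sketch");
        return coprocessorEdit;
    }
}

// Hypothetical WAL that records appended edits and counts syncs.
final class WalSketch {
    final List<WalEditSketch> appended = new ArrayList<>();
    int syncs = 0;
}

final class SharedWalEditDemo {
    static WalSketch write(MiniBatchSketch batch, List<String> indexCells) {
        // preBatchMutate: the indexer adds its entries to the edit it is handed.
        WalEditSketch edit = batch.getWalEdit(0);
        edit.cells.addAll(indexCells);

        // doMiniBatchMutation: data cells and the coprocessor's cells are
        // combined into one edit, appended once, and made durable by one sync.
        WalEditSketch combined = new WalEditSketch();
        combined.cells.addAll(batch.dataCells);
        combined.cells.addAll(batch.getWalEdit(0).cells);
        WalSketch wal = new WalSketch();
        wal.appended.add(combined);
        wal.syncs++;
        return wal;
    }

    public static void main(String[] args) {
        MiniBatchSketch batch = new MiniBatchSketch(List.of("data:r1"));
        WalSketch wal = write(batch, List.of("index:v1|r1"));
        System.out.println(wal.appended.size() + " append(s), " + wal.syncs
                + " sync(s), cells=" + wal.appended.get(0).cells);
    }
}
```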
>
>
>
>
> On Tue, Oct 30, 2018 at 6:08 PM Abhishek Talluri
> <abhishektalluri@cloudera.com.invalid> wrote:
>
> > Thanks Vincent. Just to clarify: if I create a table through Phoenix and
> > run describe on the HBase table, I see that the Indexer co-processor
> > is loaded with the table. Is there some code that loads only the required
> > co-processors when there is an Observer event?
> >
> > Coming back to the sequence of steps, in preBatchMutate, I do see that
> > after calculating index updates, we write the entries to a single WALEdit
> > saying that all the mutations that need to be indexed are durable at this
> > point (I assume these are the WALEdits for the index updates but not the
> > WALEdits for the data table itself).
> > preBatchMutate
> > - Calculate index updates
> > - Write the index updates to the WAL and make them durable (helpful if the
> > RS crashes after the write to the data table)
> > doMiniBatchMutation
> > - write to WAL without sync (I am assuming that these edits are for the
> > actual data table itself)
> > - acquire MVCC
> > - write to memstore
> > - sync WAL
> > - advance MVCC (write becomes visible)
> > postBatchMutate
> > - write index update
> >
> > Correct me if I am wrong in assuming that the WALEdit in preBatchMutate is
> > different from the WAL edits that will be generated for the data table.
> > Once again, thanks for the quick response.
> >
> >
> > On Tue, Oct 30, 2018 at 4:16 PM Vincent Poon <vincent.poon.us@gmail.com>
> > wrote:
> >
> >> If the table has no indexes, the Indexer coprocessor won't be loaded.
> >>
> >> As for your original question, the answer is a little nuanced. For
> >> Phoenix 4.14+:
> >> The index updates are calculated in preBatchMutate, and the index updates
> >> are written in postBatchMutate.
> >> So from HRegion#doMiniBatchMutation, you can see the order of things. In
> >> short, it's something like:
> >> - Calculate index updates
> >> - write to WAL without sync
> >> - acquire MVCC
> >> - write to memstore
> >> - sync WAL
> >> - advance MVCC (write becomes visible)
> >> - write index update
> >>
> >> Note that when you write to the memstore, you have not advanced the MVCC
> >> yet.
> >> The order is pretty much what you suggested.
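A single-threaded toy model of the ordering above, assuming a hypothetical `RegionSketch` class (not HBase's `HRegion`): each memstore cell carries an MVCC write number, and readers see only cells at or below the current read point. It illustrates why a cell already in the memstore stays invisible until the MVCC advances, and that the index write happens only after that point.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, single-threaded model of the write ordering; not real HBase code.
final class RegionSketch {
    private static final class Versioned {
        final String value;
        final long writeNumber;

        Versioned(String value, long writeNumber) {
            this.value = value;
            this.writeNumber = writeNumber;
        }
    }

    final List<String> steps = new ArrayList<>();
    final Map<String, String> indexTable = new HashMap<>();
    private final Map<String, Versioned> memstore = new HashMap<>();
    private long nextWriteNumber = 1;
    private long readPoint = 0;

    // Readers only see cells whose write number is at or below the read point.
    String read(String row) {
        Versioned v = memstore.get(row);
        return (v != null && v.writeNumber <= readPoint) ? v.value : null;
    }

    // afterMemstoreWrite lets a caller observe visibility mid-write.
    void put(String row, String value, Runnable afterMemstoreWrite) {
        String indexRow = value + "|" + row; // preBatchMutate: calculate only
        steps.add("calculate index updates");
        steps.add("write to WAL without sync");
        long writeNumber = nextWriteNumber++;
        steps.add("acquire MVCC (write number " + writeNumber + ")");
        memstore.put(row, new Versioned(value, writeNumber));
        steps.add("write to memstore");
        afterMemstoreWrite.run(); // cell is present but not yet readable
        steps.add("sync WAL");
        readPoint = writeNumber;
        steps.add("advance MVCC");
        indexTable.put(indexRow, row); // postBatchMutate: index write last
        steps.add("write index update");
    }
}

final class WriteOrderDemo {
    public static void main(String[] args) {
        RegionSketch region = new RegionSketch();
        region.put("r1", "v1", () -> System.out.println("during write: " + region.read("r1")));
        System.out.println("after write: " + region.read("r1"));
        region.steps.forEach(System.out::println);
    }
}
```

In a real region server many batches overlap, and the MVCC read point advances only past writes that have completed in order; this sketch has a single writer and ignores that.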
> >>
> >> On Tue, Oct 30, 2018 at 12:16 PM Abhishek Talluri
> >> <abhishektalluri@cloudera.com.invalid> wrote:
> >>
> >> > Thanks, Geoffrey, for confirming that. I will go through that
> >> > presentation.
> >> >
> >> > I have a follow-up question, though.
> >> > Let’s say a table does not have any indexes on it. Will these
> >> > co-processors still be triggered and try to calculate index updates,
> >> > since they are loaded by default for any table created through Phoenix?
> >> > Or will this write path be skipped entirely since there are no indexes
> >> > on the table? I'm asking because I could not find a check in the
> >> > pre/post Mutate operations for whether indexes are present.
> >> >
> >> > Regards,
> >> > Abhishek
> >> >
> >> >
> >> >
> >> > On Tue, Oct 30, 2018 at 2:25 PM Geoffrey Jacoby <gjacoby@apache.org>
> >> > wrote:
> >> >
> >> >> Abhishek,
> >> >>
> >> >> You might want to check out Vincent Poon's excellent presentation at
> >> >> this year's PhoenixCon about recent changes over the past couple of
> >> >> years to the index pipeline.
> >> >>
> >> >> https://www.youtube.com/watch?v=VBONDM7sD40
> >> >>
> >> >> One of those changes is the one you observed. Global mutable index
> >> >> writes were moved later in the HBase write pipeline to avoid some nasty
> >> >> deadlock and starvation cases that could occur when making MemStore
> >> >> writes / MVCC advancement wait on cross-server index RPCs to complete.
> >> >>
> >> >> Geoffrey
> >> >>
> >> >>
> >> >> On Tue, Oct 30, 2018 at 10:31 AM Abhishek Talluri
> >> >> <abhishektalluri@cloudera.com.invalid> wrote:
> >> >>
> >> >> > Hi All,
> >> >> >
> >> >> > I am referring to the presentation given at the SF HBase Meetup in
> >> >> > 2013 (attached for your reference). The write path states that:
> >> >> > co-processor calculates the index updates and wal edits
> >> >> > -> Writes it to the WAL (Making it durable)
> >> >> > -> Then write the index updates
> >> >> > -> Then proceed to the Memstore.
> >> >> >
> >> >> > But after looking at the code base (I looked into 4.7 and 4.14), it
> >> >> > looks like the write to the index tables happens in the postBatchMutate
> >> >> > phase, which is after the MemStore write finishes. I wanted to check
> >> >> > with the community to see if the flowchart is outdated. I feel the
> >> >> > sequence of steps should be:
> >> >> > co-processor calculates the index updates and wal edits
> >> >> > -> Writes it to the WAL (Making it durable)
> >> >> > -> Then proceed to the Memstore.
> >> >> > -> Then write the index updates
> >> >> >
> >> >> > Appreciate any input on this. I want to clarify this because we see a
> >> >> > case where the write path could be introducing some delay between the
> >> >> > write to the WAL and the write to the MemStore, causing an out-of-sync
> >> >> > issue when using the HBase Lily Indexer.
> >> >> >
> >> >> > Thanks,
> >> >> > Abhishek Talluri
> >> >> >
> >> >> >
> >> >>
> >> >
> >> >
> >> > --
> >> > Thanks,
> >> > Abhishek Talluri
> >> > Ph:9292405270
> >> >
> >> >
> >> >
> >>
> >
> >
>


-- 
Thanks,
Abhishek
