From: Andrew Purtell <apurtell@apache.org>
Date: Mon, 20 Jan 2014 14:45:52 -0800
Message-ID: 
 <CA+RK=_CGK0WDABcmRH3AvYpEjj7yFe_Gb6LA1cUB+SQ3rn4JDQ@mail.gmail.com>
Subject: Re: Design review: Secondary index support through coprocess
To: "dev@hbase.apache.org" <dev@hbase.apache.org>,
 lars hofhansl <larsh@apache.org>
Content-Type: multipart/alternative; boundary=089e013cbfeac5b4b704f06ea9b4

--089e013cbfeac5b4b704f06ea9b4
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Or don't do blocking I/O in the context of the RPC handler thread. Queue
the work and let the handler return.
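
A minimal sketch of that idea in plain Java, offered only as an illustration.
The class name DeferredIndexWriter and its methods are hypothetical and are not
part of HBase or its coprocessor API. It assumes the slow cross-server index
write can be wrapped in a Runnable and handed to a separate, bounded pool, so
the calling handler thread returns right away instead of blocking on the RPC.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, not an HBase API: hands slow index writes to a
 * separate bounded pool so the calling (handler) thread never blocks on them.
 */
final class DeferredIndexWriter {
    private final ExecutorService pool;

    DeferredIndexWriter(int workers, int queueCapacity) {
        // Bounded queue: when it is full, new submissions are rejected
        // instead of piling up without limit or blocking the caller.
        this.pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /**
     * Called from the handler thread. Returns immediately: true if the write
     * was queued, false if the queue is full and the caller should back off.
     */
    boolean submit(Runnable remoteIndexWrite) {
        try {
            pool.execute(remoteIndexWrite);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }

    /** Stops accepting work and waits up to the timeout for queued writes. */
    boolean shutdownAndWait(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The bounded queue is a deliberate choice in this sketch: a full queue surfaces
as a rejected submission that the caller can turn into backpressure, rather
than as handler threads silently parked on remote calls.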


On Mon, Jan 20, 2014 at 1:54 PM, lars hofhansl <larsh@apache.org> wrote:

> Yep. That's my concern too. Would need to configure a generous number of
> handlers to prevent this from happening.
>
> ________________________________
>  From: Vladimir Rodionov <vrodionov@carrieriq.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Sent: Monday, January 20, 2014 11:57 AM
> Subject: RE: Design review: Secondary index support through coprocess
>
> >>Yes, the coprocessors potentially cross RS boundaries.
>
> An open path to disaster. Inter-region RPCs in coprocessors may
> result in periodic cluster-wide deadlocks.
>
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
>
> From: James Taylor [jtaylor@salesforce.com]
> Sent: Monday, January 20, 2014 11:39 AM
> To: dev@hbase.apache.org
> Subject: Re: Design review: Secondary index support through coprocess
>
> Yes, the coprocessors potentially cross RS boundaries. No, the index is not
> co-located with the main table. Take a look at the link I sent as that
> should be able to answer a lot of questions.
>
> Thanks,
> James
>
>
> On Mon, Jan 20, 2014 at 11:03 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>
> > James,
> >
> > Ok…
> >
> > It’s been a while since we talked about this…
> >
> > While the index is in a separate table, is that table being split and
> > collocated with the main table?
> >
> > If you’re using the coprocessor to maintain the index, that would imply
> > you’re crossing RS boundaries if your index is truly orthogonal.
> >
> > Is this what you’re doing?
> >
> > On Jan 20, 2014, at 11:32 AM, James Taylor <jtaylor@salesforce.com>
> wrote:
> >
> > > Mike,
> > > Yes, you're mistaken:
> > > - secondary indexes in Phoenix are orthogonal to the base table. They're in
> > > a separate table
> > > (http://phoenix.incubator.apache.org/secondary_indexing.html).
> > > - Phoenix has joins. They're in our master branch with a release scheduled
> > > for next month
> > > - numeric strings? Not a use case for indexing numeric data? Have you ever
> > > seen a number used as an ID?
> > > Thanks,
> > > James
> > >
> > >
> > > On Mon, Jan 20, 2014 at 8:50 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> > >
> > >> Indexes tend to be orthogonal to the base table, not to mention if you’re
> > >> using an inverted table for an index, your index table would be much
> > >> thinner than your base table.
> > >>
> > >> Having said that, the solution proposed by Yu, Taylor and others only
> > >> works if you want to use the index to help on server side filtering and
> > >> misses the boat on the larger and broader picture of improving query
> > >> optimization and joins.
> > >>
> > >> HINT: Unless I am mistaken… until you treat the index as orthogonal to the
> > >> base table, you will always lag performance of traditional MPP DWs like
> > >> Informix XPS. (Now part of IBM’s IM pillar.)
> > >>
> > >> In addition, until you fix coprocessors in general, you will have
> > >> scalability and performance issues.
> > >> (Note that you can write a coprocessor to create a sandbox and separate
> > >> the co-process from the RS JVM; however, it would be better if it were part
> > >> of the underlying coprocessor code.)
> > >>
> > >> The current implementation makes joins worthless.
> > >> (Note that in prior discussions, Phoenix doesn’t do joins…)
> > >> Here’s why:
> > >> In order to do a join, if you use the proposed index, you have to first
> > >> reduce each index into a single, sort-ordered set. Then you can take the
> > >> intersection of the index result sets. The final set would be in sort
> > >> order and a subset of the total rows. You can then fetch the rows and still
> > >> do a server side filter before returning the ultimate result set.
> > >>
> > >> It’s that first step of reducing each result set into a single
> > >> sort-ordered set that takes a lot of effort.
> > >>
> > >>
> > >> On a side note… there’s been some mention of ordering floats. Again, just
> > >> a word of caution… there isn’t a really strong use case for indexing
> > >> numeric data types. Period. And to be very, very clear, there is a
> > >> distinction between numeric strings and numeric data types.
> > >>
> > >> -Mike
> > >>
> > >> PS. Because of my role as a consultant, I am very, very limited in what I
> > >> can say and contribute. I don’t own my work product, my clients do. Take
> > >> what I say with a grain of salt. I’m just a skinny little boy from
> > >> Cleveland Ohio, come to chase your beers and drink your women… ;-)
> > >>
> > >> On Jan 9, 2014, at 10:48 AM, James Taylor <jtaylor@salesforce.com> wrote:
> > >>
> > >>> IMHO, it would be valuable if the design considered both a global
> > >>> indexing solution and a local indexing solution. Both are useful in
> > >>> different circumstances. The global indexing design plus the
> > >>> application integration points could be derived from Jesse's work with
> > >>> his reference implementation in Phoenix - the global indexing code has
> > >>> no Phoenix dependencies and clearly defined integration points.
> > >>>
> > >>> Thanks,
> > >>> James
> > >>>
> > >>> On Jan 9, 2014, at 6:36 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:
> > >>>
> > >>>> Yes, that was a big concern I had as well.
> > >>>>
> > >>>> It's not clear how that will work with a large number of indexes; if people
> > >>>> have one index, they will want more than one. To not plan for that seems
> > >>>> like an incomplete implementation to me. In a horizontally scalable system
> > >>>> like HBase, lots of buddy regions aren't going to work out well.* Once we
> > >>>> have regions that cannot be collocated, the extra RPC time starts to be the
> > >>>> biggest factor (as the doc points out) and we are back to what Phoenix is
> > >>>> already doing**.
> > >>>>
> > >>>> But I'm probably missing something here in what makes it different?
> > >>>>
> > >>>> For folks that haven't been following the issue, some high-level "how it all
> > >>>> kinda works" would be helpful from the championing committers; that's a long
> > >>>> doc to get through and grok :). How similar is this to the work currently
> > >>>> done by the existing indexing implementations (huawei, Phoenix, ngdata)? The
> > >>>> doc doesn't really nail down the interactions, but instead jumps right in
> > >>>> after describing why SI should be added.
> > >>>>
> > >>>> Agree this would be super useful, but don't want to waste too much work
> > >>>> reinventing the wheel or doing the wrong thing. Further, this impl quickly
> > >>>> starts to lead down the query optimization path, which gets HBase away from
> > >>>> its core "be a great byte store".
> > >>>>
> > >>>> Like I said, I'm all for secondary indexes in HBase and think this is a
> > >>>> great push. I don't mean to rain on any parades.
> > >>>>
> > >>>> - jesse
> > >>>>
> > >>>> * but a smart way to specify region collocation? That I can get behind as
> > >>>> it would unify a couple different indexing impls (e.g. Phoenix would
> > >>>> consider using it to help make indexing faster - RPCs do suck).
> > >>>>
> > >>>> ** for instance, the doc talks about how to implement indexing for
> > >>>> floats... That might be a default impl, but for use cases like Phoenix this
> > >>>> would break all our current encodings. We handled this in the indexing impl
> > >>>> by making the builder pluggable for different use cases to support
> > >>>> different encodings. I feel like a lot of the code for this kind of SI
> > >>>> impl is already in Phoenix and has been working and fast for several months
> > >>>> now; it's surprisingly tricky, especially with the delete cases and
> > >>>> timestamp manipulation issues.
> > >>>>
> > >>>>
> > >>>> On Thursday, January 9, 2014, Sudarshan Kadambi (BLOOMBERG/ 731 LEXIN)
> > >>>> wrote:
> > >>>>
> > >>>>> Could you explain how the 1-1 association between user and index table
> > >>>>> regions is maintained? I wasn't able to understand fully from the document.
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>> From: Ted Yu <dev@hbase.apache.org>
> > >>>>> To: dev@hbase.apache.org
> > >>>>> At: Jan 8, 2014 3:41:40 PM
> > >>>>>
> > >>>>> Hi,
> > >>>>> Secondary index support is a frequently requested feature.
> > >>>>>
> > >>>>> Please find the updated design doc here:
> > >>>>>
> > >>>>> https://issues.apache.org/jira/secure/attachment/12621909/SecondaryIndex%20Design_Updated_2.pdf
> > >>>>>
> > >>>>> HBASE-9203 is the umbrella JIRA.
> > >>>>>
> > >>>>> Implementation patch was attached to HBASE-10222
> > >>>>>
> > >>>>> Thanks to Rajesh who works on this feature.
> > >>>>>
> > >>>>> Cheers
> > >>>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> -------------------
> > >>>> Jesse Yates
> > >>>> @jesse_yates
> > >>>> jyates.github.com
> > >>>
> > >>
> > >>
> >
> >
>
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

--089e013cbfeac5b4b704f06ea9b4--