hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Status of HBASE-3529 (Add search to HBase)?
Date Fri, 21 Sep 2012 01:51:29 GMT
On Thursday, September 20, 2012, Jacques wrote:

> The reason I mentioned Blur.io is I thought they implemented a
> CodecProvider that was built for write-once HDFS.


Cool. Is it open sourced anywhere, do you know?


> The whole layer violation problem is all about performance.  That is the
> big question I think people need to seriously ask themselves: does their
> particular use case allow substantially poorer performance than a local
> index for HDFS benefits?


That is a good question.

But what stopped progress here was a veto of the HDFS-side changes the
implementation needed to get that performance.

> If we're just
> rebuilding ElasticSearch, wouldn't a simple Coprocessor connector that
> managed communication with ES be simpler and more performant?


This is exactly what I recommended one of our internal groups pursue: use
ES as an indexing service, not just for HBase data (hooked up via CPs)
but also for any app that would like to use it directly.

> Photobucket's Solbase is also an option if you front it with caching,
> maintain large stop lists and don't get beyond 50-100mm docs.
>
>
>
> On Thu, Sep 20, 2012 at 5:28 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> > I like the approach of building Lucene indexes for HBase data via a
> > coprocessor. However, the requirement (for good performance) of mmap
> > of HDFS blocks from the local filesystem, presupposing regionserver
> > and datanode colocation, presupposing short circuit local access,
> > presupposing an HDFS API modification (that was vetoed), is at issue
> > here. It seems we have to do something else. How can HBase provide
> > index data to Lucene such that it isn't a massive layering violation?
> > Maybe the Lucene 4 Codec and CodecProvider interfaces? (I'm not all
> > that familiar with Lucene internals, so big caveat there.)
> >
> > Indeed Jason put a lot of work into the HBASE-3529 patch, and it is a
> > shame we couldn't commit the result.
> >
> >
> > On Thu, Sep 20, 2012 at 5:15 PM, Otis Gospodnetic
> > <otis.gospodnetic@gmail.com> wrote:
> > > I agree with Stack.  I liked that whole approach and it's a shame it
> > > didn't get committed after all the work Jason put into it.
> > >
> > > Otis
> > > Search Analytics - http://sematext.com/search-analytics/index.html
> > > Performance Monitoring - http://sematext.com/spm/index.html
> > >
> > >
> > > On Thu, Sep 20, 2012 at 5:32 PM, Stack <stack@duboce.net> wrote:
> > >> On Thu, Sep 20, 2012 at 12:43 PM, Andrew Purtell <apurtell@apache.org> wrote:
> > >>> The issue with the patch on HBASE-3529 is it relies on modifications
> > > >>> to HDFS that the author of HBASE-3529 proposed to the HDFS project as
> > >>> https://issues.apache.org/jira/browse/HDFS-2004. The proposal was
> > >>> vetoed. Therefore, further progress on HBASE-3529 as currently
> > >>> implemented is not possible.
> > >>>
> > >>
> > >> Jason's approach had much merit (IMO).  It warrants study at least.
> > >>
> > > >> Though the indices were written to HDFS, Jason had it so Lucene was
> > >> getting local filesystem access by going via the local read
> > >> short-circuit facility [1].  Being able to do this made it so he got
> > >> close to native speeds querying the "HDFS-based" indices.  When Jason
> > >> left it -- he had to get a real job unfortunately -- he was blocked on
> > > >> what to do when a region moved.  He wanted to be able to
> > >> immediately pull the indices local on region reopen.  The HDFS fellas
> > >> who commented in the issue cited by Andrew above thought it a little
> > >> dodgy adding API for this special case.
> > >>
> > > >> If you want to follow in Jason's footsteps, let's chat.
> > >> St.Ack
> > >>
> > >> 1. http://hbase.apache.org/book.html#perf.hdfs.configs
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein (via Tom White)
> >
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
