hbase-user mailing list archives

From Jacques <whs...@gmail.com>
Subject Re: Status of HBASE-3529 (Add search to HBase)?
Date Fri, 21 Sep 2012 01:36:28 GMT
The reason I mentioned Blur.io is that I thought they implemented a
CodecProvider built specifically for write-once HDFS.
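
If that's right, the shape of it would be something like this (a rough,
untested sketch; class and codec names are mine, not Blur's):

  import org.apache.lucene.codecs.Codec;
  import org.apache.lucene.codecs.FilterCodec;
  import org.apache.lucene.codecs.lucene40.Lucene40Codec;

  // Delegates everything to the stock Lucene 4.0 codec; the real work
  // would be overriding postingsFormat(), storedFieldsFormat(), etc. to
  // emit strictly append-only, write-once files that HDFS can serve.
  public class WriteOnceHdfsCodec extends FilterCodec {
    public WriteOnceHdfsCodec() {
      super("WriteOnceHdfs", new Lucene40Codec());
    }
  }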

The whole layer-violation problem is all about performance.  That is the
big question I think people need to seriously ask themselves: does their
particular use case tolerate substantially poorer performance than a
local index in exchange for the benefits of HDFS?  Would ElasticSearch
solve their problems better?  Many companies I've talked to utilize SSDs
to achieve their required QPS.  If plain local disks already can't meet
that, and HDFS adds another couple of layers of abstraction on top, I'm
not sure what a Lucene integration will provide without the localization
that Jason originally expected.

Don't get me wrong, I think this will happen.  We've been exploring
various solutions.  I just think that, as with many things, the clear
target use cases must be determined first so that the feature actually
satisfies someone.  If we're just rebuilding ElasticSearch, wouldn't a
simple Coprocessor connector that managed communication with ES be
simpler and more performant?  Photobucket's Solbase is also an option if
you front it with caching, maintain large stop lists, and don't get
beyond 50-100 million docs.
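
By a "connector" I mean something as small as the sketch below (untested,
written against the 0.94-era coprocessor API; the endpoint, index/type
names, and the lack of batching or error handling are placeholders):

  import java.io.IOException;
  import java.net.HttpURLConnection;
  import java.net.URL;

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
  import org.apache.hadoop.hbase.coprocessor.ObserverContext;
  import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
  import org.apache.hadoop.hbase.regionserver.wal.WALEdit;
  import org.apache.hadoop.hbase.util.Bytes;

  // Mirrors every Put into Elasticsearch over its REST API, keyed by row.
  public class EsIndexObserver extends BaseRegionObserver {
    @Override
    public void postPut(ObserverContext<RegionCoprocessorEnvironment> ctx,
        Put put, WALEdit edit, boolean writeToWAL) throws IOException {
      String row = Bytes.toString(put.getRow());
      // Placeholder document; a real connector would serialize the
      // KeyValues carried by the Put.
      String json = "{\"row\":\"" + row + "\"}";
      URL url = new URL("http://localhost:9200/hbase/doc/" + row);
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("PUT");
      conn.setDoOutput(true);
      conn.getOutputStream().write(json.getBytes("UTF-8"));
      conn.getResponseCode(); // fire the request; real code would check it
      conn.disconnect();
    }
  }

The appeal is that indexing rides the write path and region lifecycle for
free; the cost is a remote hop per Put unless you batch.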



On Thu, Sep 20, 2012 at 5:28 PM, Andrew Purtell <apurtell@apache.org> wrote:

> I like the approach of building Lucene indexes for HBase data via a
> coprocessor. However, the requirement (for good performance) of mmapping
> HDFS blocks from the local filesystem, which presupposes regionserver
> and datanode colocation, which presupposes short-circuit local access,
> which presupposes an HDFS API modification (that was vetoed), is the
> sticking point here. It seems we have to do something else. How can
> HBase provide index data to Lucene such that it isn't a massive layering
> violation? Maybe the Lucene 4 Codec and CodecProvider interfaces? (I'm
> not all that familiar with Lucene internals, so big caveat there.)
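>
> From a quick look (same caveat applies, and the codec name below is
> made up), the plumbing seems to be that Lucene 4 resolves Codec
> implementations by name via Java SPI (a META-INF/services entry), and
> the writer opts in:
>
>   import java.io.File;
>   import org.apache.lucene.analysis.standard.StandardAnalyzer;
>   import org.apache.lucene.codecs.Codec;
>   import org.apache.lucene.index.IndexWriter;
>   import org.apache.lucene.index.IndexWriterConfig;
>   import org.apache.lucene.store.FSDirectory;
>   import org.apache.lucene.util.Version;
>
>   IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40,
>       new StandardAnalyzer(Version.LUCENE_40));
>   // "MyHdfsCodec" is hypothetical; looked up through the
>   // META-INF/services/org.apache.lucene.codecs.Codec SPI entry.
>   iwc.setCodec(Codec.forName("MyHdfsCodec"));
>   IndexWriter writer =
>       new IndexWriter(FSDirectory.open(new File("/tmp/idx")), iwc);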
>
> Indeed Jason put a lot of work into the HBASE-3529 patch, and it is a
> shame we couldn't commit the result.
>
>
> On Thu, Sep 20, 2012 at 5:15 PM, Otis Gospodnetic
> <otis.gospodnetic@gmail.com> wrote:
> > I agree with Stack.  I liked that whole approach and it's a shame it
> > didn't get committed after all the work Jason put into it.
> >
> > Otis
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> > On Thu, Sep 20, 2012 at 5:32 PM, Stack <stack@duboce.net> wrote:
> >> On Thu, Sep 20, 2012 at 12:43 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
> >>> The issue with the patch on HBASE-3529 is it relies on modifications
> >>> to HDFS that the author of HBASE-3529 proposed to the HDFS project as
> >>> https://issues.apache.org/jira/browse/HDFS-2004. The proposal was
> >>> vetoed. Therefore, further progress on HBASE-3529 as currently
> >>> implemented is not possible.
> >>>
> >>
> >> Jason's approach had much merit (IMO).  It warrants study at least.
> >>
> >> Though the indices were written to HDFS, Jason had it so Lucene was
> >> getting local filesystem access by going via the local read
> >> short-circuit facility [1].  Being able to do this made it so he got
> >> close to native speeds querying the "HDFS-based" indices.  When Jason
> >> left it -- he had to get a real job unfortunately -- he was blocked on
> >> what to do when a region moved.  He wanted to be able to
> >> immediately pull the indices local on region reopen.  The HDFS fellas
> >> who commented in the issue cited by Andrew above thought it a little
> >> dodgy adding API for this special case.
> >>
> >> If you want to follow in Jason's footsteps, let's chat.
> >> St.Ack
> >>
> >> 1. http://hbase.apache.org/book.html#perf.hdfs.configs
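> >>
> >> For the record, the knobs behind [1] boil down to two properties
> >> (0.94-era names; normally set in hdfs-site.xml, shown here on the
> >> client Configuration just to be concrete):
> >>
> >>   import org.apache.hadoop.conf.Configuration;
> >>   import org.apache.hadoop.hbase.HBaseConfiguration;
> >>
> >>   Configuration conf = HBaseConfiguration.create();
> >>   // let the DFS client read block files straight off local disk
> >>   conf.setBoolean("dfs.client.read.shortcircuit", true);
> >>   // restrict the bypass to the user the regionserver runs as
> >>   conf.set("dfs.block.local-path-access.user", "hbase");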
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein (via Tom White)
>
