hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Status of HBASE-3529 (Add search to HBase)?
Date Fri, 21 Sep 2012 00:28:13 GMT
I like the approach of building Lucene indexes for HBase data via a
coprocessor. However, the requirement (for good performance) of mmap
of HDFS blocks from the local filesystem, presupposing regionserver
and datanode colocation, presupposing short circuit local access,
presupposing an HDFS API modification (that was vetoed), is at issue
here. It seems we have to do something else. How can HBase provide
index data to Lucene such that it isn't a massive layering violation?
Maybe the Lucene 4 Codec and CodecProvider interfaces? (I'm not all
that familiar with Lucene internals, so big caveat there.)

Indeed Jason put a lot of work into the HBASE-3529 patch, and it is a
shame we couldn't commit the result.

On Thu, Sep 20, 2012 at 5:15 PM, Otis Gospodnetic
<otis.gospodnetic@gmail.com> wrote:
> I agree with Stack.  I liked that whole approach and it's a shame it
> didn't get committed after all the work Jason put into it.
> Otis
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
> On Thu, Sep 20, 2012 at 5:32 PM, Stack <stack@duboce.net> wrote:
>> On Thu, Sep 20, 2012 at 12:43 PM, Andrew Purtell <apurtell@apache.org> wrote:
>>> The issue with the patch on HBASE-3529 is it relies on modifications
>>> to HDFS that the author of HBASE-3529 proposed to the HDFS project as
>>> https://issues.apache.org/jira/browse/HDFS-2004. The proposal was
>>> vetoed. Therefore, further progress on HBASE-3529 as currently
>>> implemented is not possible.
>> Jason's approach had much merit (IMO).  It warrants study at least.
>> Though the indices were written to HDFS, Jason had it so lucene was
>> getting local filesystem access by going via the local read
>> short-circuit facility [1].  Being able to do this made it so he got
>> close to native speeds querying the "HDFS-based" indices.  When Jason
>> left it -- he had to get a real job unfortunately -- he was blocked on
>> what to do when a region moved.  He wanted to be able to
>> immediately pull the indices local on region reopen.  The HDFS fellas
>> who commented in the issue cited by Andrew above thought it a little
>> dodgy adding API for this special case.
>> If you wanted to follow in Jason's footsteps, let's chat.
>> St.Ack
>> 1. http://hbase.apache.org/book.html#perf.hdfs.configs

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)
