hbase-user mailing list archives

From Jacques <whs...@gmail.com>
Subject Re: Hbase filter-SubstringComparator vs full text search indexing
Date Mon, 10 Sep 2012 16:41:44 GMT
Two cents below...

On Mon, Sep 10, 2012 at 7:24 AM, Shengjie Min <kelvin.msj@gmail.com> wrote:

> In my case, I have all the log events stored in HDFS/hbase in this format:
> timestamp | priority | category | message body
> Given I have only 4 fields here, that limits my queries to only against
> these four. I am thinking about more advanced search like full text search
> the message body. well, mainly substring query against message body.
>    1. Has anybody tried to use Hbase SubstringComparator? How does it perform,
>    with reasonable huge amount of data, can it still provide us the real
>    time response capability?

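Probably not if "huge" is sufficiently large.  Since HBase only indexes data
by the row key, a search on any other criterion (such as a substring match on
the message body) has to scan every row in the key range you query, which
means all of the data if you cannot narrow that range.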
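
To make the difference concrete, here is a toy model in plain Python. It is
not HBase client code: the sorted list only stands in for a table ordered by
row key, and the names (ROWS, scan_by_key_range, scan_by_substring) and the
sample log lines are made up for illustration.

```python
from bisect import bisect_left

# Toy stand-in for a table kept sorted by row key (here: timestamp strings).
# Each row is (row_key, priority, category, message_body). Sample data only.
ROWS = sorted([
    ("2012-09-10T07:00:01", "INFO", "auth", "user login ok"),
    ("2012-09-10T07:00:05", "ERROR", "db", "connection timeout to db01"),
    ("2012-09-10T07:01:10", "WARN", "db", "slow query detected"),
    ("2012-09-10T07:02:30", "ERROR", "web", "request timeout on /search"),
])
KEYS = [row[0] for row in ROWS]


def scan_by_key_range(start, stop):
    """Row-key lookup: binary-search to the start, read only rows in [start, stop)."""
    lo = bisect_left(KEYS, start)
    hi = bisect_left(KEYS, stop)
    return ROWS[lo:hi], hi - lo  # (matches, rows examined)


def scan_by_substring(needle):
    """Substring match on the message body: every row has to be examined."""
    matches = []
    examined = 0
    for row in ROWS:
        examined += 1
        if needle in row[3]:
            matches.append(row)
    return matches, examined


if __name__ == "__main__":
    print(scan_by_key_range("2012-09-10T07:00:05", "2012-09-10T07:02:00"))
    print(scan_by_substring("timeout"))
```

The key-range lookup only touches the rows it returns, while the substring
search examines all of them no matter how few match. That cost grows with the
table, which is why I would not count on real-time responses from it.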

>    2. In my case, does it make more sense to use a proper full text search
>    engine(lucene/solr/elasticsearch) to index the message body, does that
>    sound like a better idea?

Often yes.  For big data especially, this is where ElasticSearch excels.
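
The reason is the inverted index those engines build. A minimal sketch of
the idea in plain Python follows; it is a toy, not how Lucene, Solr, or
ElasticSearch are actually implemented, and the names (LOGS, build_index,
search) and the row keys are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: row key -> message body.
LOGS = {
    "row1": "connection timeout to db01",
    "row2": "slow query detected",
    "row3": "request timeout on /search",
}


def build_index(docs):
    """Map each lowercased, whitespace-separated token to the row keys containing it."""
    index = defaultdict(set)
    for row_key, body in docs.items():
        for token in body.lower().split():
            index[token].add(row_key)
    return index


def search(index, term):
    """Look up one whole token; the cost does not depend on scanning every message."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_index(LOGS)

if __name__ == "__main__":
    print(search(INDEX, "timeout"))
    print(search(INDEX, "time"))
```

Note that this toy only matches whole tokens: "timeout" finds two rows, but
"time" finds none. So a token index is not a drop-in replacement for an
arbitrary substring match, and you would need to check how your chosen engine
handles partial-word queries.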

> would be great someone experienced can share some stories here.
> -Shengjie Min
