hbase-user mailing list archives

From "Agarwal, Saurabh " <saurabh.agar...@citi.com>
Subject RE: Hbase and Search Integration
Date Tue, 20 Mar 2012 15:12:07 GMT
Thanks Ryan. Did you create that Solr secondary index as one of the HBase tables?

-----Original Message-----
From: Ryan Tabora [mailto:ratabora@gmail.com]
Sent: Tuesday, March 20, 2012 10:59 AM
To: user@hbase.apache.org
Subject: Re: Hbase and Search Integration

I would suggest that when you load the log data, you also create a secondary index in
Solr that builds its indices on the HBase row key. This is how we are implementing Solr
search on HBase in my current project.
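
A minimal sketch of the idea, assuming the app_id:timestamp row key described in the
original question further down the thread. The field names (id, app_id, timestamp, text),
the sample row, and the epoch-seconds timestamp are hypothetical, not taken from anyone's
actual schema. The point is only that the Solr unique key is the HBase row key, so a
search hit can be turned straight into an HBase Get.

```python
import json


def to_solr_doc(row_key, columns):
    """Build a Solr-style document whose unique key is the HBase row key.

    Assumes row keys look like 'app_id:timestamp' with an integer timestamp.
    """
    app_id, _, timestamp = row_key.partition(":")
    return {
        "id": row_key,                       # hypothetical unique-key field
        "app_id": app_id,                    # lets a query filter by application
        "timestamp": int(timestamp),         # lets a query filter by time range
        "text": " ".join(columns.values()),  # searchable log content
    }


# Hypothetical log row as it might be stored in HBase.
doc = to_solr_doc(
    "billing:1332255127",
    {"log:msg": "payment failed", "log:level": "ERROR"},
)

# Solr's JSON update handler accepts a list of documents like this one.
payload = json.dumps([doc])
```

Only the row key and the searchable fields go to Solr; the full log line stays in HBase
and is fetched by the returned id.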

Thank you,

On Mar 20, 2012, at 7:27 AM, Imran M Yousuf wrote:

> Hi Saurabh,
> On Tue, Mar 20, 2012 at 8:10 PM, Agarwal, Saurabh
> <saurabh.agarwal@citi.com> wrote:
>> Thanks Imran. There is a ton of good functionality in Smart CMS.
> There is a lot of functionality, but not all of it needs to be
> used at once :). Let me know if you want to give your use case a try with
> Smart CMS; I would be glad to help you out.
> Thank you,
> Imran
>> For our search use case, CMS might be overkill. Lily looks good at first glance.
>> Does anyone have experience with it?
>> Thanks,
>> Saurabh.
>> -----Original Message-----
>> From: Imran M Yousuf [mailto:imyousuf@gmail.com]
>> Sent: Tuesday, March 20, 2012 9:46 AM
>> To: user@hbase.apache.org
>> Subject: Re: Hbase and Search Integration
>> Hi Saurabh,
>> For integrating HBase and Apache Solr (or any other indexing/search
>> platform) we came up with Smart CMS [1][2] and there is the Lily
>> Project [4] too.
>> We are on the verge of releasing its 0.1 version, which we have been
>> testing for an extensive period of time and which will be used in production
>> straight away. Smart CMS was designed and developed with the goal
>> of uniting the concept of objects with (HBase + Solr). IOW, we want to
>> design objects, and Smart CMS will take care of persisting them and
>> making them available for search. Although we initially chose
>> Apache Solr as the search engine, it is very easy to plug in any
>> other search engine of our choice, since we expose the integration of
>> search functionality through an SPI.
>> A little bit of history on how we came to develop it and what it
>> is currently being used for: we started its development because we
>> needed a flexible content management system for an e-Commerce Platform
>> as a Service of ours. As we immersed ourselves in it, we found
>> 'content' to be synonymous with 'Object' in the OOP paradigm, and we built
>> the development around that. As a result, we now have a system that can be
>> used both as a traditional Content Management System and as a Content
>> Repository.
>> We used it in a traditional CMS capacity to manage pages for the
>> partner websites of our e-Commerce PaaS; i.e. customers can create
>> pages for products, promotions, stores, etc.; manage page contents for the
>> front page and category pages; and link associated products, related products,
>> etc. from a UI that is dynamically generated from the content
>> definitions. We also used the CMS for extensive search functionality
>> such as full-text search, faceted search, range search, auto-completion,
>> etc. For this we access the CMS through its Web Service library, we use
>> Solr directly for advanced searches, and to access both of them we use a
>> tag library. The flexibility Smart CMS provided in fact helped us
>> win 2 big customers.
>> We also used the CMS as a content repository, where Smart CMS is
>> used to generate domain/DTO and data access layer code that API/Service
>> layers use to persist Java POJOs; i.e. its users define an
>> XML document we call a 'Content Type Definition'. A content type definition is
>> synonymous with an Object Diagram, in which we define objects, their
>> inheritance and compositions. This code generation is an approach we
>> took to bypass the Java Reflection API, and it is done by a Maven plugin we
>> have written. We have another plugin which helps us start all CMS-related
>> applications within Maven so that we can write integration
>> tests on the fly. An example of repository mode is available in our
>> application Smart Email Queue [3], which is designed to send emails
>> from our PaaS. After proving sustainable performance in this mode,
>> Smart CMS has also been chosen as the database for a 4G Telecom
>> Application Server project.
>> [1] Smart CMS - http://smart-cms.org
>> [2] Smart CMS Source - https://github.com/SmartITEngineering/smart-cms
>> [3] Smart Email Queue - https://github.com/SmartITEngineering/smart-email-queue
>> [4] http://www.lilyproject.org/lily/index.html
>> We would welcome any feedback, criticism, involvement in Smart CMS. If
>> you have any further queries please feel free to ask them.
>> Thank you,
>> Imran
>> On Tue, Mar 20, 2012 at 7:38 PM, Agarwal, Saurabh
>> <saurabh.agarwal@citi.com> wrote:
>>> Hi,
>>> Has anyone integrated search (Lucene, Solr or Elastic) with HBase?
>>> We are implementing log search functionality using HBase. Through Flume, the
>>> logs from multiple apps are streamed directly into HBase.
>>> A very basic use case is to search for a keyword in one application's logs
>>> over a certain timeframe (for example, the last hour).
>>> Our row key is app_id:timestamp and all log contents are stored in columns. We
>>> started with a regex filter. It worked, but it does not provide consistent results.
>>> Now, we are exploring index search capability in HBase. Our idea is to
>>> first create an inverted index table whose row key is the search document and
>>> whose columns hold the row keys of the content table. A search will then
>>> return all the matching row keys.
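
A minimal in-memory sketch of that inverted index, assuming "search documents" means the
individual terms found in the log text. Plain Python dicts stand in for the two HBase
tables; the tokenizer, the function names, and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split log text into lowercase alphanumeric terms (a simplistic tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_inverted_index(content_table):
    """Map each term to the set of content-table row keys that contain it.

    In HBase terms: the dict key plays the role of the index row key, and each
    row key in the set would be stored as a column of that index row.
    """
    index = defaultdict(set)
    for row_key, text in content_table.items():
        for term in tokenize(text):
            index[term].add(row_key)
    return index


# Stand-in for the content table: row key 'app_id:timestamp' -> log line.
content = {
    "app1:1332255000": "Connection timeout while calling payment service",
    "app1:1332255060": "Payment accepted",
    "app2:1332255090": "Disk usage warning",
}

index = build_inverted_index(content)
hits = sorted(index["payment"])  # row keys to fetch from the content table
```

A lookup is then a single read of the index row for the keyword, followed by reads of the
content table for the returned row keys.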
>>> Additional requirements: first, we would like to limit the results to a certain
>>> time frame. Second, we would like to display only a limited number of records in
>>> descending time order and come back for more if the user wants to see more records.
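
A sketch of those two requirements applied to the row keys an index lookup returns,
assuming integer timestamps in the app_id:timestamp row key. The function name, the
parameters, and the timestamp cursor are hypothetical; this filters in memory and is not
an HBase scan.

```python
def search_page(row_keys, app_id, start_ts, end_ts, page_size, before_ts=None):
    """Return one page of matching row keys, newest first, plus a cursor.

    The cursor is the timestamp of the last key on the page; pass it back as
    before_ts to fetch the next (older) page. It is None when nothing is left.
    """
    matches = []
    for key in row_keys:
        key_app, _, ts_text = key.partition(":")
        ts = int(ts_text)
        if key_app != app_id or not (start_ts <= ts <= end_ts):
            continue  # wrong application or outside the requested time frame
        if before_ts is not None and ts >= before_ts:
            continue  # already shown on an earlier page
        matches.append((ts, key))

    matches.sort(reverse=True)  # descending time order
    page = [key for _, key in matches[:page_size]]
    next_before = matches[page_size - 1][0] if len(matches) > page_size else None
    return page, next_before


keys = ["app1:100", "app1:160", "app1:220", "app1:280", "app2:200"]
page1, cursor = search_page(keys, "app1", 100, 300, page_size=2)
page2, cursor2 = search_page(keys, "app1", 100, 300, page_size=2, before_ts=cursor)
```

Using the last timestamp as the cursor means a follow-up request does not need to re-read
the pages already shown.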
>>> Let me know if someone has integrated search with HBase.
>>> Thanks,
>>> Saurabh.
>>> -----Original Message-----
>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>>> Sent: Monday, March 19, 2012 12:33 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: There is no data value information in HLog?
>>> Hi,
>>> Have you noticed this in HLogPrettyPrinter ?
>>>    options.addOption("p", "printvals", false, "Print values");
>>> Looks like you should have specified the above option.
>>> On Mon, Mar 19, 2012 at 7:31 AM, yonghu <yongyong313@gmail.com> wrote:
>>>> Hello,
>>>> I used the $ ./bin/hbase org.apache.hadoop.hbase.regionserver.wal.HLog
>>>> --dump command to check the HLog information, but I cannot find any
>>>> data values. The output for my HLog file looks as follows:
>>>> Sequence 933 from region 85986149309dff24ecf7be4873136f15 in table test
>>>>  Action:
>>>>    row: Udo
>>>>    column: Course:Computer
>>>>    at time: Mon Mar 19 14:09:29 CET 2012
>>>> Sequence 935 from region 85986149309dff24ecf7be4873136f15 in table test
>>>>  Action:
>>>>    row: Udo
>>>>    column: Course:Math
>>>>    at time: Mon Mar 19 14:09:29 CET 2012
>>>> The purpose of the HLog is recovery. But without data value
>>>> information, how can HBase use the information in the HLog to do recovery?
>>>> My hbase version is 0.92.0.
>>>> Regards!
>>>> Yong
>> --
>> Imran M Yousuf
>> Entrepreneur & CEO
>> Smart IT Engineering Ltd.
>> Dhaka, Bangladesh
>> Twitter: @imyousuf - http://twitter.com/imyousuf
>> Blog: http://imyousuf-tech.blogs.smartitengineering.com/
>> Mobile: +880-1711402557
