hbase-user mailing list archives

From Saajan <ssangra...@veriskhealth.com>
Subject Re: HBase Design Considerations
Date Tue, 04 May 2010 06:45:07 GMT

Thanks all for your responses, very helpful. We feel that we are generally
moving in the right direction with our data design, and our idea about the
client is the same as the one you mentioned.

Jonathan - Our data mostly qualify as fixed number / value and time values,
and we won't require full-text search. Time is at month granularity and
mostly spans 4 years. A specific requirement with time is to support a
range as well as a single month: a typical search query may be "people in
New York with diabetes who had been to the emergency room from Feb to Apr".
Our current design approach matches the one suggested by Edward: have a main
table with multiple column families and Member ID as the row key, and then
create index tables with each searchable key as the row key (and the row key
of the main table as the value). Our concern is mostly with the number of
searchable columns, the redundancy of data, and the number of merge
operations (after fetching row keys from a large number of index tables)
required to meet our needs.
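
To make the merge concern concrete, here is a rough sketch of the lookup and
merge step for the example query above. It is not HBase client code: plain
Python dictionaries stand in for the index tables, and all names, key
layouts ("<visit type>|<YYYY-MM>") and sample member IDs are assumptions
made up for illustration. The point is only that each filter yields a set of
main-table row keys, a month range becomes a scan over contiguous sorted
index keys, and the final answer is the intersection of those sets.

```python
from bisect import bisect_left, bisect_right

# Hypothetical in-memory stand-ins for index tables: each maps an index
# row key to the set of main-table row keys (member IDs).
state_index = {
    "NY": {"m1", "m2", "m3"},
    "NJ": {"m4"},
}
diagnosis_index = {
    "diabetes": {"m1", "m3", "m4"},
    "asthma": {"m2"},
}
# Assumed key layout "<visit type>|<YYYY-MM>": months sort lexicographically,
# so a month range is a contiguous run of sorted keys.
visit_index = {
    "ER|2010-01": {"m2"},
    "ER|2010-02": {"m1"},
    "ER|2010-04": {"m3", "m4"},
    "ER|2010-06": {"m1"},
}


def range_lookup(index, start_key, stop_key):
    """Union of member IDs for all index keys in [start_key, stop_key]."""
    keys = sorted(index)
    lo = bisect_left(keys, start_key)
    hi = bisect_right(keys, stop_key)
    members = set()
    for key in keys[lo:hi]:
        members |= index[key]
    return members


def search(state, diagnosis, visit_type, first_month, last_month):
    """Intersect the member-ID sets returned by each index lookup."""
    candidates = [
        state_index.get(state, set()),
        diagnosis_index.get(diagnosis, set()),
        range_lookup(
            visit_index,
            f"{visit_type}|{first_month}",
            f"{visit_type}|{last_month}",
        ),
    ]
    # Start from the smallest set so intermediate results stay small.
    candidates.sort(key=len)
    result = set(candidates[0])
    for other in candidates[1:]:
        result &= other
    return result


# Range query (Feb to Apr) and single-month query (Feb only).
print(sorted(search("NY", "diabetes", "ER", "2010-02", "2010-04")))
print(sorted(search("NY", "diabetes", "ER", "2010-02", "2010-02")))
```

With one such set per filter condition, a query touching many of our 50+
criteria means many index reads plus a multi-way intersection on the client,
which is the cost we are trying to estimate.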

We will look more into the idea of SOLR integration.


Saajan wrote:
> We are working on a prototype to migrate our healthcare database,
> currently in Oracle, to HBase.
> Our Java-based web application allows end users to search patients on over
> 50 different criteria through a query builder interface: typical queries
> involve identifying members who match filter conditions on diagnosis,
> procedures, doctors and hospitals, time intervals, employer and so forth.
> The database has records for over 5 million patients for a number of
> years, and is around 10 TB in size. 
> A major design issue we are facing is to allow fast querying in HBase with
> so many searchable columns. We are experimenting with secondary index
> tables, multiple tables etc., but haven't been able to reach a conclusion
> on the way ahead. Expected user response time is up to 4 seconds. 
> Would highly appreciate comments on how HBase is used to support search
> applications and how we can support search / filter across multiple
> criteria in HBase.
> Thanks
> Saajan

View this message in context: http://old.nabble.com/HBase-Design-Considerations-tp28431975p28443657.html
Sent from the HBase User mailing list archive at Nabble.com.
