hbase-user mailing list archives

From Ioannis Konstantinou <ik...@cslab.ntua.gr>
Subject Re: Inverted word index...
Date Mon, 17 May 2010 17:22:09 GMT
Hi,

You can also read the following paper:
http://www.cslab.ntua.gr/~ikons/distributed_indexing_of_webscale_datasets_for_the_cloud_mdac_2010_cr.pdf

In it we present an inverted index system based on HBase. Both the index 
and the content are served through HBase, and indexing is performed 
with Hadoop MapReduce.
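
To make the general idea concrete, here is a minimal sketch in plain Python of
building an inverted index in a map/reduce style. It is only an illustration
and not the system from the paper: the function names (map_document,
reduce_postings, build_index) and the sample documents are made up, and no
Hadoop or HBase API is used. Each word in the result would become one row key.

```python
from collections import defaultdict

def map_document(doc_id, text):
    """Map step: emit one (word, doc_id) pair per distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def reduce_postings(pairs):
    """Reduce step: group document IDs by word."""
    postings = defaultdict(set)
    for word, doc_id in pairs:
        postings[word].add(doc_id)
    # Sort by word so the result resembles rows ordered by row key.
    return {word: sorted(ids) for word, ids in sorted(postings.items())}

def build_index(docs):
    """Run the map step over all documents, then the reduce step."""
    pairs = (pair
             for doc_id, text in docs.items()
             for pair in map_document(doc_id, text))
    return reduce_postings(pairs)

# Hypothetical sample documents, for illustration only.
docs = {"d1": "round table", "d2": "round robin", "d3": "ruby red"}
index = build_index(docs)
```

Looking up index["round"] then returns the IDs of every sample document that
contains that word.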

On 17/5/2010 6:44 PM, Jonathan Gray wrote:
> Kevin,
>
> You would want to make your row keys the words.
>
> HBase defines its tablets (called Regions) by the startRow and endRow.  So as you say,
> a given region may contain "ro to ru".  Looking up the word "round" would use that region.
> This is handled automatically by the META table.
>
> For a refresher on these concepts, check out the BigTable paper.  There have also been
> some discussions about inverted word indexes on this mailing list, though I don't have links.
>
> JG
>
>    
>> -----Original Message-----
>> From: Kevin Apte [mailto:technicalarchitect2007@gmail.com]
>> Sent: Monday, May 17, 2010 1:07 AM
>> To: hbase-user@hadoop.apache.org
>> Subject: Inverted word index...
>>
>> Consider a search system with an inverted word index, in other words an
>> index which points to document locations, with these columns: word,
>> document ID and possibly timestamp.
>>
>> Given a word, how will I know which tablet to scan to find all document
>> IDs with the given word?
>>
>> If you are indexing a large database, say 50 TB, then each word may be
>> split across multiple tablets. There may be hundreds of such tablets,
>> each with a large number of SSTables to store the index. How will I know
>> which tablet to search?  Is there a master index that specifies which
>> tablet has words in a range, say "ro to ru"?  Or do I have to look up
>> Bloom filters for every tablet?
>>
>> Kevin
>>      
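
As a rough illustration of the lookup discussed in the quoted thread (row keys
are words, and each region covers a range from its start row up to the next
region's start row), the following sketch finds the region for a word with a
binary search over sorted start keys. The start keys and the find_region
helper are hypothetical. A real HBase client resolves regions through the META
table rather than a local list, so treat this only as a picture of the idea.

```python
from bisect import bisect_right

# Hypothetical region start keys in sorted order; "" marks the first region.
REGION_START_KEYS = ["", "ga", "ro", "ru"]

def find_region(row_key, start_keys=REGION_START_KEYS):
    """Return (start, end) for the region whose range [start, end) holds row_key.

    end is None for the last region, which has no upper bound.
    """
    # Index of the last start key that is <= row_key.
    i = bisect_right(start_keys, row_key) - 1
    start = start_keys[i]
    end = start_keys[i + 1] if i + 1 < len(start_keys) else None
    return start, end

region = find_region("round")
```

Here "round" sorts after "ro" and before "ru", so it falls in the region that
starts at "ro". No per-region Bloom filter check is needed to pick the region.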

-- 
Ioannis Konstantinou
Research Associate, Computing Systems Laboratory
National Technical University of Athens
phone: +30 2107721544(internal 421)
mobile: +30 6945992906
Web: http://www.cslab.ntua.gr/~ikons

