hbase-user mailing list archives

From "sreejith P. K." <sreejit...@nesote.com>
Subject HBase schema design and retrieving values through the REST interface
Date Tue, 15 Mar 2011 17:19:25 GMT
Hello experts,

My scenario is as follows.
I need to maintain a huge table in HBase for a web crawler project. It will
contain thousands of keywords, and for each keyword I need to maintain a list
of URLs (again numbering in the thousands). For each URL, I need to store a
number representing the priority value the keyword holds for that URL.
To explain a bit more: suppose I have the keyword 'united states'. I need to
store about ten thousand URLs for that keyword, each with an integer priority
value, and I have thousands of keywords like that. One unusual constraint is
that I need to implement the project in PHP.
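As a rough illustration, the data described above maps onto a "wide row" layout: the keyword becomes the row key, each URL becomes a column qualifier, and the priority becomes the cell value. The Python sketch below shows that mapping under two assumptions that are not in the original message: the column family name `urls` is hypothetical, and priorities are packed as 4-byte big-endian signed integers (the layout Java's `Bytes.toBytes(int)` produces). Column families are declared when the table is created and HBase works best with only a few of them, so the sketch keeps all URLs as qualifiers inside one family rather than giving each URL its own family.

```python
import struct

# Hypothetical column family name; the original message does not name one.
FAMILY = "urls"


def to_cells(keyword, url_priorities):
    """Map one keyword and its {url: priority} dict to (row, column, value) cells.

    Row key = keyword; column = "family:url" (the URL is the qualifier);
    value = priority packed as a 4-byte big-endian signed int, matching how
    HBase's Java Bytes.toBytes(int) encodes an int.
    """
    row = keyword.encode("utf-8")
    cells = []
    for url, priority in sorted(url_priorities.items()):
        column = f"{FAMILY}:{url}".encode("utf-8")
        value = struct.pack(">i", priority)
        cells.append((row, column, value))
    return cells


def decode_priority(value):
    """Inverse of struct.pack(">i", ...): 4 bytes back to an int."""
    return struct.unpack(">i", value)[0]
```

For example, `to_cells("united states", {"http://example.com/a": 7})` yields one triple whose row is `b"united states"`, whose column is `b"urls:http://example.com/a"`, and whose value decodes back to 7. All URLs for a keyword share one row, which is exactly why a single GET of that row grows with the URL count.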

I have configured a Hadoop/HBase cluster of three machines. My plan was to
design the schema with the keyword as the row key and to store the URLs in a
column family. The schema looked fine at first. I have done a lot of research
on how to retrieve the URL list for a known keyword, and eventually I found a
workaround: I fetch the XML output from the REST interface at
http://localhost:8080/tablename/rowkey and extract the values with
preg_match. This works fine when the URL list is small, but once it grows
into the thousands, I cannot seem to fetch the XML data at all.
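Parsing the XML is generally more robust than pattern-matching it. The Python sketch below assumes the REST gateway's XML `CellSet` format: `Row` elements with a `key` attribute, each containing `Cell` elements with a `column` attribute, where row keys, column names, and cell values are all base64-encoded. The helper `priorities_for` is hypothetical, and it assumes a `urls` family with priorities stored as 4-byte big-endian integers. Note that this only replaces the fragile matching step; it does not by itself make a single GET of a row with thousands of cells any cheaper.

```python
import base64
import struct
import xml.etree.ElementTree as ET


def parse_cellset(xml_text):
    """Return {row_key: {column: raw_value_bytes}} from a REST CellSet document.

    Row keys, column names, and values arrive base64-encoded in the XML
    representation, so each one is decoded here.
    """
    result = {}
    root = ET.fromstring(xml_text)
    for row in root.findall("Row"):
        key = base64.b64decode(row.get("key")).decode("utf-8")
        columns = result.setdefault(key, {})
        for cell in row.findall("Cell"):
            column = base64.b64decode(cell.get("column")).decode("utf-8")
            columns[column] = base64.b64decode(cell.text or "")
    return result


def priorities_for(cellset, keyword, family="urls"):
    """Hypothetical helper: {url: priority} for one keyword.

    Assumes each value is a 4-byte big-endian signed int and that URLs are
    qualifiers in the given (hypothetical) family.
    """
    prefix = family + ":"
    return {
        column[len(prefix):]: struct.unpack(">i", value)[0]
        for column, value in cellset.get(keyword, {}).items()
        if column.startswith(prefix)
    }
```

In PHP, the same idea would use `simplexml_load_string` to walk the `Row` and `Cell` elements and `base64_decode` on each key, column, and value, instead of `preg_match` on the raw text.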

Now I am in a do-or-die situation. Please tell me whether my schema design
needs any changes (I believe it does!), and please help me retrieve the
column family values (the URLs) for each row key efficiently. Please guide
me on how to do this through the REST interface from PHP.
Thanks in advance.
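One commonly suggested alternative, offered here as an assumption rather than a definitive answer, is a "tall" table: one row per keyword/URL pair, with a row key such as keyword + separator + URL, so that a keyword's URLs can be read in pages with a scanner instead of one huge GET. The Python sketch below builds such row keys, computes the exclusive stop row for a prefix scan, and produces an XML scanner definition with base64-encoded `startRow`/`endRow` attributes and a `batch` size. The function names are hypothetical, and the `\x00` separator assumes keywords never contain that byte; it also keeps 'united states' from matching rows of a longer keyword such as 'united statesx'.

```python
import base64
import xml.etree.ElementTree as ET

# Assumption: this byte never appears inside a keyword.
SEPARATOR = b"\x00"


def row_key(keyword, url):
    """Hypothetical tall-table row key: keyword, separator, URL."""
    return keyword.encode("utf-8") + SEPARATOR + url.encode("utf-8")


def prefix_stop_row(prefix):
    """Smallest byte string greater than every key that starts with prefix.

    Drop trailing 0xFF bytes, then increment the last remaining byte.
    Returns b"" (meaning "scan to the end of the table") if nothing is left.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def scanner_body(keyword, batch=1000):
    """XML scanner definition covering all rows for one keyword.

    startRow is inclusive and endRow is exclusive; both are base64-encoded.
    batch caps how many cells each fetch from the scanner returns.
    """
    start = keyword.encode("utf-8") + SEPARATOR
    stop = prefix_stop_row(start)
    elem = ET.Element("Scanner", {
        "batch": str(batch),
        "startRow": base64.b64encode(start).decode("ascii"),
        "endRow": base64.b64encode(stop).decode("ascii"),
    })
    return ET.tostring(elem, encoding="unicode")
```

As I understand the REST gateway's scanner resource, this body is POSTed to http://localhost:8080/tablename/scanner, the response's `Location` header gives the scanner URL, each GET on that URL returns up to `batch` cells as a `CellSet`, a `204 No Content` response means the scan is finished, and a DELETE on the scanner URL releases it. The same scanner approach with a `batch` size can also page through the columns of a single wide row.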

