hbase-user mailing list archives

From y_823...@tsmc.com
Subject Re: Katta for secondary index?
Date Tue, 23 Jun 2009 08:35:01 GMT
Hi Tim,

If I use map/reduce to build the table index, will it output an index file
in HDFS?
How can it be used efficiently once it becomes very large?
Will it become a bottleneck when many parallel programs access that large
index file?
Any ideas?

Fleming
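
One way to picture the bottleneck question: further down the thread, St.Ack notes that the index BuildTableIndex produces is sharded by the number of reducers. The sketch below is only an in-memory illustration of why sharding helps, with each lookup fanned out to all shards in parallel and the results merged. It is not HBase, Lucene, or Katta API. The class name, the hash-by-row-key shard assignment, and the thread pool are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical in-memory model of a sharded secondary index. */
final class ShardedIndexSketch {
    // Each map stands in for one index shard: indexed value -> row keys.
    private final List<Map<String, List<String>>> shards = new ArrayList<>();

    ShardedIndexSketch(int numShards) {
        for (int i = 0; i < numShards; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Assumption: a row is assigned to a shard by hashing its row key.
    void add(String rowKey, String indexedValue) {
        int shard = Math.floorMod(rowKey.hashCode(), shards.size());
        shards.get(shard)
              .computeIfAbsent(indexedValue, v -> new ArrayList<>())
              .add(rowKey);
    }

    // Query every shard in parallel, then merge and sort the row keys.
    List<String> lookup(String indexedValue) {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Map<String, List<String>> shard : shards) {
                Callable<List<String>> task = () ->
                    shard.getOrDefault(indexedValue, Collections.<String>emptyList());
                futures.add(pool.submit(task));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get());
            }
            Collections.sort(merged);
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ShardedIndexSketch idx = new ShardedIndexSketch(3);
        idx.add("row1", "x");
        idx.add("row2", "y");
        idx.add("row3", "x");
        System.out.println(idx.lookup("x"));
    }
}
```

In this model no single shard has to serve every reader, so adding shards (or copies of shards on more machines) spreads the load. Whether that is enough for a real workload would have to be measured.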




                                                                                         
                                                            
From:    tim robertson <timrobertson100@gmail.com>
To:      hbase-user@hadoop.apache.org
cc:      (bcc: Y_823910/TSMC)
Subject: Re: Katta for secondary index?
Date:    2009/06/23 04:13 PM
Please respond to hbase-user




Hi Fleming

I am pretty much a novice at HBase, but I have asked a similar
question a while ago - the question was whether to put the data in
the Lucene index or to index the keys only and then get the data with
a series of getByKey(...) operations.  It seems there are no hard and
fast rules for this, so I think it is worth trying what you propose.
It is certainly what we are playing with at the moment, but it is not
live.

Cheers,

Tim
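
To make the "index the keys only" option concrete, here is a minimal sketch of the two-step lookup: a sorted index maps an indexed column value to row keys, and the rows are then fetched one by one by key. It uses plain in-memory maps as stand-ins for both the Lucene index and the HBase table, so nothing in it is real HBase or Lucene API. The class and method names (including `getByKey`) are hypothetical, and ISO-formatted date strings are assumed so that string order matches date order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of "index the keys, then get by key". */
final class SecondaryIndexSketch {
    // Stand-in for the primary table: row key -> row data.
    private final Map<String, String> table = new TreeMap<>();
    // Stand-in for the index: indexed column value -> row keys holding it.
    private final NavigableMap<String, List<String>> index = new TreeMap<>();

    // Writes the row and its index entry (no handling of updates or deletes).
    void put(String rowKey, String indexedValue, String rowData) {
        table.put(rowKey, rowData);
        index.computeIfAbsent(indexedValue, v -> new ArrayList<>()).add(rowKey);
    }

    // Stand-in for a get-by-key call against the primary table.
    String getByKey(String rowKey) {
        return table.get(rowKey);
    }

    // Step 1: range scan over the index (both bounds inclusive).
    // Step 2: one getByKey call per matching row key.
    List<String> queryRange(String from, String to) {
        List<String> rows = new ArrayList<>();
        for (List<String> rowKeys : index.subMap(from, true, to, true).values()) {
            for (String rowKey : rowKeys) {
                rows.add(getByKey(rowKey));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch s = new SecondaryIndexSketch();
        s.put("row1", "2009-06-21", "A");
        s.put("row2", "2009-06-22", "B");
        s.put("row3", "2009-06-23", "C");
        System.out.println(s.queryRange("2009-06-22", "2009-06-23"));
    }
}
```

The trade-off discussed above shows up in `queryRange`: the index stays small because it holds only keys, but every hit costs an extra lookup against the table. Storing the data in the index itself would avoid that second step at the price of a larger index.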

2009/6/23  <y_823910@tsmc.com>:
> Hello Tim,
>
> I would like to run queries by range (maybe by date) or by a specific
> family column value.
> I would build an index that maps these family columns (as index columns)
> to the primary key, so that I can use a family column value to locate its
> primary key and then use that key to query HBase.
> Is BuildTableIndex the right way to do this?
>
> Fleming
>
> From:    tim robertson <timrobertson100@gmail.com>
> To:      hbase-user@hadoop.apache.org
> cc:      (bcc: Y_823910/TSMC)
> Subject: Re: Katta for secondary index?
> Date:    2009/06/23 03:45 PM
> Please respond to hbase-user
>
> What kind of searches are you doing with the secondary indexes?  Will
> it be range queries for example or simply "give me all the records for
> this key"?
>
>
>
> On Tue, Jun 23, 2009 at 9:44 AM, tim robertson<timrobertson100@gmail.com>
> wrote:
>> For build table index:
>>
>>     BuildTableIndex bti = new BuildTableIndex();
>>     JobConf conf = new JobConf(TestBuildLucene.class);
>>     // args: map tasks, reduce tasks, index output dir, table, column(s)
>>     conf = bti.createJob(conf, 1, 1, "/tmp/lucene-hbase", "occurrence",
>>         "raw:CatalogueNo");
>>     try {
>>         long time = System.currentTimeMillis();
>>         System.out.println(
>>             "Starting the job input[occurrence] output[/tmp/lucene-hbase]");
>>         JobClient.runJob(conf);
>>         System.out.println("Finished in "
>>             + (1 + System.currentTimeMillis() - time) / 1000 + " secs!");
>>     } catch (IOException e) {
>>         e.printStackTrace();
>>     }
>>
>>
>> Cheers
>> Tim
>>
>>
>>
>>
>> On Tue, Jun 23, 2009 at 9:39 AM, <y_823910@tsmc.com> wrote:
>>> Hi,
>>>
>>> Is there any code snippet showing how to use BuildTableIndex and
>>> IndexedTable?
>>> Thank you.
>>>
>>> Fleming
>>>
>>> From:    saint.ack@gmail.com
>>> To:      hbase-user@hadoop.apache.org
>>> cc:      (bcc: Y_823910/TSMC)
>>> Subject: Re: Katta for secondary index?
>>> Date:    2009/06/23 01:39 PM
>>> Please respond to hbase-user
>>>
>>> On Mon, Jun 22, 2009 at 5:46 PM, <y_823910@tsmc.com> wrote:
>>>
>>>> Hi there,
>>>>
>>>> HBase accesses data only by key, right?
>>>> Has anybody used HBase + Katta (for a secondary index)? Does it work?
>>>
>>>
>>>
>>> Katta works but it's just a means of distributing Lucene indices.  You
>>> need to make the indices first.  Have you checked out the
>>> BuildTableIndex mapreduce job in hbase?  It indexes table contents.
>>> The index is sharded by the number of reducers you run.  Perhaps you
>>> can have Katta deploy this product for you?  Perhaps the indices made
>>> are not what you want for secondary lookups but you could adapt
>>> BuildTableIndex?
>>>
>>> Does the table change frequently?  Is a batch job to redo the index OK
>>> with you?  In TRUNK you could run a scan that only finds records
>>> created after a certain date, so you could add incremental indices and
>>> then do the full build of the index at some lesser frequency.
>>>
>>> There is also the experimental tableindexed subclass of hbase that
>>> keeps up a secondary table as an index using transactional hbase, so
>>> an insert into the primary and secondary tables is done as a single
>>> transaction (it's not yet in trunk but should be here soon).
>>>
>>> St.Ack
>>>
>>>
>>>> We just want to transfer part of our Oracle table data to HBase
>>>> for multi parallel computing.
>>>> Any suggestions would be appreciated!
>>>> Thank you
>>>>
>>>> Fleming
>>>>




 --------------------------------------------------------------------------- 
                                                         TSMC PROPERTY       
 This email communication (and any attachments) is proprietary information   
 for the sole use of its                                                     
 intended recipient. Any unauthorized review, use or distribution by anyone  
 other than the intended                                                     
 recipient is strictly prohibited.  If you are not the intended recipient,   
 please notify the sender by                                                 
 replying to this email, and then delete this email and any copies of it     
 immediately. Thank you.                                                     
 --------------------------------------------------------------------------- 



