hbase-user mailing list archives

From tim robertson <timrobertson...@gmail.com>
Subject Re: Katta for secondary index?
Date Tue, 23 Jun 2009 09:20:56 GMT
Not at all.  I am very interested to hear your results!




2009/6/23  <y_823910@tsmc.com>:
> Tim,
> Thank you very much for your help ^_^
>
> Fleming
>
>
>
> From:    tim robertson <timrobertson100@gmail.com>
> To:      hbase-user@hadoop.apache.org
> cc:      (bcc: Y_823910/TSMC)
> Subject: Re: Katta for secondary index?
> Date:    2009/06/23 04:41 PM
> Please respond to hbase-user
>
> Hi,
>
> Yes, it will, and then you need to copy it out of HDFS for Lucene to read it.
> If it is a huge index, this is where Katta would be useful, as it will
> deploy the index across a cluster of Lucene machines for you (by copying
> it out of HDFS).  As a start, I would recommend building an index from a
> sample of your data, copying it out manually, and starting up Lucene to
> check that it works.  Then try to estimate how big the index would be if
> you built it on all your data.
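
The size estimate suggested above can be sketched as a simple linear extrapolation from the sample. This is only a rough guide under a stated assumption: index size is taken to grow in proportion to row count, which a real Lucene index may not do. The class and method names are hypothetical and appear nowhere in the thread.

```java
/** Rough index-size extrapolation; assumes size grows linearly with row count. */
final class IndexSizeEstimateSketch {

    /**
     * Scales the size of an index built from a sample up to the full table.
     * All three inputs are measurements you would take yourself.
     */
    static long estimateFullIndexBytes(long sampleIndexBytes, long sampleRows, long totalRows) {
        if (sampleRows <= 0) {
            throw new IllegalArgumentException("sampleRows must be positive");
        }
        double bytesPerRow = (double) sampleIndexBytes / sampleRows;
        return Math.round(bytesPerRow * totalRows);
    }

    public static void main(String[] args) {
        // Example numbers are made up: a 50 MB index from 100,000 sampled rows,
        // extrapolated to a table of 20 million rows.
        long estimate = estimateFullIndexBytes(50_000_000L, 100_000L, 20_000_000L);
        System.out.println("Estimated full index size in bytes: " + estimate);
    }
}
```

If the estimate is larger than one machine can comfortably serve, that is the point at which distributing the shards (for example with Katta) becomes relevant.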
>
> Again, I am pretty much a novice, though...
>
> Cheers
>
> Tim
>
>
> 2009/6/23  <y_823910@tsmc.com>:
>> Hi Tim,
>>
>> If I use map/reduce with BuildTableIndex, will it output an index file in HDFS?
>> How can the index be used efficiently once it becomes very large?
>> Will it become a bottleneck when many parallel programs access that large
>> index file?
>> Any ideas?
>>
>> Fleming
>>
>>
>>
>>
>>
>> From:    tim robertson <timrobertson100@gmail.com>
>> To:      hbase-user@hadoop.apache.org
>> cc:      (bcc: Y_823910/TSMC)
>> Subject: Re: Katta for secondary index?
>> Date:    2009/06/23 04:13 PM
>> Please respond to hbase-user
>>
>> Hi Fleming
>>
>> I am pretty much a novice at HBase, but I asked a similar
>> question a while ago: whether to put the data in
>> the Lucene index, or to index the keys only and then get the data with
>> a series of getByKey(...) operations.  It seems there are no hard and
>> fast rules for this, so I think it is worth trying what you propose.
>> It is certainly what we are playing with at the moment, but it is not
>> live.
>>
>> Cheers,
>>
>> Tim
>>
>> 2009/6/23  <y_823910@tsmc.com>:
>>> Hello Tim,
>>>
>>> I would like to query by range (maybe by date) or by the value of a
>>> specific family column.
>>> I would build an index that maps these family column values (the index
>>> columns) to their primary keys, so that I can use a family column value
>>> to locate its primary keys and then use those keys to query HBase.
>>> Is this the right approach if I try to use BuildTableIndex?
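
The lookup pattern described here (indexed column value to primary key, then a fetch by key) can be sketched in plain Java. This is an in-memory stand-in only, not the BuildTableIndex or Lucene API: the class name, method names, and sample row keys are all hypothetical. A sorted map is used so that range queries (for example by date) become a sub-map view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for a secondary index: column value -> row keys. */
final class SecondaryIndexSketch {
    // Sorted by column value so that a range query is a sub-map view.
    private final TreeMap<String, TreeSet<String>> valueToRowKeys = new TreeMap<>();

    /** Records that the row with this key holds this value in the indexed column. */
    void put(String columnValue, String rowKey) {
        valueToRowKeys.computeIfAbsent(columnValue, v -> new TreeSet<>()).add(rowKey);
    }

    /** Exact match: all row keys whose indexed column equals the value. */
    List<String> lookup(String columnValue) {
        TreeSet<String> keys = valueToRowKeys.get(columnValue);
        return keys == null ? new ArrayList<>() : new ArrayList<>(keys);
    }

    /** Range query: row keys for values in [fromInclusive, toExclusive). */
    List<String> range(String fromInclusive, String toExclusive) {
        List<String> result = new ArrayList<>();
        SortedMap<String, TreeSet<String>> slice =
            valueToRowKeys.subMap(fromInclusive, toExclusive);
        for (TreeSet<String> keys : slice.values()) {
            result.addAll(keys);
        }
        return result;
    }

    public static void main(String[] args) {
        SecondaryIndexSketch index = new SecondaryIndexSketch();
        index.put("2009-06-21", "row-17");
        index.put("2009-06-22", "row-03");
        index.put("2009-06-22", "row-42");
        index.put("2009-06-23", "row-08");
        // The returned row keys would then be fetched from the primary table by key.
        System.out.println(index.range("2009-06-22", "2009-06-23"));
    }
}
```

The index stores keys only, so every hit costs one extra get against the primary table. Storing the data in the index itself avoids that second step but makes the index larger.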
>>>
>>> Fleming
>>>
>>>
>>>
>>>
>>> From:    tim robertson <timrobertson100@gmail.com>
>>> To:      hbase-user@hadoop.apache.org
>>> cc:      (bcc: Y_823910/TSMC)
>>> Subject: Re: Katta for secondary index?
>>> Date:    2009/06/23 03:45 PM
>>> Please respond to hbase-user
>>>
>>> What kind of searches are you doing with the secondary indexes?  Will
>>> it be range queries, for example, or simply "give me all the records for
>>> this key"?
>>>
>>>
>>>
>>> On Tue, Jun 23, 2009 at 9:44 AM, tim robertson <timrobertson100@gmail.com>
>>> wrote:
>>>> For build table index:
>>>>
>>>>     BuildTableIndex bti = new BuildTableIndex();
>>>>     JobConf conf = new JobConf(TestBuildLucene.class);
>>>>     // createJob args: conf, map tasks, reduce tasks, index dir, table, column(s)
>>>>     conf = bti.createJob(conf, 1, 1, "/tmp/lucene-hbase", "occurrence",
>>>>         "raw:CatalogueNo");
>>>>     try {
>>>>         long time = System.currentTimeMillis();
>>>>         System.out.println("Starting the job input[occurrence] "
>>>>             + "output[/tmp/lucene-hbase]");
>>>>         JobClient.runJob(conf);
>>>>         System.out.println("Finished in "
>>>>             + (1 + System.currentTimeMillis() - time) / 1000 + " secs!");
>>>>     } catch (IOException e) {
>>>>         e.printStackTrace();
>>>>     }
>>>>
>>>>
>>>> Cheers
>>>> Tim
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jun 23, 2009 at 9:39 AM, <y_823910@tsmc.com> wrote:
>>>>> Hi,
>>>>>
>>>>> Is there a code snippet showing how to use BuildTableIndex and
>>>>> IndexedTable?
>>>>> Thank you.
>>>>>
>>>>> Fleming
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> From:    saint.ack@gmail.com
>>>>> Sent by: saint.ack@gmail.com
>>>>> To:      hbase-user@hadoop.apache.org
>>>>> cc:      (bcc: Y_823910/TSMC)
>>>>> Subject: Re: Katta for secondary index?
>>>>> Date:    2009/06/23 01:39 PM
>>>>> Please respond to hbase-user
>>>>>
>>>>> On Mon, Jun 22, 2009 at 5:46 PM, <y_823910@tsmc.com> wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> HBase accesses data only by key, right?
>>>>>> Does anybody use HBase + Katta (for a secondary index)? Does it work?
>>>>>
>>>>>
>>>>>
>>>>> Katta works, but it's just a means of distributing Lucene indices.  You
>>>>> need to make the indices first.  Have you checked out the BuildTableIndex
>>>>> mapreduce job in hbase?  It indexes table contents.  The index is sharded
>>>>> by the number of reducers you run.  Perhaps you can have Katta deploy this
>>>>> product for you?  Perhaps the indices made are not what you want for
>>>>> secondary lookups, but you could adapt BuildTableIndex?
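
To illustrate "sharded by the number of reducers": the sketch below assigns row keys to shards with the same formula as Hadoop's default HashPartitioner. That BuildTableIndex relies on the default partitioner is an assumption here, since the thread does not say. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Shows how N reducers lead to N index shards under hash partitioning. */
final class ShardAssignmentSketch {

    /** Same formula as Hadoop's default HashPartitioner (assumed to apply here). */
    static int shardFor(String rowKey, int numReducers) {
        return (rowKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Groups row keys into one list per reducer, i.e. one list per index shard. */
    static List<List<String>> assign(List<String> rowKeys, int numReducers) {
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            shards.add(new ArrayList<>());
        }
        for (String key : rowKeys) {
            shards.get(shardFor(key, numReducers)).add(key);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("row-03", "row-08", "row-17", "row-42");
        List<List<String>> shards = assign(keys, 3);
        for (int i = 0; i < shards.size(); i++) {
            System.out.println("shard " + i + ": " + shards.get(i));
        }
    }
}
```

Under this scheme rows are spread by row key rather than by indexed value, so a secondary lookup has to consult every shard and merge the hits.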
>>>>>
>>>>> Does the table change frequently?  Is a batch job to redo the index OK
>>>>> with you?  In TRUNK you could run a scan that only finds records created
>>>>> after a certain date, so you could add incremental indices and then do
>>>>> the full build of the index at some lesser frequency.
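
The incremental idea can be sketched without any HBase API: keep the time of the last index build and select only rows created after it. The `Row` type, its fields, and the method name are hypothetical; in HBase the selection would be done by a scan restricted by time, as described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Selects only the rows that a small incremental index needs to cover. */
final class IncrementalIndexSketch {

    /** Hypothetical record: a row key plus its creation time in epoch millis. */
    static final class Row {
        final String key;
        final long createdAtMillis;

        Row(String key, long createdAtMillis) {
            this.key = key;
            this.createdAtMillis = createdAtMillis;
        }
    }

    /** Keys of rows created strictly after the last index build. */
    static List<String> keysToIndex(List<Row> rows, long lastBuildMillis) {
        List<String> keys = new ArrayList<>();
        for (Row row : rows) {
            if (row.createdAtMillis > lastBuildMillis) {
                keys.add(row.key);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row("row-03", 1_000L), new Row("row-08", 2_000L), new Row("row-42", 3_000L));
        // Only rows newer than the last build go into the incremental index.
        System.out.println(keysToIndex(rows, 2_000L));
    }
}
```

Searches then consult the full index plus the incremental ones until the next full rebuild replaces them.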
>>>>>
>>>>> There is also the experimental tableindexed subclass of hbase that will
>>>>> keep up a secondary table as an index using transactional hbase, so an
>>>>> insert into the primary and secondary tables is done as a single
>>>>> transaction (it's not yet in trunk but should be here soon).
>>>>>
>>>>> St.Ack
>>>>>
>>>>>
>>>>>> We just want to transfer part of our Oracle table data to HBase
>>>>>> for multi parallel computing.
>>>>>> Any suggestions would be appreciated!
>>>>>> Thank you
>>>>>>
>>>>>> Fleming
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------------
>>>>>>                              TSMC PROPERTY
>>>>>>  This email communication (and any attachments) is proprietary information
>>>>>>  for the sole use of its intended recipient. Any unauthorized review, use
>>>>>>  or distribution by anyone other than the intended recipient is strictly
>>>>>>  prohibited.  If you are not the intended recipient, please notify the
>>>>>>  sender by replying to this email, and then delete this email and any
>>>>>>  copies of it immediately. Thank you.
>>>>>> ---------------------------------------------------------------------------
