hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: batch reads of columns?
Date Mon, 10 Jan 2011 20:01:09 GMT
Yes, both of those are ways to do it. Definitely try them out.

J-D
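
A minimal sketch of the two layouts from the quoted message, for comparison. It is an in-memory model built on sorted maps, not the HBase client API; the class and method names (ForeignKeyLayouts, wideTable, tallTable, activitiesWide, activitiesTall) are hypothetical, and the row keys and column names are taken from the example in the quote. It assumes only that rows and columns are kept in sorted key order, which is why a prefix walk is enough to find all rows for one account in the tall layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model, NOT the HBase client API.
// A "table" is a sorted map: row key -> (column qualifier -> value).
final class ForeignKeyLayouts {

    // Strategy 1: one wide row per account, one column per foreign key.
    static NavigableMap<String, NavigableMap<String, String>> wideTable() {
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        NavigableMap<String, String> row = new TreeMap<>();
        row.put("name", "acc1");
        row.put("fk1", "activity1");
        row.put("fk2", "activity2");
        table.put("account1", row);
        return table;
    }

    // Strategy 2: one tall table, one row per relationship.
    // The row key is "<account>-<fk>", and each row holds a single "fk" column.
    static NavigableMap<String, NavigableMap<String, String>> tallTable() {
        NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();
        for (int i = 1; i <= 2; i++) {
            NavigableMap<String, String> row = new TreeMap<>();
            row.put("fk", "activity" + i);
            table.put("account1-fk" + i, row);
        }
        return table;
    }

    // Wide layout: fetch the single row, then pick out the fk* columns.
    static List<String> activitiesWide(
            NavigableMap<String, NavigableMap<String, String>> table, String account) {
        List<String> out = new ArrayList<>();
        NavigableMap<String, String> row = table.get(account);
        if (row == null) {
            return out;
        }
        for (Map.Entry<String, String> col : row.entrySet()) {
            if (col.getKey().startsWith("fk")) {
                out.add(col.getValue());
            }
        }
        return out;
    }

    // Tall layout: walk rows in key order from the prefix, stop at the first
    // row key that no longer starts with it (a prefix scan).
    static List<String> activitiesTall(
            NavigableMap<String, NavigableMap<String, String>> table, String account) {
        String prefix = account + "-";
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, NavigableMap<String, String>> row
                : table.tailMap(prefix, true).entrySet()) {
            if (!row.getKey().startsWith(prefix)) {
                break;
            }
            out.add(row.getValue().get("fk"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(activitiesWide(wideTable(), "account1"));
        System.out.println(activitiesTall(tallTable(), "account1"));
    }
}
```

Both lookups return the same activities. The difference is that the tall layout can be walked one row at a time, while the wide layout puts everything behind a single row. Because keys sort lexicographically, numeric suffixes such as fk10 would need zero-padding to keep their numeric order.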

On Mon, Jan 10, 2011 at 11:27 AM, Hiller, Dean  (Contractor)
<dean.hiller@broadridge.com> wrote:
> Oh, so basically I can do foreign keys in two ways then
>
> 1. account1 =
> {column name="acc1", column fk1="activity1", column fk2="activity2", etc. etc}
>
> 2. Or I could basically do
> Account1-fk1= {column fk="activity1"}
> Account1-fk2= {column fk="activity2"}
> Etc. etc.
>
> Correct?
>
> Is there another way to represent relationships that I might be missing, or does it basically
> all boil down to those two strategies?
>
> Thanks,
> Dean
>
>
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: Monday, January 03, 2011 4:31 PM
> To: user@hbase.apache.org
> Subject: Re: batch reads of columns?
>
> I would be tempted to go with a taller table instead of a very, very
> wide one. Scanning a lot of rows is often easier than a single Get
> when manipulating millions of cells.
>
> J-D
>
> On Mon, Dec 27, 2010 at 10:12 PM, Hiller, Dean  (Contractor)
> <dean.hiller@broadridge.com> wrote:
>> I am about to do a bunch of Puts with
>>
>>
>>
>> int lastcolVal = //get count of columns somehow I think;  (How do I get
>> the column count of a column family from a certain row?)
>>
>> for(int j = 0; j < 10; j++) {
>>
>>    Put put = new Put("activities", lastcolVal, activityId[j]);
>>
>>    context.write(accountNo, put);
>>
>> }
>>
>>
>>
>> I am looking at the source code of Get.java and trying to read in 100
>> columns, then process, discard, read in the next 100 columns, process,
>> etc. (i.e., batching like in Hibernate so I don't blow up the memory).  I
>> guess I could read in one at a time... is that expensive? (I would tend to
>> think so for very large sets.)
>>
>>
>>
>> If I have an account which has activity IDs as columns and I could have,
>> let's say, 2 billion activities on one account, is there a way to batch
>> read in the columns from the column family so I don't blow up the
>> memory?  (I.e., let's say 4 GB of RAM, and I think 2 billion ints would be
>> about 8 GB.)
>>
>>
>>
>> To be honest, that for loop is a bit of a lie... as we get activities,
>> we will actually need to insert them so that they are in order by some
>> kind of date. I am not sure how I am going to do that yet (I definitely
>> don't want to grab 1 billion IDs and sort them each time we reprocess).
>>
>>
>>
>> Thanks,
>>
>> Dean
>>
>>
>> This message and any attachments are intended only for the use of the addressee and
>> may contain information that is privileged and confidential. If the reader of the
>> message is not the intended recipient or an authorized representative of the
>> intended recipient, you are hereby notified that any dissemination of this
>> communication is strictly prohibited. If you have received this communication in
>> error, please notify us immediately by e-mail and delete the message and any
>> attachments from your system.
>>
>>
>
>
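
A rough sketch of the "read 100 columns, process, discard, read the next 100" pattern asked about in the original question. It is a stdlib-only model, not HBase code: the row is a sorted map of column qualifier to activity ID, and ColumnBatchReader, nextBatch and sumInBatches are hypothetical names. In the HBase client itself, Scan.setBatch(int) caps how many values each call to the scanner's next() returns, which looks like the closest built-in fit, but check it against the version in use. The main method also repeats the memory estimate from the question: 2 billion ints at 4 bytes each.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of paging through the columns of one wide row.
final class ColumnBatchReader {

    // Returns up to batchSize columns whose qualifier is strictly greater than
    // afterQualifier; pass null to start from the first column.
    static List<Map.Entry<Integer, Integer>> nextBatch(
            NavigableMap<Integer, Integer> row, Integer afterQualifier, int batchSize) {
        NavigableMap<Integer, Integer> rest =
                (afterQualifier == null) ? row : row.tailMap(afterQualifier, false);
        List<Map.Entry<Integer, Integer>> batch = new ArrayList<>();
        for (Map.Entry<Integer, Integer> column : rest.entrySet()) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(column);
        }
        return batch;
    }

    // Processes the whole row while holding at most batchSize columns at once.
    // "Processing" here is just summing the values.
    static long sumInBatches(NavigableMap<Integer, Integer> row, int batchSize) {
        long sum = 0;
        Integer last = null;
        while (true) {
            List<Map.Entry<Integer, Integer>> batch = nextBatch(row, last, batchSize);
            if (batch.isEmpty()) {
                break;
            }
            for (Map.Entry<Integer, Integer> column : batch) {
                sum += column.getValue();
            }
            // Remember where this batch ended; the batch itself can be dropped.
            last = batch.get(batch.size() - 1).getKey();
        }
        return sum;
    }

    public static void main(String[] args) {
        NavigableMap<Integer, Integer> row = new TreeMap<>();
        for (int i = 0; i < 250; i++) {
            row.put(i, i); // qualifier i -> activity ID i
        }
        System.out.println(sumInBatches(row, 100));
        // Memory estimate from the question: 2 billion ints at 4 bytes each.
        System.out.println(2_000_000_000L * Integer.BYTES + " bytes");
    }
}
```

Resuming from the last qualifier seen, rather than from a numeric offset, means each batch starts where the previous one ended without recounting the columns before it.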