hbase-user mailing list archives

From Chen Xinli <chen.d...@gmail.com>
Subject Re: Schema design: change a primary key on a row?
Date Mon, 14 Sep 2009 09:32:29 GMT
Hi,

Since HBase only supports insertion, how does it clean up garbage data?

I will have a table storing several hundred million web pages, with updates
to several million pages per day. Will that cause any problems?

Thanks



2009/9/3 Jonathan Gray <jlist@streamy.com>

> Kevin,
>
> Not sure I follow the use case 100% but I think you're on the right track.
> There are no UPDATES or mutations of any kind in HBase, only INSERTS.  A
> delete is actually the insertion of a DELETE record.
>
> One thing to be cautious of... There can be indeterminate behavior if you
> are manually setting the version timestamps of your cells while doing
> row/family deletes.  If you don't manually set the timestamp (you have the
> timestamp in the key, so I'm thinking you don't), then you don't need to
> worry about it.
>
> JG
>
>
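
To make the "only INSERTS" point concrete, here is a minimal toy sketch in
Python. It is not the HBase client API: ToyStore, Cell, and the rule that a
delete marker masks every cell with a timestamp at or below its own are
assumptions made for illustration. The compact() method stands in for the
kind of cleanup a compaction performs, where masked cells and their markers
are dropped when the data is rewritten.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    timestamp: int
    value: Optional[str]      # None when the cell is a delete marker
    is_delete: bool = False


class ToyStore:
    """Append-only list of cells; nothing is ever modified in place."""

    def __init__(self) -> None:
        self.cells: List[Cell] = []

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self.cells.append(Cell(row, column, timestamp, value))

    def delete(self, row: str, column: str, timestamp: int) -> None:
        # A delete is just another appended record (a marker).
        self.cells.append(Cell(row, column, timestamp, None, is_delete=True))

    def _cutoffs(self) -> Dict[Tuple[str, str], int]:
        # Newest delete-marker timestamp per (row, column).
        cutoffs: Dict[Tuple[str, str], int] = {}
        for c in self.cells:
            if c.is_delete:
                key = (c.row, c.column)
                cutoffs[key] = max(c.timestamp, cutoffs.get(key, c.timestamp))
        return cutoffs

    def _live_cells(self) -> List[Cell]:
        # Cells that are neither markers nor masked by a marker.
        cutoffs = self._cutoffs()
        live = []
        for c in self.cells:
            cutoff = cutoffs.get((c.row, c.column))
            masked = cutoff is not None and c.timestamp <= cutoff
            if not c.is_delete and not masked:
                live.append(c)
        return live

    def get(self, row: str, column: str) -> Optional[str]:
        """Return the newest unmasked value, or None if there is none."""
        live = [c for c in self._live_cells()
                if c.row == row and c.column == column]
        if not live:
            return None
        return max(live, key=lambda c: c.timestamp).value

    def compact(self) -> None:
        """Rewrite the store, dropping markers and the cells they mask."""
        self.cells = self._live_cells()
```

In this toy, a put issued after a delete but carrying an older, manually
chosen timestamp stays invisible until compact() removes the marker, and the
same put issued after compaction is visible. That is one way to picture the
indeterminate behavior described above when timestamps are set by hand.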
> Kevin Peterson wrote:
>
>> I think it is not possible to change the primary key of a row, so I need
>> to copy any data I want over to a row with the new key and then delete
>> the old one, but I wanted to check.
>>
>> I'm planning to create a table storing spidered blog content, building
>> the primary key from the timestamp of when an article was posted and our
>> unique article key. This seems like the right approach because it matches
>> our access pattern when processing large amounts of data. The reason I
>> need to be able to change the primary key is that when we get an item
>> from multiple sources (e.g. we picked it up from digg and directly from
>> the RSS feed), we don't always favor the first one we downloaded, and
>> sometimes we see different dates.
>>
>> Does deleting the row and reinserting sound like the right approach?
>>
>> (If it matters, I'm playing with 0.20 RC2 right now.)
>>
>>
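
The key layout and the copy-then-delete step described above can be sketched
roughly as follows. This is a toy in Python, not HBase client code: the table
is modelled as a plain dict of row key to columns, and make_row_key, move_row,
and the zero-padded "timestamp-articleId" layout are hypothetical names and
choices made for illustration.

```python
from typing import Dict

Table = Dict[str, Dict[str, str]]   # row key -> {column: value}


def make_row_key(posted_epoch_seconds: int, article_id: str) -> str:
    # Zero-pad the timestamp so that lexicographic (byte) order of the keys
    # matches chronological order of the posts.
    return f"{posted_epoch_seconds:010d}-{article_id}"


def move_row(table: Table, old_key: str, new_key: str) -> None:
    """Copy every column to new_key, then remove the row at old_key."""
    if old_key == new_key:
        return
    if old_key not in table:
        raise KeyError(old_key)
    # Write the copy first, so a failure between the two steps leaves a
    # duplicate row rather than losing data.
    table.setdefault(new_key, {}).update(table[old_key])
    del table[old_key]


# Usage: the same article was first stored under one posting date, and a
# second source reports an earlier date that we prefer.
table: Table = {make_row_key(1251936000, "a42"): {"content:body": "hello"}}
move_row(table,
         make_row_key(1251936000, "a42"),
         make_row_key(1251849600, "a42"))
```

The copy and the delete are two separate steps in this sketch, so a reader
scanning in between could see the article under both keys.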


-- 
Best Regards,
Chen Xinli
