hbase-user mailing list archives

From Chen Xinli <chen.d...@gmail.com>
Subject Re: Schema design: change a primary key on a row?
Date Mon, 14 Sep 2009 15:24:29 GMT
Got it. Major compaction does the cleanup.

Thanks
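
For readers of the archive, the cleanup described in the quoted reply below can be sketched as a toy in-memory model. This is not the HBase API: the class `ToyTable`, its methods, and the row and column names are hypothetical, and the model only covers what is retained in storage. It assumes writes only ever append a new timestamped version, and that a major compaction drops versions beyond the configured maximum.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of versioned cells (hypothetical; not the HBase API)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # (row, column) -> list of (timestamp, value); writes only append.
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        # An "update" is just another insert with a newer timestamp.
        self.cells[(row, column)].append((timestamp, value))

    def get(self, row, column):
        # Return the value with the newest timestamp, or None if absent.
        versions = self.cells.get((row, column))
        if not versions:
            return None
        return max(versions, key=lambda v: v[0])[1]

    def version_count(self, row, column):
        return len(self.cells.get((row, column), []))

    def major_compact(self):
        # Keep only the newest max_versions entries of every cell.
        for versions in self.cells.values():
            versions.sort(key=lambda v: v[0], reverse=True)
            del versions[self.max_versions:]


table = ToyTable(max_versions=3)
for ts in range(1, 6):
    # Five crawls of the same page produce five stored versions.
    table.put("page-1", "content:html", ts, f"crawl-{ts}")

before = table.version_count("page-1", "content:html")
table.major_compact()
after = table.version_count("page-1", "content:html")
latest = table.get("page-1", "content:html")
```

In this sketch the excess versions occupy space until `major_compact()` runs, which is the point of the reply: repeated updates accumulate as extra versions and are trimmed back at compaction time.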

2009/9/14 stack <stack@duboce.net>

> You can configure how many versions of a cell hbase should keep when you set
> up your table's schema.  For example, if you set your table to only keep 3
> versions, then on the next major compaction (default every 24 hours),
> versions in excess of 3 will be let go.
> St.Ack
>
>
> On Mon, Sep 14, 2009 at 2:32 AM, Chen Xinli <chen.daqi@gmail.com> wrote:
>
> > Hi ,
> >
> > As there is only insertion in hbase, how does hbase clean garbage data?
> >
> > I will have a table storing several hundred million webpages, with several
> > million pages updated per day. Will there be any problem?
> >
> > Thanks
> >
> >
> >
> > 2009/9/3 Jonathan Gray <jlist@streamy.com>
> >
> > > Kevin,
> > >
> > > Not sure I follow the use case 100% but I think you're on the right track.
> > > There are no UPDATES or mutations of any kind in HBase, only INSERTS.  A
> > > delete is actually the insertion of a DELETE record.
> > >
> > > One thing to be cautious of... There can be indeterminate behavior if you
> > > are manually setting the version timestamps of your cells while doing
> > > row/family deletes.  If you don't manually set the timestamp (you have stamp
> > > in the key so I'm thinking you don't), then you don't need to worry about
> > > it.
> > >
> > > JG
> > >
> > >
> > > Kevin Peterson wrote:
> > >
> > >> I think that it is not possible to change the primary key of a row, and I
> > >> need to copy any data I want over to a row with the new key and then delete
> > >> the old one, but I wanted to check.
> > >>
> > >> I'm planning on creating my table storing spidered blog content building
> > >> the primary key from the timestamp of when an article was posted and our
> > >> unique article key. This seems the right approach because it matches our
> > >> access pattern when processing large amounts of data. The reason I need to
> > >> be able to change the primary key is when we get an item from multiple
> > >> sources (i.e. maybe we picked it up from digg and directly from the RSS
> > >> feed) we don't always favor the first one we downloaded and sometimes we
> > >> see different dates.
> > >>
> > >> Does deleting the row and reinserting sound like the right approach?
> > >>
> > >> (If it matters, I'm playing with 0.20 RC2 right now.)
> > >>
> > >>
> >
> >
> > --
> > Best Regards,
> > Chen Xinli
> >
>



-- 
Best Regards,
Chen Xinli
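
The copy-then-delete approach discussed in the thread, together with the point that a delete is itself the insertion of a DELETE record, can be sketched with a small append-only model. Everything here is hypothetical and is not the HBase API: `ToyStore`, `rekey`, the `DELETE` sentinel, and the row keys are illustrative names. The sketch assumes a row delete marker masks every cell of that row with a timestamp at or below the marker's, and that a major compaction physically drops both the masked cells and the marker.

```python
DELETE = object()  # sentinel standing in for a delete marker (hypothetical)


class ToyStore:
    """Append-only toy store (hypothetical; not the HBase API)."""

    def __init__(self):
        # Each entry is (row, column, timestamp, value); nothing is mutated.
        self.log = []

    def put(self, row, column, timestamp, value):
        self.log.append((row, column, timestamp, value))

    def delete_row(self, row, timestamp):
        # A delete is just another insert: a marker for the whole row.
        self.log.append((row, None, timestamp, DELETE))

    def _deleted_at(self, row):
        # Newest delete-marker timestamp for the row, or -inf if none.
        stamps = (ts for r, _, ts, v in self.log if r == row and v is DELETE)
        return max(stamps, default=float("-inf"))

    def get_row(self, row):
        # Newest visible value per column, skipping cells masked by a marker.
        deleted_at = self._deleted_at(row)
        newest = {}
        for r, column, ts, value in self.log:
            if r != row or value is DELETE or ts <= deleted_at:
                continue
            if column not in newest or ts >= newest[column][0]:
                newest[column] = (ts, value)
        return {column: value for column, (_, value) in newest.items()}

    def major_compact(self):
        # Physically drop delete markers and every cell they mask.
        self.log = [
            e for e in self.log
            if e[3] is not DELETE and e[2] > self._deleted_at(e[0])
        ]


def rekey(store, old_row, new_row, timestamp):
    """Copy the visible cells to new_row, then mark old_row deleted."""
    for column, value in store.get_row(old_row).items():
        store.put(new_row, column, timestamp, value)
    store.delete_row(old_row, timestamp)


store = ToyStore()
store.put("20090901-article42", "content:body", 1, "hello")
store.put("20090901-article42", "meta:source", 1, "digg")
rekey(store, "20090901-article42", "20090830-article42", 2)

entries_before_compaction = len(store.log)  # 2 puts + 2 copies + 1 marker
old_row = store.get_row("20090901-article42")
new_row = store.get_row("20090830-article42")
store.major_compact()
entries_after_compaction = len(store.log)
```

In this model the old row disappears from reads as soon as the marker is written, while its cells and the marker stay in the log until compaction. The caution in the thread about manually set timestamps shows up here too: a cell written to the old row with a timestamp at or below the marker's would be masked even if it was written after the delete.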
