hbase-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: delete using server's timestamp
Date Sat, 22 Jan 2011 01:33:55 GMT
Ah, I see.  I didn't notice the difference between KeyValue.Type.Delete
and KeyValue.Type.DeleteColumn.

Sorry about that,
Matt
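The difference between the two marker types can be illustrated with a toy model. This is a minimal sketch and NOT HBase source code: the class `DeleteMarkerModel`, its `Type` enum and `visibleVersions` are hypothetical names. The masking rules are simplified from the thread: a Delete marker hides the single version at its exact timestamp, and a DeleteColumn marker hides every version at or before its timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT HBase code: how two column-level delete markers
// mask the stored versions of a single column.
class DeleteMarkerModel {
    enum Type { PUT, DELETE, DELETE_COLUMN }

    static final class Cell {
        final long ts;
        final Type type;
        Cell(long ts, Type type) { this.ts = ts; this.type = type; }
    }

    // Returns the timestamps of PUT cells that a read would still see.
    static List<Long> visibleVersions(List<Cell> cells) {
        List<Long> visible = new ArrayList<>();
        for (Cell put : cells) {
            if (put.type != Type.PUT) continue;
            boolean masked = false;
            for (Cell marker : cells) {
                // Delete: hides only the version with exactly this timestamp.
                if (marker.type == Type.DELETE && marker.ts == put.ts) masked = true;
                // DeleteColumn: hides every version at or before its timestamp.
                if (marker.type == Type.DELETE_COLUMN && put.ts <= marker.ts) masked = true;
            }
            if (!masked) visible.add(put.ts);
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Cell> versions = new ArrayList<>();
        versions.add(new Cell(100, Type.PUT));
        versions.add(new Cell(200, Type.PUT));
        versions.add(new Cell(300, Type.PUT));

        List<Cell> withDelete = new ArrayList<>(versions);
        withDelete.add(new Cell(300, Type.DELETE));
        System.out.println(visibleVersions(withDelete)); // [100, 200]

        List<Cell> withDeleteColumn = new ArrayList<>(versions);
        withDeleteColumn.add(new Cell(300, Type.DELETE_COLUMN));
        System.out.println(visibleVersions(withDeleteColumn)); // []
    }
}
```

With the Delete marker, the older versions at 100 and 200 become visible again, which is the "exposing older versions" effect Ryan warns about further down the thread.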


On Fri, Jan 21, 2011 at 8:24 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Hi Matt,
>
>
> This call, deleteColumns (plural!!!), when you do not specify a
> timestamp, sends LATEST_TIMESTAMP as you say, but the server uses
> System.currentTimeMillis and inserts the delete marker, which masks
> ALL previous versions for that column.  So it does NOT use
> get-before-delete; the only call that does this is 'deleteColumn'
> (SINGULAR!!)
>
> Note the 2 calls are VERY similar: one creates a KV of Type.Delete, the
> other of Type.DeleteColumn.
>
> Yes, the API is confusing.  If you DO NOT use 'deleteColumn'
> (SINGULAR!), you WON'T invoke the Get-before-Delete code.  Stack and I
> both checked the code path, and it's the same as I remember :-)
>
> -ryan
>
>
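Ryan's point about which call triggers the lookup can be sketched as a small decision function. This is a toy model under stated assumptions, NOT the region server's code: `LatestTimestampResolution`, `Kind`, `resolve` and the `gets` counter are hypothetical names. The only value taken from HBase is that `HConstants.LATEST_TIMESTAMP` equals `Long.MAX_VALUE`.

```java
import java.util.TreeSet;

// Toy model, NOT HBase code: how a server might resolve the timestamp of a
// column delete that arrives with LATEST_TIMESTAMP.
class LatestTimestampResolution {
    // Stand-in for HConstants.LATEST_TIMESTAMP, which is Long.MAX_VALUE in HBase.
    static final long LATEST_TIMESTAMP = Long.MAX_VALUE;

    enum Kind {
        DELETE_COLUMN,  // deleteColumn (singular): one version
        DELETE_COLUMNS  // deleteColumns (plural): all versions up to a time
    }

    // Number of read lookups ("Get-before-Delete") performed so far.
    int gets = 0;

    long resolve(Kind kind, long requestedTs, TreeSet<Long> storedVersions, long serverNow) {
        if (requestedTs != LATEST_TIMESTAMP) {
            return requestedTs;  // explicit timestamp: the client's clock is used as-is
        }
        if (kind == Kind.DELETE_COLUMNS) {
            return serverNow;    // cheap: marker masks everything up to the server's "now"
        }
        gets++;                  // expensive: must read the column to find its newest version
        // Assumption: fall back to serverNow for an empty column; the thread
        // does not say what the real server does in that case.
        return storedVersions.isEmpty() ? serverNow : storedVersions.last();
    }

    public static void main(String[] args) {
        TreeSet<Long> versions = new TreeSet<>();
        versions.add(100L);
        versions.add(200L);
        LatestTimestampResolution server = new LatestTimestampResolution();
        long now = 5000L;
        System.out.println(server.resolve(Kind.DELETE_COLUMNS, LATEST_TIMESTAMP, versions, now)); // 5000
        System.out.println(server.resolve(Kind.DELETE_COLUMN, LATEST_TIMESTAMP, versions, now));  // 200
        System.out.println(server.gets); // 1
    }
}
```

The explicit-timestamp branch corresponds to the workaround Matt describes: it skips both the lookup and the server's clock, so correctness then depends on the client's clock.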
> On Fri, Jan 21, 2011 at 5:17 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> > Thanks for the replies.  My table is set to store only one version, but I'd
> > probably delete all previous versions to be safe.  I'd therefore use one of
> > these 2 methods:
> > - Delete.deleteColumns(byte[] family, byte[] qualifier)
> > - Delete.deleteColumns(byte[] family, byte[] qualifier, long timestamp)
> > The problem is that both have the client generate the timestamp.  If you
> > don't specify it, it uses HConstants.LATEST_TIMESTAMP, which causes the
> > get-before-put (10x slowdown in my use case).  If you do specify it, which
> > is required because the method takes a primitive long, then you're relying
> > on the client's clock to be perfect.  I chose the latter option for better
> > performance, but was surprised to see there's not an option to let the
> > server generate the currentTimeMillis, since that is what happens on a Put
> > operation.  Not a big deal, but I wanted to see if there was a technical
> > reason behind it or if it's just that nobody's needed that functionality.
> > Thanks again,
> > Matt
> >
> > On Fri, Jan 21, 2011 at 6:41 PM, Bill Graham <billgraham@gmail.com> wrote:
> >>
> >> Thanks Ryan, that clears it up.
> >>
> >>
> >> On Fri, Jan 21, 2011 at 3:29 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> > No, the storage model does not work like that.  The storage model
> >> > revolves around the KeyValue, which is roughly:
> >> >
> >> > rowid/family/qualifier/timestamp/data
> >> >
> >> > and we store sequences of these in sorted order in HFiles.
> >> >
> >> > Note, we store the row with every single version of every column/cell.
> >> >
> >> > Therefore there is no such thing as "removing the bytes that represent
> >> > the actual row key", they are part of every cell, and once those cells
> >> > go away, then so does the row key.
> >> >
> >> > I hope this helps,
> >> > -ryan
> >> >
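The storage model Ryan describes (every cell carries its own row key and cells are kept sorted) can be mimicked in a few lines. This is a minimal sketch and NOT HBase's `KeyValue` class: `KeyValueModel`, `KV`, `ORDER` and `rowExists` are hypothetical names, and values are plain strings instead of byte arrays.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Toy model, NOT HBase's KeyValue class: every cell carries its full
// coordinates (including the row key), and cells are kept in sorted order.
class KeyValueModel {
    static final class KV {
        final String row;
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        KV(String row, String family, String qualifier, long timestamp, String value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        @Override
        public String toString() {
            return row + "/" + family + "/" + qualifier + "/" + timestamp + "/" + value;
        }
    }

    // Row, family and qualifier ascending; newest timestamp first within a column.
    static final Comparator<KV> ORDER = Comparator
            .comparing((KV kv) -> kv.row)
            .thenComparing(kv -> kv.family)
            .thenComparing(kv -> kv.qualifier)
            .thenComparing((KV a, KV b) -> Long.compare(b.timestamp, a.timestamp));

    // There is no separate row record: a row "exists" only while some cell has its key.
    static boolean rowExists(TreeSet<KV> store, String row) {
        for (KV kv : store) {
            if (kv.row.equals(row)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TreeSet<KV> store = new TreeSet<>(ORDER);
        store.add(new KV("row1", "cf", "b", 100, "x"));
        store.add(new KV("row1", "cf", "a", 100, "y"));
        store.add(new KV("row1", "cf", "a", 200, "z"));
        for (KV kv : store) System.out.println(kv);
        // row1/cf/a/200/z
        // row1/cf/a/100/y
        // row1/cf/b/100/x

        store.removeIf(kv -> kv.row.equals("row1"));
        System.out.println(rowExists(store, "row1")); // false
    }
}
```

Once the last cell carrying `row1` is gone, nothing else records that the row ever existed, which is the point of Ryan's answer.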
> >> > On Fri, Jan 21, 2011 at 3:26 PM, Bill Graham <billgraham@gmail.com> wrote:
> >> >> I follow the tombstone/compact/delete cycle of the column values, but
> >> >> I'm still unclear about the row key life cycle.
> >> >>
> >> >> Is it that the bytes that represent the actual row key are associated
> >> >> with and removed with each column value? Or are they removed upon
> >> >> compaction when no column values exist for a given row key?
> >> >>
> >> >>
> >> >>
> >> >> On Fri, Jan 21, 2011 at 2:26 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> >>> Any of the deletes merely inserts a 'tombstone', which doesn't delete the
> >> >>> data immediately but does mark it so queries no longer return it.
> >> >>>
> >> >>> During compactions we prune these delete values and they disappear
> >> >>> for good.  (Barring other backups, of course.)
> >> >>>
> >> >>> Because of our variable-length storage model, we don't store rows in
> >> >>> particular blocks and rewrite said blocks, so notions of rows
> >> >>> 'existing' or not don't even apply to HBase as they do to RDBMS
> >> >>> systems.
> >> >>>
> >> >>> -ryan
> >> >>>
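The tombstone life cycle Ryan outlines can be sketched as an append-only store plus a rewrite step. This is a toy model under stated assumptions, NOT HBase's compaction code: `TombstoneCompaction`, `put`, `delete`, `read` and `compact` are hypothetical names, and the tombstone here masks a whole row at or before its timestamp for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT HBase code: a delete only appends a tombstone; the masked
// data is physically dropped later, when a compaction rewrites the cells.
class TombstoneCompaction {
    static final class Cell {
        final String row;
        final long ts;
        final boolean tombstone;
        Cell(String row, long ts, boolean tombstone) {
            this.row = row;
            this.ts = ts;
            this.tombstone = tombstone;
        }
    }

    final List<Cell> cells = new ArrayList<>();

    void put(String row, long ts) { cells.add(new Cell(row, ts, false)); }

    // Simplified row-wide delete: masks every cell of the row at or before ts.
    void delete(String row, long ts) { cells.add(new Cell(row, ts, true)); }

    private boolean masked(Cell c) {
        for (Cell t : cells) {
            if (t.tombstone && t.row.equals(c.row) && c.ts <= t.ts) return true;
        }
        return false;
    }

    // Reads skip tombstones and anything they mask, but the bytes are still stored.
    List<Long> read(String row) {
        List<Long> out = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.tombstone && c.row.equals(row) && !masked(c)) out.add(c.ts);
        }
        return out;
    }

    // Rewrite keeping only live cells, so tombstones and masked data are gone for good.
    void compact() {
        List<Cell> live = new ArrayList<>();
        for (Cell c : cells) {
            if (!c.tombstone && !masked(c)) live.add(c);
        }
        cells.clear();
        cells.addAll(live);
    }

    public static void main(String[] args) {
        TombstoneCompaction store = new TombstoneCompaction();
        store.put("row1", 100);
        store.delete("row1", 150);
        System.out.println(store.read("row1") + " " + store.cells.size()); // [] 2
        store.compact();
        System.out.println(store.cells.size()); // 0
    }
}
```

Before `compact()` the deleted cell is invisible but still occupies space; afterwards neither it nor the tombstone remains, and with them the row key disappears too.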
> >> >>> On Fri, Jan 21, 2011 at 2:21 PM, Bill Graham <billgraham@gmail.com> wrote:
> >> >>>> If you use some combination of delete requests and leave a row without
> >> >>>> any column data, will the row/rowkey still exist? I'm thinking of the
> >> >>>> use case where you want to prune all old data, including row keys,
> >> >>>> from a table.
> >> >>>>
> >> >>>>
> >> >>>> On Fri, Jan 21, 2011 at 2:04 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> >>>>> There are 3 kinds of deletes (with a 4th for win):
> >> >>>>>
> >> >>>>> - Delete.deleteFamily(byte[] family, [long])
> >> >>>>> -- This removes all data from the given family before the given
> >> >>>>> timestamp, or, if none is given, System.currentTimeMillis()
> >> >>>>> - Delete.deleteColumns(byte[] family, byte[] qualifier, [long])
> >> >>>>> -- This removes all data from the given qualifier before the given
> >> >>>>> timestamp, or, if none is given, System.currentTimeMillis()
> >> >>>>> - Delete.deleteColumn(byte[] family, byte[] qualifier, [long])
> >> >>>>> -- This removes A SINGLE VERSION at the given time, or, if none is
> >> >>>>> given, the most recent version is Get'ed and deleted.
> >> >>>>> - Delete()
> >> >>>>> -- Calls deleteFamily() on the server side on every family.
> >> >>>>>
> >> >>>>> Stack is talking about the LAST delete form.
> >> >>>>>
> >> >>>>> I think what you want is probably deleteColumns() (plural!), or
> >> >>>>> perhaps deleteFamily().
> >> >>>>>
> >> >>>>> One rarely wants to call deleteColumn(), since it removes just a
> >> >>>>> single version, thus exposing older versions, which MAY be what you
> >> >>>>> want, but I'm guessing probably isn't.
> >> >>>>>
> >> >>>>> Only the last form (deleteColumn (singular!)) calls Get; the rest do
> >> >>>>> not call Get and are very fast.
> >> >>>>>
> >> >>>>> -ryan
> >> >>>>>
> >> >>>>> On Fri, Jan 21, 2011 at 1:51 PM, Stack <stack@duboce.net> wrote:
> >> >>>>>> On Fri, Jan 21, 2011 at 12:30 PM, Matt Corgan <mcorgan@hotpads.com> wrote:
> >> >>>>>>> Is there a way to issue a delete using the server's current
> >> >>>>>>> timestamp?  I see methods using HConstants.LATEST_TIMESTAMP, which
> >> >>>>>>> is extremely expensive since it triggers a Get call.
> >> >>>>>>
> >> >>>>>> Yes.  Deleting the latest version involves a Get to figure out the
> >> >>>>>> most recent timestamp.  And yes, in the src code it says this is
> >> >>>>>> 'expensive'.
> >> >>>>>> Seems like it does this lookup any time LATEST_TIMESTAMP is passed,
> >> >>>>>> whether column, columns, or family, only to ensure the delete goes
> >> >>>>>> in ahead of whatever is currently in the Store.
> >> >>>>>>
> >> >>>>>> St.Ack
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >
> >
> >
>
