hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: Cells do not get cleared after TTL is set in HBase
Date Fri, 16 Oct 2015 13:34:55 GMT
I believe the problem is the order in which the per-cell TTL check and the
version check are applied. When the scan happens after the TTL of the second
put has elapsed, there are still 2 cells in the system. The 2nd one is not
returned because its TTL has expired, but the 1st one has no TTL and so is
not expired. If the version check (select only the latest version) happened
first and the TTL check second, you would have got the desired behavior.

Would you mind raising a JIRA? We can discuss there how, or whether, to
solve it.
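
To make the ordering point concrete, here is a minimal sketch in plain Scala.
It does not use any HBase classes: SimCell and TtlOrderSketch are made-up
names, and the code only models the two possible orderings of the checks, not
the actual server-side read path. It assumes a cell expires once its
timestamp plus its TTL has passed.

```scala
// Simplified stand-in for a stored cell (hypothetical; not HBase's Cell).
final case class SimCell(qualifier: String, timestamp: Long, ttlMillis: Option[Long]) {
  // A cell without a TTL never expires; otherwise it expires ttlMillis after its timestamp.
  def expiredAt(now: Long): Boolean = ttlMillis.exists(ttl => timestamp + ttl <= now)
}

object TtlOrderSketch {
  // Keep only the newest `maxVersions` cells for each qualifier.
  def newestVersions(cells: Seq[SimCell], maxVersions: Int): Seq[SimCell] =
    cells.groupBy(_.qualifier).values
      .flatMap(_.sortBy(c => -c.timestamp).take(maxVersions))
      .toSeq
      .sortBy(_.qualifier)

  // Ordering described above: drop TTL-expired cells first, then apply the version limit.
  def ttlThenVersions(cells: Seq[SimCell], now: Long, maxVersions: Int): Seq[SimCell] =
    newestVersions(cells.filterNot(_.expiredAt(now)), maxVersions)

  // Alternative ordering: apply the version limit first, then drop TTL-expired cells.
  def versionsThenTtl(cells: Seq[SimCell], now: Long, maxVersions: Int): Seq[SimCell] =
    newestVersions(cells, maxVersions).filterNot(_.expiredAt(now))

  def main(args: Array[String]): Unit = {
    // One column written twice: first without a TTL, then again with a 30 s TTL.
    val cells = Seq(
      SimCell("idx1", 1444791952201L, None),
      SimCell("idx1", 1444791952341L, Some(30000L))
    )
    val now = 1444791952341L + 40000L // 40 s after the second put

    println(TtlOrderSketch.ttlThenVersions(cells, now, 1)) // the older cell without TTL is returned
    println(TtlOrderSketch.versionsThenTtl(cells, now, 1)) // nothing is returned
  }
}
```

With VERSIONS => 1, the first ordering brings the older cell back once the
newer one has expired, which matches the reported output; the second ordering
returns nothing for that column.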

-Anoop-

On Wed, Oct 14, 2015 at 9:43 AM, Emre Colak <colemre@gmail.com> wrote:

> Yes, I'm trying to use the per cell TTL feature. I've tried releases 1.0.2
> and 1.1.2.
>
> Here's some Scala code that I've written:
> ===============================
>
> def makePut(rowKey: Array[Byte], cf: Array[Byte], qual: Array[Byte], value:
> Array[Byte]): Put = {
>     val put = new Put(rowKey)
>     put.addColumn(cf, qual, value)
>     put
> }
>
> def getIndex(table: Table, indexName: Array[Byte], cfName: Array[Byte]):
> Seq[(String, Array[Byte], Long)] = {
>   val result = MutableList[(String, Array[Byte], Long)]()
>
>   val queryResult = table.get(new Get(indexName))
>   val cellScanner: CellScanner = queryResult.cellScanner()
>   while (cellScanner.advance()) {
>     val cell = cellScanner.current()
>
>     if (CellUtil.matchingFamily(cell, cfName)) {
>       val tuple = (Bytes.toStringBinary(cell.getQualifierArray,
>         cell.getQualifierOffset, cell.getQualifierLength),
>         Bytes.copy(cell.getValueArray, cell.getValueOffset,
>           cell.getValueLength),
>         cell.getTimestamp)
>       result += tuple
>     }
>   }
>
>   result
> }
>
> def printIndices(table: Table, indexName: Array[Byte], cfName:
> Array[Byte]): Unit = {
>   getIndex(table, indexName, cfName).foreach {
>     case (q, v, ts) =>
>       // mkString prints the bytes as "0,0,0,0,1" instead of the array reference
>       println("qualifier: %s, value: %s, ts: %d".format(q, v.mkString(","), ts))
>   }
> }
>
> // Establish connection
>
> println("Inserting indices into the database")
> val table = connection.getTable(TableName.valueOf(tableName))
> table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx1"),
> Array[Byte](0,0,0,0,1)))
> table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx2"),
> Array[Byte](0,0,0,1,0)))
> table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx3"),
> Array[Byte](0,0,1,0,0)))
>
> println("Indices in the database: ")
> val putList = MutableList[Put]()
> getIndex(table, rowKeyBytes, cfBytes).foreach {
>   case (q, v, ts) =>
>     println("qualifier: %s, value: %s, ts: %d".format(q, v.mkString(","), ts))
>
>     // Re-put each existing cell, this time with a per-cell TTL
>     val put = makePut(rowKeyBytes, cfBytes, Bytes.toBytes(q), v)
>     put.setTTL(30000) // 30 second TTL
>     putList += put
> }
> // Add the merged cell once, without a TTL
> putList += makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idxMerged"),
>   Array[Byte](0,0,1,1,1))
>
> println("Merging existing cells and setting TTLs")
> table.put(putList)
>
> println("Table contents right after the merge: ")
> printIndices(table, rowKeyBytes, cfBytes)
>
> Thread.sleep(10000)
>
> println("Table contents 10 seconds after the merge: ")
> printIndices(table, rowKeyBytes, cfBytes)
>
> Thread.sleep(30000)
>
> println("Table contents 40 seconds after the merge: ")
> printIndices(table, rowKeyBytes, cfBytes)
>
> // close table and connection
>
> And here's what it prints out:
> =========================
>
> Inserting indices into the database
> Indices in the database:
> key: idx1, value: 0,0,0,0,1, ts: 1444791952201
> key: idx2, value: 0,0,0,1,0, ts: 1444791952214
> key: idx3, value: 0,0,1,0,0, ts: 1444791952218
> Merging existing cells and setting TTLs
> Table contents right after the merge:
> key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
> key: idx1, value: 0,0,0,0,1, ts: 1444791952341
> key: idx2, value: 0,0,0,1,0, ts: 1444791952341
> key: idx3, value: 0,0,1,0,0, ts: 1444791952341
> Table contents 10 seconds after the merge:
> key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
> key: idx1, value: 0,0,0,0,1, ts: 1444791952341
> key: idx2, value: 0,0,0,1,0, ts: 1444791952341
> key: idx3, value: 0,0,1,0,0, ts: 1444791952341
> Table contents 40 seconds after the merge:
> key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
> key: idx1, value: 0,0,0,0,1, ts: 1444791952201
> key: idx2, value: 0,0,0,1,0, ts: 1444791952214
> key: idx3, value: 0,0,1,0,0, ts: 1444791952218
>
>
> On Tue, Oct 13, 2015 at 8:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Looks like you are using the per-cell TTL feature.
> >
> > Which HBase release are you using?
> >
> > Can you express your description as either a sequence of shell commands
> > or a unit test?
> >
> > Thanks
> >
> > On Tue, Oct 13, 2015 at 8:13 PM, Colak, Emre <emre.colak@bina.roche.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have an HBase table with the following description:
> > >
> > > {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> > > REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
> > > MIN_VERSIONS => '0' , TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE',
> > > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> > >
> > > I put some values in it and then set a TTL (30s) on those values with
> > > another put operation. The first thing I notice is that the timestamps
> > > of the cells get updated after the 2nd put. And 30 seconds later, when
> > > I do a scan on the table, I still see those cells in the table, however
> > > this time with their timestamps back at the original values.
> > >
> > > I understand that these cells won't necessarily be deleted until a
> > > compaction, but why do they still come up in my scan even though the
> > > TTL that I set on them has expired?
> > >
> > > Best,
> > >
> > > Emre
> > >
> >
>
