hbase-user mailing list archives

From Emre Colak <cole...@gmail.com>
Subject Re: Cells do not get cleared after TTL is set in HBase
Date Fri, 16 Oct 2015 16:07:51 GMT
Thanks for taking a look Anoop. I've just filed HBASE-14630.

On Fri, Oct 16, 2015 at 6:34 AM, Anoop John <anoop.hbase@gmail.com> wrote:

> I believe the problem is the order in which the per-cell TTL check (which
> skips expired cells) and the version control are applied. When the scan
> happens after the TTL has elapsed following the second put, there will
> still be 2 cells in the system. The 2nd one will not come out as it is TTL
> expired. But the 1st one as such is not expired. If the version check
> (selecting only the latest one) happened first, and then the TTL check, you
> would have got the desired behavior.
>     Mind raising a JIRA? We can discuss there how/whether to solve it.
>
> -Anoop-
>
> On Wed, Oct 14, 2015 at 9:43 AM, Emre Colak <colemre@gmail.com> wrote:
>
> > Yes, I'm trying to use the per cell TTL feature. I've tried releases
> > 1.0.2 and 1.1.2.
> >
> > Here's some Scala code that I've written:
> > ===============================
> >
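> > import scala.collection.JavaConversions._ // assumed: lets table.put accept a Scala list
> > import scala.collection.mutable.MutableList
> >
> > import org.apache.hadoop.hbase.{CellScanner, CellUtil, TableName}
> > import org.apache.hadoop.hbase.client.{Get, Put, Table}
> > import org.apache.hadoop.hbase.util.Bytes
> >
> > def makePut(rowKey: Array[Byte], cf: Array[Byte], qual: Array[Byte],
> >             value: Array[Byte]): Put = {
> >   val put = new Put(rowKey)
> >   put.addColumn(cf, qual, value)
> >   put
> > }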
> >
> > // Read one row and return (qualifier, value, timestamp) for each cell in cfName.
> > def getIndex(table: Table, indexName: Array[Byte], cfName: Array[Byte]):
> >     Seq[(String, Array[Byte], Long)] = {
> >   val result = MutableList[(String, Array[Byte], Long)]()
> >
> >   val queryResult = table.get(new Get(indexName))
> >   val cellScanner: CellScanner = queryResult.cellScanner()
> >   while (cellScanner.advance()) {
> >     val cell = cellScanner.current()
> >
> >     if (CellUtil.matchingFamily(cell, cfName)) {
> >       val tuple = (
> >         Bytes.toStringBinary(cell.getQualifierArray, cell.getQualifierOffset,
> >           cell.getQualifierLength),
> >         Bytes.copy(cell.getValueArray, cell.getValueOffset, cell.getValueLength),
> >         cell.getTimestamp)
> >       result += tuple
> >     }
> >   }
> >
> >   result
> > }
> >
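> > // Print every cell of the row in the "key, value, ts" form shown in the output below.
> > def printIndices(table: Table, indexName: Array[Byte], cfName: Array[Byte]): Unit = {
> >   getIndex(table, indexName, cfName).foreach {
> >     case (q, v, ts) =>
> >       // mkString renders the bytes; "%s" on an Array[Byte] would print its reference
> >       println("key: %s, value: %s, ts: %d".format(q, v.mkString(","), ts))
> >   }
> > }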
> >
> > // Establish connection
> >
> > println("Inserting indices into the database")
> > val table = connection.getTable(TableName.valueOf(tableName))
> > table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx1"),
> > Array[Byte](0,0,0,0,1)))
> > table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx2"),
> > Array[Byte](0,0,0,1,0)))
> > table.put(makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idx3"),
> > Array[Byte](0,0,1,0,0)))
> >
> > println("Indices in the database: ")
> > val putList = MutableList[Put]()
> > getIndex(table, rowKeyBytes, cfBytes).foreach {
> >   case (q, v, ts) =>
> >     println("key: %s, value: %s, ts: %d".format(q, v.mkString(","), ts))
> >
> >     // Re-put the same value, this time with a per-cell TTL
> >     val put = makePut(rowKeyBytes, cfBytes, Bytes.toBytes(q), v)
> >     put.setTTL(30000) // 30 second TTL
> >     putList += put
> > }
> > // The merged cell is added once, outside the loop, without a TTL
> > putList += makePut(rowKeyBytes, cfBytes, Bytes.toBytes("idxMerged"),
> >   Array[Byte](0,0,1,1,1))
> >
> > println("Merging existing cells and setting TTLs")
> > table.put(putList)
> >
> > println("Table contents right after the merge: ")
> > printIndices(table, rowKeyBytes, cfBytes)
> >
> > Thread.sleep(10000)
> >
> > println("Table contents 10 seconds after the merge: ")
> > printIndices(table, rowKeyBytes, cfBytes)
> >
> > Thread.sleep(30000)
> >
> > println("Table contents 40 seconds after the merge: ")
> > printIndices(table, rowKeyBytes, cfBytes)
> >
> > // close table and connection
> >
> > And here's what it prints out:
> > =========================
> >
> > Inserting indices into the database
> > Indices in the database:
> > key: idx1, value: 0,0,0,0,1, ts: 1444791952201
> > key: idx2, value: 0,0,0,1,0, ts: 1444791952214
> > key: idx3, value: 0,0,1,0,0, ts: 1444791952218
> > Merging existing cells and setting TTLs
> > Table contents right after the merge:
> > key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
> > key: idx1, value: 0,0,0,0,1, ts: 1444791952341
> > key: idx2, value: 0,0,0,1,0, ts: 1444791952341
> > key: idx3, value: 0,0,1,0,0, ts: 1444791952341
> > Table contents 10 seconds after the merge:
> > key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
> > key: idx1, value: 0,0,0,0,1, ts: 1444791952341
> > key: idx2, value: 0,0,0,1,0, ts: 1444791952341
> > key: idx3, value: 0,0,1,0,0, ts: 1444791952341
> > Table contents 40 seconds after the merge:
> > key: idxMerged, value: 0,0,1,1,1, ts: 1444791952341
> > key: idx1, value: 0,0,0,0,1, ts: 1444791952201
> > key: idx2, value: 0,0,0,1,0, ts: 1444791952214
> > key: idx3, value: 0,0,1,0,0, ts: 1444791952218
> >
> >
> > On Tue, Oct 13, 2015 at 8:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Looks like you are using per cell TTL feature.
> > >
> > > Which hbase release are you using ?
> > >
> > > Can you formulate your description with either sequence of shell
> > > commands or a unit test ?
> > >
> > > Thanks
> > >
> > > On Tue, Oct 13, 2015 at 8:13 PM, Colak, Emre <emre.colak@bina.roche.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I have an HBase table with the following description:
> > > >
> > > > {NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
> > > > REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE',
> > > > MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE',
> > > > BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
> > > >
> > > > I put some values in it and then set TTL (30s) on those values with
> > > > another put operation. First thing I notice is that the timestamps of
> > > > the cells get updated after the 2nd put. And 30 seconds later, when I do
> > > > a scan on the table, I still see those cells in the table, however this
> > > > time with their timestamps updated to the original timestamps.
> > > >
> > > > I understand that these cells won't necessarily be deleted until a
> > > > compaction, but why do they still come up in my scan even though the
> > > > TTL that I set on them has expired?
> > > >
> > > > Best,
> > > >
> > > > Emre
> > > >
> > >
> >
>
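
The ordering problem Anoop describes in this thread can be illustrated with a small toy model. This is only a sketch and not HBase code: ToyCell, TtlOrdering, ttlThenVersions and versionsThenTtl are hypothetical names made up for the illustration, and the model assumes one column with VERSIONS => 1 and the two puts from the example (the original put without a TTL, then the re-put with a 30 second TTL).

```scala
// Toy model only (not HBase internals): a stored cell version with an optional per-cell TTL.
final case class ToyCell(qualifier: String, ts: Long, ttlMs: Option[Long]) {
  // A cell with a TTL counts as expired once its age exceeds that TTL.
  def expired(now: Long): Boolean = ttlMs.exists(ttl => now - ts > ttl)
}

object TtlOrdering {
  // Keep the newest maxVersions cells of each column.
  private def newestPerColumn(cells: Seq[ToyCell], maxVersions: Int): Seq[ToyCell] =
    cells.groupBy(_.qualifier).values
      .flatMap(versions => versions.sortBy(c => -c.ts).take(maxVersions))
      .toSeq

  // Order described in the thread: expired cells are skipped first,
  // then the version limit is applied to whatever is left.
  def ttlThenVersions(cells: Seq[ToyCell], now: Long, maxVersions: Int): Seq[ToyCell] =
    newestPerColumn(cells.filterNot(_.expired(now)), maxVersions)

  // Alternative order: pick the newest versions first, then skip expired ones.
  def versionsThenTtl(cells: Seq[ToyCell], now: Long, maxVersions: Int): Seq[ToyCell] =
    newestPerColumn(cells, maxVersions).filterNot(_.expired(now))

  def main(args: Array[String]): Unit = {
    val first  = ToyCell("idx1", ts = 1444791952201L, ttlMs = None)         // original put, no TTL
    val second = ToyCell("idx1", ts = 1444791952341L, ttlMs = Some(30000L)) // re-put with 30 s TTL
    val now    = second.ts + 40000L                                          // 40 s after the re-put

    // The expired re-put is skipped, so the older cell with the original timestamp is returned.
    println(ttlThenVersions(Seq(first, second), now, maxVersions = 1))
    // The re-put wins the version check and is then dropped as expired: nothing is returned.
    println(versionsThenTtl(Seq(first, second), now, maxVersions = 1))
  }
}
```

In this model the first ordering reproduces what the posted output shows 40 seconds after the merge (idx1 comes back with its original timestamp), while the second ordering returns no cell for that column. Whether and how HBase should change its behavior is the question left for HBASE-14630.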
