hbase-user mailing list archives

From Andrew McCall <andrew.mcc...@goroam.net>
Subject IndexedTable and Delete
Date Tue, 21 Jul 2009 20:04:38 GMT

I've been using the IndexedTable code from contrib and have come
across a bit of an issue.

When I delete a column my indexes are removed for that column. I've  
run through the code in IndexedRegion and used very similar code in my  
own classes to recreate the index after I've run the delete.

I've also noticed that if I run a Put after the Delete then the index  
will be re-created.

Neither the Delete nor the subsequent Put in the second example uses
any of the columns that are part of the index (either indexed or
additional columns).

If I'm not mistaken the problem lies in the code to rebuild the index  
from org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegion:

   public void delete(Delete delete, final Integer lockid, boolean  
       throws IOException {

     if (!getIndexes().isEmpty()) {
       // Need all columns
       NavigableSet<byte[]> neededColumns =  

       Get get = new Get(delete.getRow());
       for (byte [] col : neededColumns) {

       Result oldRow = super.get(get, null);
       SortedMap<byte[], byte[]> oldColumnValues =  

       for (IndexSpecification indexSpec : getIndexes()) {
         removeOldIndexEntry(indexSpec, delete.getRow(),  

       // Handle if there is still a version visible.
       if (delete.getTimeStamp() != HConstants.LATEST_TIMESTAMP) {
         get.setTimeRange(1, delete.getTimeStamp());
         oldRow = super.get(get, null);
         SortedMap<byte[], byte[]> currentColumnValues =  
         LOG.debug("There are " + currentColumnValues + " entries to  

         for (IndexSpecification indexSpec : getIndexes()) {
           if (IndexMaintenanceUtils.doesApplyToIndex(indexSpec,  
currentColumnValues)) {
             updateIndex(indexSpec, delete.getRow(),  
     super.delete(delete, lockid, writeToWAL);

I'm not sure I've got this right, but it seems that any delete will
remove the index entries, and they will only be rebuilt if the delete
targets an earlier version of the row. Even then the index is rebuilt
from the version prior to the one just deleted, which seems to mean it
would usually be out of date.

More broadly, it also occurs to me that it may make sense not to delete
the indexes at all unless the Delete would otherwise affect them. In
my case there isn't really any reason to remove the indexes, since the
column I'm deleting is completely unrelated.
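
A rough sketch of the kind of check I have in mind, in plain Java rather than against the real HBase classes. The class name IndexDeleteCheck and the method deleteAffectsIndex are hypothetical and are not part of the contrib code. Columns are modelled as "family:qualifier" strings instead of byte[] to keep the example self-contained, and an empty set of deleted columns is assumed to mean a whole-row delete. Family-level deletes are not modelled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, not part of the HBase contrib code. */
final class IndexDeleteCheck {

    /**
     * Returns true if the delete touches at least one column the index
     * depends on (indexed or additional). An empty deletedColumns set is
     * treated as a whole-row delete, which always affects the index.
     */
    static boolean deleteAffectsIndex(Set<String> deletedColumns,
                                      Set<String> indexColumns) {
        if (deletedColumns.isEmpty()) {
            return true; // assumption: no columns listed means the whole row goes
        }
        // The index is affected only if the two column sets overlap.
        return !Collections.disjoint(deletedColumns, indexColumns);
    }

    public static void main(String[] args) {
        Set<String> indexColumns =
            new HashSet<>(Arrays.asList("info:email", "info:name"));

        // Deleting an unrelated column: the index could be left alone.
        Set<String> unrelated = Collections.singleton("stats:views");
        System.out.println(deleteAffectsIndex(unrelated, indexColumns));

        // Deleting an indexed column: the index entry has to be maintained.
        Set<String> indexed = Collections.singleton("info:email");
        System.out.println(deleteAffectsIndex(indexed, indexColumns));
    }
}
```

With a check like this, the remove-and-rebuild step would only need to run for the indexes where it returns true, and a delete of an unrelated column would skip index maintenance entirely.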


