hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: delete rows from hbase
Date Wed, 20 Jun 2012 14:10:40 GMT
Hi,

Ok...

Just a couple of nits...

1) Please don't write your Mapper and Reducer classes as inner classes. 
I don't know who started this... maybe it's easier as example code, but it really makes it
harder to learn M/R code. (Also harder to teach, but that's another story... ;-)

2) Looking at your code I saw this...
> public static class MyMapper extends
> TableMapper<ImmutableBytesWritable, Delete> {
and
> context.write(row, new Delete(row.get()));

Ok... while this code works, I have to ask why?

Wouldn't it be simpler to do the following.... [Note this code is an example... written from
memory...]

Add a class variable HTable delTab...

Inside MyMapper add the following:

@Override
protected void setup(Context context) throws IOException, InterruptedException
{
	delTab = new HTable(context.getConfiguration(), "DELETE TABLE NAME GOES HERE");
}
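
And, still from memory, you'd want a matching cleanup() so the table gets closed when the
task finishes:

@Override
protected void cleanup(Context context) throws IOException, InterruptedException
{
	delTab.close();
}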

Then in your TableMapper.map() 

> @Override
>        public void map(ImmutableBytesWritable row, Result value, Context
> context) throws IOException, InterruptedException {
>            context.getCounter("amobee",
> "DeleteRowByCriteria.RowCounter").increment(1);
>            delTab.delete(new Delete(row.get()));  <=== This line changed to delete directly
> against the table where we want to delete rows.
>        }

Not much difference except that you're not using the context. 
You can test the solution. 

It's a bit more general because you could be selecting rows from one table and using that data
to delete from another.

In terms of speed, it's relative.

If you want to batch the rows, you could: keep a local list of Deletes and every 100 rows issue
a batch delete, as in the sketch below.
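
Again from memory, a sketch of that batched variant inside MyMapper (the 100-row threshold is
arbitrary, cleanup() now also flushes the remainder before closing, and you'd need
java.util.List/ArrayList imported):

private List<Delete> deletes = new ArrayList<Delete>();

@Override
public void map(ImmutableBytesWritable row, Result value, Context context)
        throws IOException, InterruptedException {
    context.getCounter("amobee", "DeleteRowByCriteria.RowCounter").increment(1);
    deletes.add(new Delete(row.get()));
    if (deletes.size() >= 100) {
        delTab.delete(deletes);   // HTable.delete(List<Delete>) ships the whole batch at once
        deletes.clear();
    }
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
    if (!deletes.isEmpty()) {
        delTab.delete(deletes);   // flush the leftover partial batch
    }
    delTab.close();
}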

While I suspect there isn't much difference between using the Context.write and just issuing an
HTable.delete(), it makes the code generic enough that you can use it to delete from the table
you're scanning or from a different one.


HTH

-Mike

On Jun 20, 2012, at 6:56 AM, Oleg Ruchovets wrote:

> 
> Well, I changed my previous solution a bit. It works, but it is very slow!
> 
> I think it is because I pass a SINGLE Delete object and not a LIST of Deletes.
> 
> Is it possible to pass a List of Deletes through map() instead of a single Delete?
> 
> import org.apache.commons.cli.*;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.client.Scan;
> import org.apache.hadoop.hbase.filter.Filter;
> import org.apache.hadoop.hbase.filter.PrefixFilter;
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
> import org.apache.hadoop.hbase.mapreduce.TableMapper;
> import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
> import org.apache.hadoop.hbase.util.Bytes;
> import org.apache.hadoop.mapreduce.Job;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
> 
> import java.io.IOException;
> 
> public class DeleteRowByCriteria {
>    final static Logger LOG =
> LoggerFactory.getLogger(DeleteRowByCriteria.class);
> 
>    public static class MyMapper extends
> TableMapper<ImmutableBytesWritable, Delete> {
> 
>        @Override
>        public void map(ImmutableBytesWritable row, Result value, Context
> context) throws IOException, InterruptedException {
>            context.getCounter("amobee",
> "DeleteRowByCriteria.RowCounter").increment(1);
>            context.write(row, new Delete(row.get()));
>        }
>    }
> 
> 
>    public static void main(String[] args) throws ClassNotFoundException,
> IOException, InterruptedException {
> 
>        Configuration config = HBaseConfiguration.create();
>        config.setBoolean("mapred.map.tasks.speculative.execution" , false);
>        Job job = new Job(config, "DeleteRowByCriteria");
>        job.setJarByClass(DeleteRowByCriteria.class);
> 
> 
>        Options options = getOptions();
>        try {
>            AggregationContext aggregationContext =
> getAggregationContext(args, options);
>            Filter campaignIdFilter = new
> PrefixFilter(Bytes.toBytes(aggregationContext.getCampaignId()));
>            Scan scan = new Scan();
>            scan.setFilter(campaignIdFilter);
>            scan.setCaching(20000);
>            scan.setCacheBlocks(false);
> 
> 
>            TableMapReduceUtil.initTableMapperJob(
>                    aggregationContext.getCmltTableName(),
>                    scan,
>                    MyMapper.class,
>                    null,
>                    null,
>                    job);
> 
>            job.setOutputFormatClass(TableOutputFormat.class);
>            job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE,
>                    aggregationContext.getCmltTableName());
> 
>            job.setNumReduceTasks(0);
> 
>            boolean b = job.waitForCompletion(true);
>            if (!b) {
>                throw new IOException("error with job!");
>            }
> 
>        } catch (Exception e) {
>            LOG.error(e.getMessage(), e);
>        }
> 
> 
>    }
> 
> }
> 
> 
> On Wed, Jun 20, 2012 at 7:41 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>> Hi,
>> 
>> The simple way to do this as a map/reduce is the following....
>> 
>> Use the HTable input and scan the records you want to delete.
>> Inside Mapper.setup(), create a connection to the HTable where you want to
>> delete the records.
>> Inside Mapper.map(), each iteration gives you a row which matched the
>> scan that you set up in your ToolRunner. If the record matches the
>> criteria that you want to delete, you just issue a delete command passing
>> in that row key.
>> 
>> And voila! You are done.
>> 
>> No muss, no fuss, and no reducer.
>> 
>> It's that easy.
>> 
>> There is no output to return to your client job, except maybe a count of the
>> records that you deleted, and that's an easy thing to do using dynamic
>> counters.
>> 
>> HTH
>> -Mike
>> 
>> On Jun 20, 2012, at 3:38 AM, Anoop Sam John wrote:
>> 
>>> Hi
>>>     Has anyone tried the possibility of an Endpoint implementation,
>>> using which the delete can be done directly with the scan at the server side?
>>> In the samples below I can see:
>>> Client -> Server - Scan for certain rows (we want the rowkeys
>>> satisfying our criteria)
>>> Client <- Server - returns the Results
>>> Client -> Server - Delete calls
>>> 
>>> Instead, using Endpoints, we can make one call from Client to Server
>>> in which both the scan and the delete will happen...
>>> 
>>> -Anoop-
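
A rough sketch of that Endpoint idea on the 0.92/0.94 coprocessor API, written from memory:
the BulkDeleteProtocol name and its deleteAllMatching() method are made up for illustration,
and the exact HRegion.delete() signature varies between versions.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

// In BulkDeleteProtocol.java -- the interface the client codes against.
public interface BulkDeleteProtocol extends CoprocessorProtocol {
    long deleteAllMatching(Scan scan) throws IOException;
}

// In BulkDeleteEndpoint.java -- deployed on the region servers.
public class BulkDeleteEndpoint extends BaseEndpointCoprocessor
        implements BulkDeleteProtocol {

    public long deleteAllMatching(Scan scan) throws IOException {
        HRegion region =
                ((RegionCoprocessorEnvironment) getEnvironment()).getRegion();
        InternalScanner scanner = region.getScanner(scan);
        long deleted = 0;
        try {
            List<KeyValue> kvs = new ArrayList<KeyValue>();
            boolean more;
            do {
                kvs.clear();
                more = scanner.next(kvs);   // one row's matching cells per call
                if (!kvs.isEmpty()) {
                    // Delete the whole row the matching cells belong to.
                    region.delete(new Delete(kvs.get(0).getRow()), null, true);
                    deleted++;
                }
            } while (more);
        } finally {
            scanner.close();
        }
        return deleted;
    }
}

The client then makes one coprocessorExec() call per region instead of shipping row keys
back and forth:

final Scan scan = new Scan();
scan.setFilter(new PrefixFilter(Bytes.toBytes("12345")));
Map<byte[], Long> deletedPerRegion = table.coprocessorExec(
        BulkDeleteProtocol.class,
        HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW,
        new Batch.Call<BulkDeleteProtocol, Long>() {
            public Long call(BulkDeleteProtocol instance) throws IOException {
                return instance.deleteAllMatching(scan);
            }
        });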
>>> ________________________________________
>>> From: Oleg Ruchovets [oruchovets@gmail.com]
>>> Sent: Tuesday, June 19, 2012 9:47 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: delete rows from hbase
>>> 
>>> Thank you all for the answers. I am trying to speed up my solution and use
>>> map/reduce over hbase.
>>> 
>>> Here is the code:
>>> I want to use Delete (the map function deletes the row), and I pass the same
>>> tableName to TableMapReduceUtil.initTableMapperJob
>>> and TableMapReduceUtil.initTableReducerJob.
>>> 
>>> Question: is it possible to emit a Delete from the map function as I did?
>>> 
>>> 
>>> 
>>> 
>>> public class DeleteRowByCriteria {
>>>   final static Logger LOG =
>>> LoggerFactory.getLogger(DeleteRowByCriteria.class);
>>>   public static class MyMapper extends
>>> TableMapper<ImmutableBytesWritable, Delete> {
>>> 
>>>       public String account;
>>>       public String lifeDate;
>>> 
>>>       @Override
>>>       public void map(ImmutableBytesWritable row, Result value, Context
>>> context) throws IOException, InterruptedException {
>>>           context.write(row, new Delete(row.get()));
>>>       }
>>>   }
>>>   public static void main(String[] args) throws ClassNotFoundException,
>>> IOException, InterruptedException {
>>> 
>>> String tableName = args[0];
>>> String filterCriteria = args[1];
>>> 
>>>       Configuration config = HBaseConfiguration.create();
>>>       Job job = new Job(config, "DeleteRowByCriteria");
>>>       job.setJarByClass(DeleteRowByCriteria.class);
>>> 
>>>       try {
>>> 
>>>           Filter campaignIdFilter = new
>>> PrefixFilter(Bytes.toBytes(filterCriteria));
>>>           Scan scan = new Scan();
>>>           scan.setFilter(campaignIdFilter);
>>>           scan.setCaching(500);
>>>           scan.setCacheBlocks(false);
>>> 
>>> 
>>>           TableMapReduceUtil.initTableMapperJob(
>>>                   tableName,
>>>                   scan,
>>>                   MyMapper.class,
>>>                   null,
>>>                   null,
>>>                   job);
>>> 
>>> 
>>>           TableMapReduceUtil.initTableReducerJob(
>>>                   tableName,
>>>                   null,
>>>                   job);
>>>           job.setNumReduceTasks(0);
>>> 
>>>           boolean b = job.waitForCompletion(true);
>>>           if (!b) {
>>>               throw new IOException("error with job!");
>>>           }
>>> 
>>>       }catch (Exception e) {
>>>           LOG.error(e.getMessage(), e);
>>>       }
>>>   }
>>> }
>>> 
>>> 
>>> 
>>> On Tue, Jun 19, 2012 at 9:26 AM, Kevin O'dell <kevin.odell@cloudera.com>
>>> wrote:
>>> 
>>>> Oleg,
>>>> 
>>>> Here is some code that we used for deleting all rows with user name
>>>> foo.  It should be fairly portable to your situation:
>>>> 
>>>> import java.io.IOException;
>>>> 
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.hbase.HBaseConfiguration;
>>>> import org.apache.hadoop.hbase.client.Delete;
>>>> import org.apache.hadoop.hbase.client.HTable;
>>>> import org.apache.hadoop.hbase.client.Result;
>>>> import org.apache.hadoop.hbase.client.ResultScanner;
>>>> import org.apache.hadoop.hbase.client.Scan;
>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>> 
>>>> public class HBaseDelete {
>>>> public static void main(String[] args) throws IOException {
>>>> Configuration conf = HBaseConfiguration.create();
>>>> HTable t = new HTable(conf, "t");
>>>> 
>>>> String user = "foo";
>>>> 
>>>> byte[] startRow = Bytes.toBytes(user);
>>>> byte[] stopRow = Bytes.toBytes(user);
>>>> stopRow[stopRow.length - 1]++; //'fop'
>>>> Scan scan = new Scan(startRow, stopRow);
>>>> ResultScanner sc = t.getScanner(scan);
>>>> for(Result r : sc) {
>>>> t.delete(new Delete(r.getRow()));
>>>> }
>>>> }
>>>> }
>>>> /**
>>>> * Start row: foo
>>>> * HBase begins matching from this key, byte by byte.
>>>> * End row: foo
>>>> * HBase would stop matching at the first match, because start == stop.
>>>> * End row: fo[p] (p being 'o' + 1)
>>>> * HBase stops matching at the first key that is not prefixed with "foo".
>>>> */
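
One caveat the comment glosses over: if the prefix's last byte is already 0xFF, the simple
increment overflows. A sketch of a safer stop-row helper, from memory (stopRowForPrefix is a
made-up name, not an HBase API):

// Smallest row key strictly greater than every key with the given prefix:
// increment the last byte that isn't 0xFF and truncate after it; a prefix of
// all 0xFF bytes means "scan to the end of the table".
static byte[] stopRowForPrefix(byte[] prefix) {
    for (int i = prefix.length - 1; i >= 0; i--) {
        if (prefix[i] != (byte) 0xFF) {
            byte[] stop = java.util.Arrays.copyOf(prefix, i + 1);
            stop[i]++;
            return stop;
        }
    }
    return HConstants.EMPTY_END_ROW;
}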
>>>> 
>>>> 
>>>> On Tue, Jun 19, 2012 at 6:46 AM, Mohammad Tariq <dontariq@gmail.com>
>>>> wrote:
>>>>> You can use HBase's RowFilter to do that.
>>>>> 
>>>>> Regards,
>>>>>   Mohammad Tariq
>>>>> 
>>>>> 
>>>>> On Tue, Jun 19, 2012 at 1:13 PM, shashwat shriparv
>>>>> <dwivedishashwat@gmail.com> wrote:
>>>>>> Try to implement something like this:
>>>>>> 
>>>>>> Class RegexStringComparator
>>>>>> 
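
From memory, something like the following; note that for a plain prefix like "12345" a
PrefixFilter (or explicit start/stop rows, as in Kevin's code above) is cheaper than a regex:

Scan scan = new Scan();
scan.setFilter(new RowFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator("^12345")));   // pass only rows whose key matches the regex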
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jun 19, 2012 at 5:06 AM, Amitanand Aiyer <amitanand.s@fb.com>
>>>> wrote:
>>>>>> 
>>>>>>> You could set up a scan with the criteria you want (start row, end row,
>>>>>>> keyonlyfilter etc), and do a delete for the rows you get.
>>>>>>> 
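From memory, the KeyOnlyFilter part of that would look something like this (reusing t,
startRow, and stopRow from Kevin's code above); it strips the cell values server side, so
the scan ships only row keys back to the client:

Scan scan = new Scan(startRow, stopRow);
scan.setFilter(new KeyOnlyFilter());   // return keys only, no values
ResultScanner sc = t.getScanner(scan);
for (Result r : sc) {
    t.delete(new Delete(r.getRow()));
}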
>>>>>>> On 6/18/12 3:08 PM, "Oleg Ruchovets" <oruchovets@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> I need to delete rows from an hbase table by criteria.
>>>>>>>> For example, I need to delete all rows starting with "12345".
>>>>>>>> I didn't find a way to set a row prefix for the delete operation.
>>>>>>>> What is the best way (practice) to delete rows by criteria from an
>>>>>>>> hbase table?
>>>>>>>> 
>>>>>>>> Thanks in advance.
>>>>>>>> Oleg.
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> 
>>>>>> 
>>>>>> ∞
>>>>>> Shashwat Shriparv
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Kevin O'Dell
>>>> Customer Operations Engineer, Cloudera
>> 
>> 

