hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: Scan vs Put vs Get
Date Thu, 28 Jun 2012 16:04:25 GMT
Oh! I see! KeyOnlyFilter is overwriting the RandomRowFilter! Bad. I
mean, bad that I did not figure that out. Thanks for pointing it out.
That definitely explains the difference in performance.

I have enabled Bloom filters with this code:
HBaseAdmin admin = new HBaseAdmin(config);
HTable table = new HTable(config, "test3");
System.out.println (table.getTableDescriptor().getColumnFamilies()[0]);
HColumnDescriptor cd = table.getTableDescriptor().getColumnFamilies()[0];
cd.setBloomFilterType(BloomType.ROW);
admin.disableTable("test3");
admin.modifyColumn("test3", cd);
admin.enableTable("test3");
System.out.println (table.getTableDescriptor().getColumnFamilies()[0]);

And here is the result for the first attempt (using gets):
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
'true', BLOCKCACHE => 'true'}
{NAME => 'cf', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS =>
'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK =>
'true', BLOCKCACHE => 'true'}
Thu Jun 28 11:08:59 EDT 2012 Processing iteration 0...
Time to read 1000 lines : 40177.0 mseconds (25 lines/seconds)

2nd: Time to read 1000 lines : 7621.0 mseconds (131 lines/seconds)
3rd: Time to read 1000 lines : 7659.0 mseconds (131 lines/seconds)
After a few more iterations (about 30), I'm between 200 and 250
lines/second, as before.
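
For reference, the lines/seconds figures in the output above are consistent with dividing the number of rows by the elapsed time in seconds and rounding to the nearest integer. The sketch below reproduces that arithmetic. The class name `Throughput` and the method `linesPerSecond` are hypothetical, and the rounding rule is an assumption, since the original benchmark's formula is not shown in the message.

```java
public class Throughput {
    // Rows read per second, rounded to the nearest whole number.
    // Assumption: the benchmark rounds to nearest; its code is not in the message.
    static long linesPerSecond(int lines, double elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return Math.round(lines * 1000.0 / elapsedMillis);
    }

    public static void main(String[] args) {
        // Elapsed times (ms) reported for the first three iterations of 1000 gets.
        double[] timingsMs = {40177.0, 7621.0, 7659.0};
        for (double ms : timingsMs) {
            System.out.println(ms + " ms -> " + linesPerSecond(1000, ms) + " lines/second");
        }
    }
}
```

With the three timings quoted above, this yields 25, 131 and 131 lines/second, matching the reported values.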

Regarding the FilterList, I tried it, but now I'm getting this error
from the servers:
org.apache.hadoop.hbase.regionserver.LeaseException:
org.apache.hadoop.hbase.regionserver.LeaseException: lease
'-6376193724680783311' does not exist
Here is the code:
	final int linesToRead = 10000;
	System.out.println(new java.util.Date() + " Processing iteration " + iteration + "... ");
	RandomRowFilter rrf = new RandomRowFilter();
	KeyOnlyFilter kof = new KeyOnlyFilter();
	Scan scan = new Scan();
	List<Filter> filters = new ArrayList<Filter>();
	filters.add(rrf);
	filters.add(kof);
	FilterList filterList = new FilterList(filters);
	scan.setFilter(filterList);
	scan.setBatch(Math.min(linesToRead, 1000));
	scan.setCaching(Math.min(linesToRead, 1000));
	ResultScanner scanner = table.getScanner(scan);
	processed = 0;
	long timeBefore = System.currentTimeMillis();
	for (Result result : scanner.next(linesToRead))
	{
		System.out.println("Result: " + result); //
		if (result != null)
			processed++;
	}
	scanner.close();

It's failing when I try to do for (Result result :
scanner.next(linesToRead)). I tried with linesToRead=1000, 100, 10 and
1 with the same result :(

I will try to find the root cause, but if you have any hints, they are welcome.

JM
