hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Scan vs Put vs Get
Date Wed, 27 Jun 2012 23:34:22 GMT
Hi,

I have a small piece of test code which puts 1M lines into an
existing table, gets 3000 lines, and scans 10000.

The table has one family and one column.

Everything is done randomly. Puts use a random key (24 bytes), fixed
family and column names, and random content (24 bytes).
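
As an illustration of the random 24-byte keys and values described
above, here is a minimal sketch. It assumes java.util.Random.nextBytes
rather than the per-byte Math.random() loop shown in the extract further
down; RandomKeys and randomBytes are hypothetical names, not part of the
test code or of HBase.

```java
import java.util.Random;

// Sketch only: RandomKeys and randomBytes are hypothetical names for illustration.
final class RandomKeys {
    /** Returns a new array of the given length filled with random bytes. */
    static byte[] randomBytes(Random rng, int length) {
        byte[] bytes = new byte[length];
        rng.nextBytes(bytes); // fills the whole array in one call
        return bytes;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        byte[] rowKey = randomBytes(rng, 24); // random row key
        byte[] value = randomBytes(rng, 24);  // random cell content
        System.out.println(rowKey.length + " " + value.length);
    }
}
```

Passing the Random instance in makes it possible to seed the generator
and replay the same keys from one run to the next.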

Gets are batched and use random keys; the scan uses a RandomRowFilter.

And here are the results.
Time to insert 1000000 lines: 43 seconds (23255 lines/seconds)
That's acceptable for my needs given the poor performance of the
servers in the cluster. I'm fine with this result.

Time to read 3000 lines: 11444.0 mseconds (262 lines/seconds)
This is way too low, and I don't understand why. I tried the random
scan because I couldn't figure out the issue.

Time to read 10000 lines: 108.0 mseconds (92593 lines/seconds)
This is impressive! I added the scan after I failed with the gets. I
went from 262 lines per second to almost 100K lines per second! It's
awesome!
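
The two read figures above can be cross-checked from the reported line
counts and elapsed milliseconds. This is a minimal sketch only;
Throughput and linesPerSecond are hypothetical names, not part of the
test code or of HBase.

```java
// Sketch only: Throughput and linesPerSecond are hypothetical names for illustration.
final class Throughput {
    /** Returns lines per second, rounded to the nearest whole line. */
    static long linesPerSecond(long lines, double elapsedMillis) {
        return Math.round(lines / (elapsedMillis / 1000.0));
    }

    public static void main(String[] args) {
        long batchedGets = linesPerSecond(3_000, 11_444.0); // batched get run
        long randomScan = linesPerSecond(10_000, 108.0);    // random scan run
        System.out.println("get:  " + batchedGets + " lines/second");
        System.out.println("scan: " + randomScan + " lines/second");
    }
}
```

With the numbers reported above, this gives 262 for the batched gets and
92593 for the scan, matching the printed results.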

However, I'm still wondering what's wrong with my gets.

The code is very simple. I create Get objects and execute them in a
batch. I tried adding a filter, but it didn't help. Here is an
extract of the code.

    // Build one Get per random 24-byte row key.
    for (long l = 0; l < linesToRead; l++)
    {
        byte[] array1 = new byte[24];
        for (int i = 0; i < array1.length; i++)
            array1[i] = (byte) Math.floor(Math.random() * 256);
        Get g = new Get(array1);
        gets.addElement(g);
    }
    Object[] results = new Object[gets.size()];
    System.out.println(new java.util.Date() + " \"gets\" created.");

    // Time only the batch call itself.
    long timeBefore = System.currentTimeMillis();
    table.batch(gets, results);
    long timeAfter = System.currentTimeMillis();

    float duration = timeAfter - timeBefore;
    System.out.println("Time to read " + gets.size() + " lines : "
        + duration + " mseconds ("
        + Math.round((float) linesToRead / (duration / 1000))
        + " lines/seconds)");

What's wrong with it? I can't use setBatch or setCaching because it's
not a scan. I tried different numbers of gets, but the speed is almost
always the same. Am I using it the wrong way? Does anyone have any
advice to improve this?

Thanks,

JM
