hbase-user mailing list archives

From "Steinmaurer Thomas" <Thomas.Steinmau...@scch.at>
Subject RE: Something like Execution Plan as in the RDBMS world?
Date Thu, 04 Aug 2011 10:57:12 GMT
Hi Andy and Ted!

Thanks for your reply. Basically, I'm currently trying a range scan and a regex row filter
on a very small table (~115K rows), just to get used to them. Hadoop/HBase is running in the
Cloudera VM.

I have the following row key, as already discussed in other threads.

vehicle_id: up to 16 characters
device_id: up to 16 characters
timestamp: YYYYMMDDhhmmss

Pretty much one row every 5 minutes for a particular vehicle and device.

Now I want to get the rows for an entire day for a particular vehicle and device.

The following range scan implementation:

	Scan scan = new Scan();

	String startKey =
		String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
		+ "-"
		+ String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
		+ "-"
		+ "20110808000000";
	String endKey =
		String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
		+ "-"
		+ String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
		+ "-"
		+ "20110808235959";
	scan.setStartRow(Bytes.toBytes(startKey)); // start row is inclusive
	scan.setStopRow(Bytes.toBytes(endKey));    // stop row is exclusive: a row at exactly 23:59:59 is not returned
	scan.addColumn(Bytes.toBytes("data_details"), Bytes.toBytes("temperature1_value"));

Takes < 1 sec.
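
For reference, a minimal sketch of the key construction in plain Java, without the HBase client. The class and method names are hypothetical, and "%16s" is only a guess at HBASE_ROWKEY_DATASOURCEID_FORMAT, which is not shown in this message. Since the stop row of a scan is exclusive, the sketch uses the first second of the following day as the stop key so that a row at 23:59:59 would still fall inside the range.

```java
final class RowKeySketch {
    // Assumption: stands in for HBASE_ROWKEY_DATASOURCEID_FORMAT, which is not shown here.
    static final String ID_FORMAT = "%16s";

    // Left-pad an ID to 16 characters with zeros, as the scan code above does.
    static String paddedId(String id) {
        return String.format(ID_FORMAT, id).replace(' ', '0');
    }

    // vehicle_id-device_id-timestamp, timestamp as YYYYMMDDhhmmss.
    static String rowKey(String vehicleId, String deviceId, String timestamp) {
        return paddedId(vehicleId) + "-" + paddedId(deviceId) + "-" + timestamp;
    }

    // Mimics scan range semantics: start inclusive, stop exclusive.
    // HBase compares raw bytes; for plain ASCII keys String.compareTo gives the same order.
    static boolean inRange(String key, String startInclusive, String stopExclusive) {
        return key.compareTo(startInclusive) >= 0 && key.compareTo(stopExclusive) < 0;
    }

    public static void main(String[] args) {
        String start = rowKey("57", "1", "20110808000000");
        String stop = rowKey("57", "1", "20110809000000"); // next day, exclusive
        String lastSecond = rowKey("57", "1", "20110808235959");
        System.out.println(start);
        System.out.println(inRange(lastSecond, start, stop));
    }
}
```

With one row roughly every 5 minutes, the difference between the two stop keys rarely matters in practice, but the next-day form avoids the edge case.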

Whereas the following regex based row filter implementation:

	List<Filter> filters = new ArrayList<Filter>();
	RowFilter rf = new RowFilter(
		CompareFilter.CompareOp.EQUAL
		, new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}")
	);
	filters.add(rf);
	
	QualifierFilter qf = new QualifierFilter(
		CompareFilter.CompareOp.EQUAL
		, new RegexStringComparator("temperature1_value")
	);
	filters.add(qf);
	
	FilterList filterList1 = new FilterList(filters);
	scan.setFilter(filterList1);


Takes around 6 sec on a very small table.
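
One likely reason for the gap: the filtered scan sets no start or stop row, so every row in the table is read and tested against the regex, while the range scan only touches the rows between the two keys. The sketch below checks the same pattern with plain java.util.regex instead of HBase. It assumes RegexStringComparator performs an unanchored search, as Matcher.find() does, which is worth verifying against the HBase version in use. The class name and sample keys are hypothetical.

```java
import java.util.regex.Pattern;

final class RowKeyRegexSketch {
    // Same pattern as in the RowFilter above.
    static final Pattern DAY_PATTERN =
        Pattern.compile(".{14}57\\-.{15}1\\-20110808.{6}");

    // Assumption: unanchored search, mirroring Matcher.find().
    static boolean matches(String rowKey) {
        return DAY_PATTERN.matcher(rowKey).find();
    }

    public static void main(String[] args) {
        // Vehicle 57, device 1, on the requested day.
        System.out.println(matches("0000000000000057-0000000000000001-20110808120500"));
        // Same vehicle and device, next day.
        System.out.println(matches("0000000000000057-0000000000000001-20110809120500"));
        // Vehicle 157: ".{14}57" accepts any 14 leading characters, so this matches too.
        System.out.println(matches("0000000000000157-0000000000000001-20110808120500"));
    }
}
```

Note that the pattern is looser than the range scan: it also accepts any vehicle or device ID that merely ends in 57 or 1. Setting the same start and stop rows on the filtered scan should limit the rows the filter has to examine.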


We aren't sure whether we need the regex row filter capabilities at all, or whether range scans are sufficient
for our access pattern. But a better understanding of how to optimize regex filters would be
helpful.


Thanks!

Thomas


-----Original Message-----
From: Andrew Purtell [mailto:apurtell@apache.org] 
Sent: Wednesday, July 27, 2011 08:25
To: user@hbase.apache.org
Subject: Re: Something like Execution Plan as in the RDBMS world?

> Or is this a completely different way of thinking?

Yes.

There isn't an "execution plan" when using HBase, as that term is commonly understood from
RDBMS systems. The commands you issue against HBase using the client API are executed in order
as you issue them.

> Depending on the access pattern, we might be in a situation to use 
>e.g. RegEx filters on rowkeys. I wonder if there is some kind of an 
>execution plan when running a HBase query to better understand

Exposing filter statistics (hit/skip ratio etc.) and other per-query metrics like number of
store files read, how many keys examined, etc. is an interesting idea perhaps along the lines
of what you ask, but HBase does not have support for that level of query performance introspection
at the moment. 

What people do is measure the application metrics of interest and try different approaches
to optimize them.
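
A minimal sketch of such a measurement in plain Java, under stated assumptions: TimingSketch and bestOfNanos are hypothetical names, and the Supplier would wrap whatever client code runs the scan and iterates over all of its results. Taking the best of several runs reduces noise from JVM warm-up and caching, though it will not reflect cold-cache behavior.

```java
import java.util.function.Supplier;

final class TimingSketch {
    // Runs the work several times and returns the fastest wall-clock time in nanoseconds.
    static <T> long bestOfNanos(Supplier<T> work, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            work.get(); // e.g. execute the scan and consume every result
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the range scan or the filtered scan.
        long nanos = bestOfNanos(() -> "placeholder".length(), 5);
        System.out.println("best run: " + (nanos / 1_000_000.0) + " ms");
    }
}
```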

Best regards,


   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


>________________________________
>From: Steinmaurer Thomas <Thomas.Steinmaurer@scch.at>
>To: user@hbase.apache.org
>Sent: Tuesday, July 26, 2011 11:10 PM
>Subject: Something like Execution Plan as in the RDBMS world?
>
>Hello,
>
>
>
>we have a three part row-key taking into account that the first part is 
>important for distribution/partitioning when the system grows. 
>Depending on the access pattern, we might be in a situation to use e.g. 
>RegEx filters on rowkeys. I wonder if there is some kind of an 
>execution plan (as known in RDBMS) when running a HBase query to better 
>understand how HBase processes the query and what execution path it 
>takes to generate the result set.
>
>
>
>Or is this a completely different way of thinking?
>
>
>
>Thanks,
>
>Thomas
