hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Something like Execution Plan as in the RDBMS world?
Date Thu, 04 Aug 2011 22:05:14 GMT

Tomas,

If I understand you correctly, you have a row key of A,B,C and you want to fetch only the rows
matching on A and C.
You can do a start row of A
and then an end row of A1,

so that you get the first row for the given vehicle_id and then stop when the vehicle_id changes.

You would then have to do a server side filter on values for C to get the timestamp for a
given day.
(You could do this with a client side filter, but that means pushing all the data over the
wire.) 
[Note having said that, you could just do a client side filter since you only have 115K rows
and you're going to get a subset of that returned by the range key.]
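
To make the start row / end row idea concrete, here is a minimal sketch in plain Java. It does not use the HBase client at all: a TreeMap stands in for the sorted table, and String ordering stands in for HBase's byte-wise row ordering, which is the same thing only for ASCII keys. The class and method names (PrefixScanSketch, pad, rowKey, stopRowFor) and the 16-character zero padding are assumptions for illustration, loosely modelled on the key layout quoted below. As with an HBase scan, the start key is inclusive and the stop key is exclusive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class PrefixScanSketch {
    // Left-pads an id with zeros to 16 characters (assumed key layout).
    static String pad(String id) {
        return String.format("%16s", id).replace(' ', '0');
    }

    // Row key in A,B,C order: vehicle_id-device_id-timestamp.
    static String rowKey(String vehicle, String device, String ts) {
        return pad(vehicle) + "-" + pad(device) + "-" + ts;
    }

    // Exclusive upper bound for all keys starting with prefix:
    // bump the last character by one (assumes ASCII, non-empty prefix).
    static String stopRowFor(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // Range "scan" on A, then a filter on the day part of C.
    static List<String> rowsForVehicleAndDay(
            TreeMap<String, String> table, String vehicle, String day) {
        String start = pad(vehicle) + "-";
        String stop = stopRowFor(start);
        List<String> hits = new ArrayList<>();
        // subMap is [start, stop): only rows for this vehicle are visited.
        for (String key : table.subMap(start, stop).keySet()) {
            String ts = key.substring(key.length() - 14); // YYYYMMDDhhmmss
            if (ts.startsWith(day)) {
                hits.add(key);
            }
        }
        return hits;
    }
}
```

The range bounds limit the work to one vehicle's rows; the check on the timestamp still has to look at every row in that range, which is the part a server side filter would do before data goes over the wire.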

The idea of doing something like the following:
SELECT * 
FROM TABLE 
WHERE A=x
AND DAY(C) = y [or some variation]
{A and C are part of a composite index}

doesn't work in HBase.

If your key were ACB, meaning that vehicle_id, timestamp, device_id was the composite key,
then you could do a start/stop range scan using A and C.
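
A rough sketch of why the ACB order helps, again in plain Java with a TreeMap standing in for the sorted table (no HBase client involved; ReorderedKeySketch and its methods are hypothetical names, and the padding is an assumption). With the timestamp directly after the vehicle_id, all rows for one vehicle and one day sit next to each other, so the start and stop keys alone select them and no filter is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ReorderedKeySketch {
    // Left-pads an id with zeros to 16 characters (assumed key layout).
    static String pad(String id) {
        return String.format("%16s", id).replace(' ', '0');
    }

    // Row key in A,C,B order: vehicle_id-timestamp-device_id.
    static String rowKey(String vehicle, String ts, String device) {
        return pad(vehicle) + "-" + ts + "-" + pad(device);
    }

    // All rows for one vehicle and one day (YYYYMMDD), for every device.
    static List<String> rowsForVehicleAndDay(
            TreeMap<String, String> table, String vehicle, String day) {
        String start = pad(vehicle) + "-" + day;
        // Exclusive stop key: same prefix with the last character bumped by one.
        int last = start.length() - 1;
        String stop = start.substring(0, last) + (char) (start.charAt(last) + 1);
        // subMap is [start, stop): exactly the rows sharing the prefix.
        return new ArrayList<>(table.subMap(start, stop).keySet());
    }
}
```

The trade-off is that a query on vehicle_id and device_id without a time range no longer maps to a single contiguous range.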

Sorry if I'm missing something since I jumped in the middle of a discussion.

-Mike


> Subject: RE: Something like Execution Plan as in the RDBMS world?
> Date: Thu, 4 Aug 2011 12:57:12 +0200
> From: Thomas.Steinmaurer@scch.at
> To: user@hbase.apache.org; apurtell@apache.org
> 
> Hi Andy and Ted!
> 
> Thanks for your reply. Basically, I'm currently trying a range scan and a regex row filter on a very small table (~ 115K rows), just to get used to it. Hadoop/HBase ... is running in the available Cloudera VM.
> 
> I have the following row key, as already discussed in other threads.
> 
> vehicle_id: up to 16 characters
> device_id: up to 16 characters
> timestamp: YYYYMMDDhhmmss
> 
> Pretty much one row every 5 minutes for a particular vehicle and device.
> 
> Now I want to get the rows for an entire day for a particular vehicle and device.
> 
> The following range scan implementation:
> 
> 	Scan scan = new Scan();
> 
> 	String startKey =
> 		String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
> 		+ "-"
> 		+ String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
> 		+ "-"
> 		+ "20110808000000";
> 	String endKey =
> 		String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "57").replace(' ', '0') // Vehicle ID
> 		+ "-"
> 		+ String.format(HBASE_ROWKEY_DATASOURCEID_FORMAT, "1").replace(' ', '0') // Device ID
> 		+ "-"
> 		+ "20110808235959";
> 	scan.setStartRow(Bytes.toBytes(startKey));
> 	scan.setStopRow(Bytes.toBytes(endKey));
> 	scan.addColumn(Bytes.toBytes("data_details"), Bytes.toBytes("temperature1_value"));
> 
> Takes < 1 sec.
> 
> Whereas the following regex based row filter implementation:
> 
> 	List<Filter> filters = new ArrayList<Filter>();
> 	RowFilter rf = new RowFilter(
> 		CompareFilter.CompareOp.EQUAL
> 		, new RegexStringComparator(".{14}57\\-.{15}1\\-20110808.{6}")
> 	);
> 	filters.add(rf);
> 	
> 	QualifierFilter qf = new QualifierFilter(
> 		CompareFilter.CompareOp.EQUAL
> 		, new RegexStringComparator("temperature1_value")
> 	);
> 	filters.add(qf);
> 	
> 	FilterList filterList1 = new FilterList(filters);
> 	scan.setFilter(filterList1);
> 
> 
> Takes around 6 sec on a very small table.
> 
> 
> We aren't sure if we need the regex row filter capabilities at all or if range scans are sufficient for our access pattern. But a better understanding on how to optimize regex stuff would be helpful.
> 
> 
> Thanks!
> 
> Thomas
> 
> 
> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org] 
> Sent: Mittwoch, 27. Juli 2011 08:25
> To: user@hbase.apache.org
> Subject: Re: Something like Execution Plan as in the RDBMS world?
> 
> > Or is this a complete different thinking?
> 
> Yes.
> 
> There isn't an "execution plan" when using HBase, as that term is commonly understood from RDBMS systems. The commands you issue against HBase using the client API are executed in order as you issue them.
> 
> > Depending on the access pattern, we might be in a situation to use 
> >e.g. RegEx filters on rowkeys. I wonder if there is some kind of an 
> >execution plan when running a HBase query to better understand
> 
> Exposing filter statistics (hit/skip ratio etc.) and other per-query metrics like number of store files read, how many keys examined, etc. is an interesting idea perhaps along the lines of what you ask, but HBase does not have support for that level of query performance introspection at the moment.
> 
> What people do is measure the application metrics of interest and try different approaches to optimize them.
> 
> Best regards,
> 
> 
>    - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> 
> 
> >________________________________
> >From: Steinmaurer Thomas <Thomas.Steinmaurer@scch.at>
> >To: user@hbase.apache.org
> >Sent: Tuesday, July 26, 2011 11:10 PM
> >Subject: Something like Execution Plan as in the RDBMS world?
> >
> >Hello,
> >
> >
> >
> >we have a three part row-key taking into account that the first part is 
> >important for distribution/partitioning when the system grows. 
> >Depending on the access pattern, we might be in a situation to use e.g. 
> >RegEx filters on rowkeys. I wonder if there is some kind of an 
> >execution plan (as known in RDBMS) when running a HBase query to better 
> >understand how HBase processes the query and what execution path it 
> >takes to generate the result set.
> >
> >
> >
> >Or is this a complete different thinking?
> >
> >
> >
> >Thanks,
> >
> >Thomas
> >
> >
> >
> >
> >
> >
 		 	   		  