hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1002) Small query language for filters
Date Mon, 17 Nov 2008 19:27:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12648269#action_12648269 ]

Andrew Purtell commented on HBASE-1002:

Here is a conversation about this on IRC:

(10:41:16 AM) ffgeek200: are there any thoughts regarding a future query languages for row
(10:42:43 AM) st^Ack_: ffgeek200: how do you mean?
(10:44:06 AM) ffgeek200: st^Ack_: ie give me all rows where "(int(col("entry:price")) >
3 && col("entry:name")=="ABC" || col("entry:name")="XYZ"
(10:45:28 AM) apurtell: ffgeek200: filter spec -> little language compiler -> specialized
bytecode -> execution on regionserver during scanner traversal ?
(10:46:11 AM) ffgeek200: apurtell: yes
(10:46:56 AM) apurtell: ffgeek200: what about filter spec -> little language compiler ->
code to build existing (maybe modified a little) filter class hierarchies -> send to regionserver
in the current manner?
(10:49:30 AM) ffgeek200: apurtell: it could be implemented in many ways yes. That is another
way. What about something crazy like writing java code that implements a RowFilterInterface
method "boolean isFiltered(Row row)", then serialize that class over the network... let Java
deal with compilation since it does that well.
(10:50:01 AM) st^Ack_: or apurtell, how about a jruby filter? You pass it jruby code, and
it runs it on every row?
(10:50:51 AM) ffgeek200: jruby would work. I remember reading about a similar database and
they used server-side javascript for this purpose.
(10:52:16 AM) apurtell: stack,ffgeek200: jruby snippet is good. was going to reply that java
serialization only works if the classes are available at each endpoint (java serialization
does not ship code afaik).
(10:54:37 AM) ffgeek200: apurtell: true. I think that would be cleaner than how it is currently
done, trying to munge your row filter to do what you want.
(10:55:12 AM) st^Ack_: apurtell: yes that the classes would have to be on CLASSPATH on either
end of the serialization. jruby script would be better (this jruby suggestion is just your
filter spec -> little language compiler -> etc. suggestion generalized)
(10:56:09 AM) apurtell: ffgeek200,stack: downside to jruby snippet is it is an untrusted code
upload to regionserver. that's why i suggested using existing classes, which cause only restricted/controlled
actions to happen in the regionserver. on the other hand jruby snippets can be managed when
access control is added in a manner similar to how rdbms controls stored procedures.
(10:56:52 AM) st^Ack_: apurtell: you are right
(10:57:08 AM) st^Ack_: very hard preventing jruby snippet running riot
(10:57:15 AM) apurtell: stack: indeed
(10:58:13 AM) ffgeek200: apurtell: postgres allows for sprocs to be in pretty much all popular
languages, but I'm not sure if it restricts the sprocs.
(11:00:53 AM) ffgeek200: example of how they do it with ruby: http://www.april-child.com/blog/2007/05/10/running-ruby-in-postgresql-on-mac-os-x/
(11:01:42 AM) apurtell: ffgeek200: stored procedure access control is rwx by user plus setuid
typically, to use a fs metaphor.
(11:02:02 AM) apurtell: ffgeek200: at least that was what i was referring to.
(11:03:11 AM) ffgeek200: apurtell: I think it would definitely open a can of worms security-wise.
For me it's fine since I'm in control of everything over here, but others may want restrictions
on its usage, maybe they would choose to not compile it in.
(11:04:25 AM) ffgeek200: no matter what security restrictions you impose, they can of course
always sit in a while loop and burn CPU.
(11:05:33 AM) jgray: ffgeek200: postgres has safe and unsafe integration with other languages
for stored procedures
(11:05:43 AM) apurtell: it does seem to me that a little language compiler that builds hierarchies
of filters in the current form is a desirable feature. can be some kind of contrib. common
query operators can be supported, and the class implementation server side maintains safety.
and anything the "compiler" might do can be constructed by hand as desired (no api changes).
(11:10:34 AM) ffgeek200: jgray: thanks I forgot about that. apurtell: sounds awesome. I'm
biased re: postgres since I think it does a good job of this. What if that little language
compiler was done for now, calling it something like hbaseql then later on other languages
could be implemented, but the default one is hbaseql.
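
To make the "filter spec -> little language compiler -> filter class hierarchy" idea from the conversation concrete, here is a minimal sketch in Python. It is not HBase code: the classes Compare, And and Or and the function compile_filter are hypothetical stand-ins for a server-side filter hierarchy, rows are assumed to be plain dicts of already-decoded values with comparable types, and && is assumed to bind tighter than || as in Java. The point is only that the compiler emits a tree of known, restricted classes rather than arbitrary code.

```python
import operator
import re
from dataclasses import dataclass

# Comparison operators the sketch understands; "=" and "==" both mean equality.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq, "==": operator.eq}

# Hypothetical filter classes standing in for a real filter hierarchy.
@dataclass
class Compare:
    column: str
    op: str
    value: object

    def matches(self, row):
        # A row without the column never matches.
        return self.column in row and OPS[self.op](row[self.column], self.value)

@dataclass
class And:
    left: object
    right: object

    def matches(self, row):
        return self.left.matches(row) and self.right.matches(row)

@dataclass
class Or:
    left: object
    right: object

    def matches(self, row):
        return self.left.matches(row) or self.right.matches(row)

# One token per match: integer, double-quoted string, or symbol/keyword.
TOKEN = re.compile(r'\s*(?:(\d+)|"([^"]*)"|(&&|\|\||==|[()<>=]|col))')

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        number, string, symbol = m.groups()
        if number is not None:
            tokens.append(("num", int(number)))
        elif string is not None:
            tokens.append(("str", string))
        else:
            tokens.append(("sym", symbol))
        pos = m.end()
    return tokens

class Parser:
    """Recursive descent: or_expr -> and_expr -> term."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self, expected=None):
        kind, value = self.peek()
        if kind is None or (expected is not None and (kind, value) != ("sym", expected)):
            raise ValueError(f"expected {expected!r}, got {value!r}")
        self.pos += 1
        return kind, value

    def parse_or(self):
        node = self.parse_and()
        while self.peek() == ("sym", "||"):
            self.take()
            node = Or(node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_term()
        while self.peek() == ("sym", "&&"):
            self.take()
            node = And(node, self.parse_term())
        return node

    def parse_term(self):
        if self.peek() == ("sym", "("):
            self.take()
            node = self.parse_or()
            self.take(")")
            return node
        # Otherwise expect: col("family:qualifier") <op> <literal>
        self.take("col")
        self.take("(")
        kind, column = self.take()
        if kind != "str":
            raise ValueError("column name must be a quoted string")
        self.take(")")
        kind, op = self.take()
        if kind != "sym" or op not in OPS:
            raise ValueError(f"unknown operator {op!r}")
        kind, value = self.take()
        if kind not in ("num", "str"):
            raise ValueError("expected a number or string literal")
        return Compare(column, op, value)

def compile_filter(spec):
    parser = Parser(tokenize(spec))
    tree = parser.parse_or()
    if parser.peek() != (None, None):
        raise ValueError("trailing input after filter expression")
    return tree

spec = 'col("entry:price") > 3 && (col("entry:name") = "ABC" || col("entry:name") = "XYZ")'
flt = compile_filter(spec)
print(flt.matches({"entry:price": 5, "entry:name": "XYZ"}))  # True
print(flt.matches({"entry:price": 1, "entry:name": "XYZ"}))  # False
```

Because the output is just a tree of the three classes above, anything the "compiler" produces could also be built by hand, which matches the "no api changes" point made in the conversation.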

> Small query language for filters
> --------------------------------
>                 Key: HBASE-1002
>                 URL: https://issues.apache.org/jira/browse/HBASE-1002
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: filters
>            Reporter: Andrew Purtell
>            Priority: Minor
> Improve the usability of filters by making them specifiable or executable using a little
> query language.
> For example:
>     col("entry:price") > 3 && (col("entry:name") = "ABC" || col("entry:name") = "XYZ")
> Can be implemented as a little language compiler that takes filter specifications as
> input and builds the requisite hierarchy of filter API classes and actions as emitted java
> Can also be implemented using JRuby snippets sent to the regionserver for execution,
> but this has troublesome security implications.
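
The parentheses in the example above matter. The version typed in the IRC conversation has no parentheses around the || part (and an unbalanced leading parenthesis, so its intended grouping is unclear). Assuming the little language gave && higher precedence than ||, as Java and C do, the two forms select different rows. The sketch below uses Python's and/or, which have the same relative precedence; the function names chat_form and issue_form are made up for illustration.

```python
def chat_form(price, name):
    # price > 3 && name == "ABC" || name == "XYZ"
    # Parsed as (price > 3 and name == "ABC") or name == "XYZ".
    return price > 3 and name == "ABC" or name == "XYZ"

def issue_form(price, name):
    # price > 3 && (name == "ABC" || name == "XYZ")
    return price > 3 and (name == "ABC" or name == "XYZ")

# A cheap row named "XYZ" passes the unparenthesized form only.
print(chat_form(1, "XYZ"), issue_form(1, "XYZ"))  # True False
```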

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
