hbase-dev mailing list archives

From "Brian Beggs (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1064) HBase REST xml/json improvements
Date Mon, 22 Dec 2008 15:38:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658571#action_12658571 ]

Brian Beggs commented on HBASE-1064:

Brian: I don't exactly follow the below:

.bq Also the reason for the change in moving to the query string for some of these items is
that in order to retrieve the row/column/timestamp using the path you are unable to have any
directives in the path. Unless we wanted to get into the thought of reserved words, which
IMHO is a bad idea and complicates the interface.

So with this new implementation of the REST interface it's possible to query a table, row,
column, or timestamp directly using the path that follows the url.

For example:
Would retrieve the second row from testtable1.

Would retrieve the column rowWithData:otherData from the secondrow from testtable1.

The same thing works for timestamps...
Would retrieve the cell at timestamp 1229121022233, from column rowWithData:otherData, in row
thesecondrow, from table testtable1.
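
To make the path layout concrete, here is a minimal sketch of how a /table/row/column/timestamp path could be split into its parts. It is an illustration only, not the code from the patch: the names CellAddress and parse_cell_path are made up for this example, and the sample values are the ones used in the descriptions above.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote


class CellAddress(NamedTuple):
    """Hypothetical holder for the parts of a /table/row/column/timestamp path."""
    table: str
    row: Optional[str] = None
    column: Optional[str] = None
    timestamp: Optional[int] = None


def parse_cell_path(path: str) -> CellAddress:
    """Split /table[/row[/column[/timestamp]]] into its parts.

    Every segment after the table is optional, so the same function serves
    whole-table, row, column and single-cell requests.
    """
    segments = [unquote(s) for s in path.strip("/").split("/") if s]
    if not 1 <= len(segments) <= 4:
        raise ValueError(f"expected 1 to 4 path segments, got {len(segments)}")
    table, *rest = segments
    row = rest[0] if len(rest) > 0 else None
    column = rest[1] if len(rest) > 1 else None
    timestamp = int(rest[2]) if len(rest) > 2 else None
    return CellAddress(table, row, column, timestamp)


# Example: address a single cell version by timestamp.
cell = parse_cell_path("/testtable1/thesecondrow/rowWithData:otherData/1229121022233")
```

Note that every path segment is treated as data here, which is exactly why there is no room left in the path for directives.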

.bq Now I think the real question that needs to be answered... is it necessary or desirable
to query out the row/column/timestamp data in this RESTful fashion using the path?

So my question is: is it desirable for the interface to work in such a way that you are
able to query out timestamp and individual cell data as in the examples above?  If the answer
is no, I believe it will be relatively easy to remove those parts of the interface and make
this REST implementation match the current REST implementation.  The ability to query
out cells by identifier and cells by timestamp would be lost, though I do not believe this
functionality is available in the current REST implementation anyway.

If the answer is yes, and we want to query in the /table/row/column/timestamp fashion, then this
is the reason that the directives (by directive I mean things such as fetching region
data or using a scanner) were moved into a query string.  If we wanted to keep this interface
and also allow querying with the directives in the path, I believe the logic that would
be required could make the code much more complex than it already is and harder to maintain.
And for what it's worth, I don't feel it's the most straightforward implementation as it
currently stands.

Adding additional complexity to the path would, I feel, make the code harder to maintain and
add to, whereas putting these parameters in a query string, I feel, simplifies the addition
of future code.
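
As a rough illustration of why the query string keeps things simple, the sketch below separates the directive from the data path. It is an assumption-laden example rather than the patch's actual routing: the function name route is invented here, and only the action parameter name comes from my notes.

```python
from urllib.parse import parse_qs, urlsplit


def route(url: str):
    """Return (action, path_segments) for a request URL.

    The path carries only data identifiers (table, row, column, timestamp);
    the directive, if any, arrives in the "action" query parameter.
    action is None for a plain data read.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    params = parse_qs(parts.query)
    action = params.get("action", [None])[0]
    return action, segments


# A directive on a table: the path still just names the table.
directive = route("http://localhost:60050/TEST16?action=newscanner")

# A row that happens to be called "scanner" stays a row, because no word
# in the path is reserved.
plain_read = route("http://localhost:60050/testtable1/scanner")
```

Adding a new directive under this scheme means handling one more value of action; the path parsing does not change.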

To address Tom's questions: 

What advantage does this provide besides the perception of being more restful?

Again, I'm not sure I have the full answer for this.  I chose this implementation for the selfish
reasons outlined below.  And I'm not really sure if the ability to query cells by identifier/timestamp
is something that is truly necessary for HBase.  This is one of the questions I'm hoping someone
who has been working on the project can answer.

The reason I initially chose to start working on this implementation of the REST interface
from the patches in issues 814 and 815 was that I felt it would be easier to separate the
parsing/serialization code out of this version.  I also felt that more modification would
need to be done to the current interface to allow JSON to be sent using it than this implementation
would take to send xml from it.  

I did not fully understand exactly how items were being retrieved out of the interface until
I was some way into the project and began to notice the differences in the interface.  

If the proposed tablename/[row]/[cols]/[timestamp] interface is adopted, how do you GET/PUT/POST/DELETE

From my notes:

creating a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=newscanner"

//TODO fix up the scanner filters.



Using a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=scan&scannerid=<scannerID>&numrows=<num rows to return>"

//TODO scanner action to return all rows between two row IDs

Closing a scanner
curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=closescanner&scannerid=<scannerId>"
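
The three calls above share one shape: the table in the path, the directive and its arguments in the query string. A small sketch of building those URLs follows. The helper name scanner_url is invented for this example, the host and port are the ones from the curl commands, and the lowercase scannerid spelling follows the commands in this reply.

```python
from urllib.parse import urlencode

BASE = "http://localhost:60050"  # host and port taken from the curl notes


def scanner_url(table: str, action: str, **params) -> str:
    """Build a scanner request URL: table in the path, directive in the query."""
    query = urlencode({"action": action, **params})
    return f"{BASE}/{table}?{query}"


# Scanner lifecycle for table TEST16; the scanner id 2 is a placeholder for
# whatever id the newscanner call returns.
create_url = scanner_url("TEST16", "newscanner")
scan_url = scanner_url("TEST16", "scan", scannerid=2, numrows=10)
close_url = scanner_url("TEST16", "closescanner", scannerid=2)
```

Each URL would then be sent with POST, as in the curl commands.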

In short, a scanner is a stateful resource (like a table) - not an action. The proposed model
means that a table cannot have any "child resources" - just rows. So you could potentially
make a scanner a root-level type, and make an interface like scanner/[id]/[opts]
So you'd POST scanner/?table=myTable&cols=....
then GET scanner/[id]

because the proposed table interface leaves no room for table/scanner/ - scanner would be
interpreted as a row ID.

I, for one, thought the old interface worked well because it allowed one to access different
resources on a given table. Granted, 'enable' and 'disable' are actions, not resources.

I believe these issues are addressed above.  I will say that putting a directive as the first
item in the path is possible, though it will always need to be there. 
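
For comparison, here is a sketch of what a directive-first path might look like. This is purely hypothetical: the resource type names and the function split_typed_path are made up to illustrate the trade-off and are not part of the patch.

```python
# Hypothetical set of root-level resource types; not taken from the patch.
RESOURCE_TYPES = {"table", "scanner"}


def split_typed_path(path: str):
    """Split a path whose first segment must name a resource type.

    Returns (resource_type, remaining_segments). Everything after the first
    segment is free-form, so a row called "scanner" is no longer ambiguous,
    but every request has to carry the type prefix.
    """
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments or segments[0] not in RESOURCE_TYPES:
        raise ValueError("path must start with a resource type")
    return segments[0], segments[1:]


table_request = split_typed_path("/table/testtable1/scanner")
scanner_request = split_typed_path("/scanner/7")
```

The cost is visible in the error branch: a bare /testtable1/row path is rejected, so the prefix always needs to be there.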

Think about what other resources might be added to the interface (i.e. maybe MapReduce jobs,
Pig jobs, etc) - would those be resources of a specific table, or root-level types? If you
adopt the tablename/rowID/cols interface, it leaves no room for child resources other than

Perhaps stack or someone can comment on this further, but given the paradigm of HBase
and how a column store database works, I have trouble thinking of a case where you were trying
to query the database and it didn't go from /table/row.  Though I could see possible changes
further down the path from there.

Also, as far as Pig or MapReduce jobs go, I believe implementing these interfaces will be
taken care of by their respective groups.  It's probably best to stick with what works for
HBase and let the other projects decide what's best for them.

> HBase REST xml/json improvements
> --------------------------------
>                 Key: HBASE-1064
>                 URL: https://issues.apache.org/jira/browse/HBASE-1064
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: rest
>            Reporter: Brian Beggs
>         Attachments: json2.jar, RESTPatch-pass1.patch
> I've begun work on creating a REST based interface for HBase that can use both JSON and
> XML and would be extensible enough to add new formats down the road.  I'm at a point with
> this where I would like to submit it for review and to get feedback as I continue to work
> towards new features.
> Attached to this issue you will find the patch for the changes to this point along with
> a necessary jar file for the JSON serialization.  Also below you will find my notes on how
> to use what is finished with the interface to this point.
> This patch is based off of jira issues: 
> HBASE-814 and HBASE-815
> I am interested in gaining feedback on:
> -what you guys think works
> -what doesn't work for the project
> -anything that may need to be added
> -code style
> -anything else...
> Finished components:
> -framework around parsing json/xml input
> -framework around serializing xml/json output
> -changes to exception handling
> -changes to the response object to better handle the serializing of output data
> -table CRUD calls
> -Full table fetching
> -creating/fetching scanners
> -fix up the filtering with scanners
> -row insert/delete operations
> -individual row fetching
> -cell fetching interface
> -scanner use interface
> Here are the wiki(ish) notes for what is done to this point:
> REST Service for HBASE Notes:
> GET / 
> -retrieves a list of all the tables with their meta data in HBase
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/
> curl -v -H "Accept: application/json" -X GET -T - http://localhost:60050/
> POST / 
> -Create a table
> curl -H "Content-Type: text/xml" -H "Accept: text/xml" -v -X POST -T - http://localhost:60050/newTable
> <table>
>   <name>test14</name>
>   <columnfamilies>
>     <columnfamily>
>       <name>subscription</name>
>       <max-versions>2</max-versions>
>       <compression>NONE</compression>
>       <in-memory>false</in-memory>
>       <block-cache>true</block-cache>
>     </columnfamily>
>   </columnfamilies>
> </table>
> Response:
> <status><code>200</code><message>success</message></status>
> curl -H "Content-Type: application/json" -H "Accept: application/json" -v -X POST -T - http://localhost:60050/newTable
> {"name":"test5", "column_families":[{
>              "name":"columnfam1",
>              "bloomfilter":true,
>              "time_to_live":10,
>              "in_memory":false,
>              "max_versions":2,
>              "compression":"", 
>              "max_value_length":50,
>              "block_cache_enabled":true
>           }
> ]}
> *NOTE* the compression value is an enum defined in class HColumnDescriptor.CompressionType
> GET /[table_name]
> -returns all records for the table
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/tablename
> curl -v -H "Accept: application/json" -X GET -T - http://localhost:60050/tablename
> GET /[table_name]
> -Parameter Action 
> 	metadata - returns the metadata for this table.
> 	regions - returns the regions for this table
> curl -v -H "Accept: text/xml" -X GET -T - http://localhost:60050/pricing1?action=metadata
> Update Table
> PUT /[table_name]
> -updates a table 
> curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X PUT -T - http://localhost:60050/pricing1
>   <columnfamilies>
>     <columnfamily>
>       <name>subscription</name>
>       <max-versions>3</max-versions>
>       <compression>NONE</compression>
>       <in-memory>false</in-memory>
>       <block-cache>true</block-cache>
>     </columnfamily>
>     <columnfamily>
>       <name>subscription1</name>
>       <max-versions>3</max-versions>
>       <compression>NONE</compression>
>       <in-memory>false</in-memory>
>       <block-cache>true</block-cache>
>     </columnfamily>
>   </columnfamilies>
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X PUT -T -
> {"column_families":[{
>              "name":"columnfam1",
>              "bloomfilter":true,
>              "time_to_live":10,
>              "in_memory":false,
>              "max_versions":2,
>              "compression":"", 
>              "max_value_length":50,
>              "block_cache_enabled":true
>           }, 
>           {
>              "name":"columnfam2",
>              "bloomfilter":true,
>              "time_to_live":10,
>              "in_memory":false,
>              "max_versions":2,
>              "compression":"", 
>              "max_value_length":50,
>              "block_cache_enabled":true
>           }
> ]}
> Delete Table
> curl -v -H "Content-Type: text/xml" -H "Accept: text/xml" -X DELETE -T - http://localhost:60050/TEST16
> creating a scanner
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=newscanner"
> //TODO fix up the scanner filters.
> response:
> xml:
> <scanner>
>   <id>
>     2
>   </id>
> </scanner>
> json:
> {"id":1}
> Using a scanner
> curl -v -H "Content-Type: application/json" -H "Accept: application/json" -X POST -T - "http://localhost:60050/TEST16?action=scan&scannerId=<scannerID>&numrows=<num rows to return>"
> This would be my first submission to an open source project of this size, so please,
> give it to me rough.  =)
> Thanks.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
