couchdb-dev mailing list archives

From Benjamin Young <benja...@couchone.com>
Subject Re: rewriter needed changes
Date Thu, 20 Jan 2011 16:15:55 GMT
On 1/20/11 10:40 AM, Volker Mische wrote:
> On 20.01.2011 16:29, Benjamin Young wrote:
>> On 1/18/11 5:47 PM, Benoit Chesneau wrote:
>>> On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau<bchesneau@gmail.com>
>>> wrote:
>>>> There are 2 tickets open for the rewriter :
>>>>
>>>> https://issues.apache.org/jira/browse/COUCHDB-1017
>>>> https://issues.apache.org/jira/browse/COUCHDB-1005
>>>>
>>>> The first one is about testing the types of values so they can be
>>>> encoded (or decoded) from the path or query string. 1017 talks about
>>>> strings, but it could be integers as well. This isn't currently possible.
>>>>
>>>> The second is to have a more capable rewriter. The original intention
>>>> of _rewriter was to offer a simple way to dispatch URLs to a resource
>>>> (_show, _update, _list, _view, doc, attachment) based on path terms
>>>> (string, ":var", "*"). Path specifications are obtained by breaking
>>>> the URL into tokens via the "/" separator, then we match them against
>>>> path terms. That's how we find URLs. There is also the possibility to
>>>> use query arguments as a path term. A rewriter like this was the
>>>> easiest implementation we found, and the only one that reached consensus.
>>>>
>>>> The feature requested in 1005 needs more power than simple pattern
>>>> matching.
>>>>
>>>> The more people use CouchApps with CouchDB facing the web directly
>>>> (without any proxy), the more people will ask for such features.
>>>>
>>>> I see 2 alternative, easy pattern-matching approaches we can use to
>>>> solve this problem:
>>>>
>>>>
>>>> 1.
>>>>
>>>> Put the variable between "<>", like this: <key>.
>>>> Then optionally state the type of the variable: <int:key> for an
>>>> integer.
>>>>
>>>> Ex:
>>>>
>>>> {
>>>> "from": "/a/b/<key>/<int:id>",
>>>> "to":"/c/<key>",
>>>> "query": {
>>>> "key": "<int:id>"
>>>> }
>>>> }
>>>>
>>>> /a/b/c/13 -> /c/c?key=13
>>>>
>>>>
>>>> This solves 1017 and potentially 1005.
>>>>
>>>> 2. Use mongrel2 pattern matching:
>>>>
>>>> <snip>
>>>> URL patterns always match from the start. Routes are broken into a
>>>> prefix and a pattern part. We use the routes to find the longest
>>>> matching prefix and then test the pattern. If the pattern matches,
>>>> then the route works. If the route doesn't have a pattern, then it's
>>>> assumed to match, and you're done.
>>>>
>>>> The only caveat is you have to wrap your pattern parts in parentheses,
>>>> but these don't mean anything other than to delimit where a pattern
>>>> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
>>>> work.
>>>>
>>>> Here's the list of characters you can use in your patterns:
>>>>
>>>> . (period) All characters.
>>>> \a Letters.
>>>> \c Control characters.
>>>> \d Digits.
>>>> \l Lowercase letters.
>>>> \p Punctuation characters.
>>>> \s Space characters.
>>>> \u Uppercase letters.
>>>> \w Alphanumeric characters.
>>>> \x Hexadecimal digits.
>>>> \z The 0 character (null terminator).
>>>> [set] Just like a regex [] where set is a set of chars, like [0-9] for
>>>> all digits.
>>>> [^set] Inverse character set, so [^0-9] is anything but digits.
>>>> * Longest match of 0 or more of the preceding character.
>>>> + Longest match of 1 or more of the preceding character.
>>>> - Shortest match of 0 or more of the preceding character.
>>>> ? 0 or 1 match of the preceding character.
>>>> \bxy Balanced match a substring starting with x and ending in y. So
>>>> \b() will match balanced parentheses.
>>>> $ End of the string.
>>>> Using the uppercase version of an escaped character makes it work the
>>>> opposite way (i.e., \A matches any character that isn't a letter). The
>>>> backslash can be used to escape the following character, disabling its
>>>> special abilities (i.e., \\ will match a backslash).
>>>>
>>>> Anything that's not listed here is matched literally.
>>>>
>>>> </snip>
>>>>
>>>> This solution is really simple: it removes the useless things you have
>>>> in regexps and gives complete power to the users. This kind of parsing
>>>> is also relatively easy to do in Erlang.
>>>>
>>>>
>>>> There may be a third solution. If we use something like emonk, erlv8,
>>>> ... we could have the rewriter in a JS function. But it won't happen
>>>> in the next 6 months. I'm quite in favor of the second solution though,
>>>> and quite ready to start a new parser.
>>>>
>>>> Any thoughts ?
>>>>
>>>>
>>>> - benoît
>>>>
>>> Since then I started couchapp_legacy :
>>>
>>> https://github.com/benoitc/couchapp_legacy
>>>
>>> It embeds a new rewriter doing both reversed and regexp-based
>>> dispatching, with some other features like:
>>>
>>> - Resource handlers plugin system, currently a rewriter and a proxy
>>> handler.
>>> - Route caching: rules are built only on first access or when the
>>> design doc is changed.
>>>
>>> TODO:
>>> - variable transformations : string -> int for ex
>>>
>>>
>>> There will be other features in couchapp_legacy plugin (current name)
>>> soon. Hope it helps to push the conversation further.
>>>
>>> - benoit
>> Benoit,
>>
>> Thanks for starting this conversation! :) I'd played with building a
>> RegEx-based rewriter for CouchDB, but I'm new to Erlang, so it's
>> nowhere near production ready. It's great to see someone else has an
>> interest in this piece of the puzzle as well.
>>
>> In the legacy couchapp there's a route that uses an options section to
>> define patterns. It seems like a promising direction for extending the
>> rewriter. I'd like to propose we build something like this:
>>
>> {
>> "method":"GET",
>> "from": "/page/:page",
>> "to": "/_show/post/:page",
>> "params": {
>> "page": {
>> "match": "\\w*",
>> "type": "string"
>> }
>> }
>> }
>>
>> If the parameter appears in the params section, we should use its
>> "match" rather than the standard (.*) pattern. "type" in that section
>> would refer to the output type. Variables would continue to be
>> represented with the colon notation to keep the URL space clean (vs.
>> using RegEx in the URL as I'd planned to do).
>>
>> One other helpful addition might be an "engine" option to set the
>> matching system to use. I'd prefer using PCRE, you've mentioned Mongrel,
>> someone else might want grep. :)
>>
>> Thanks for starting this discussion, Benoit. I look forward to your
>> thoughts.
>>
>> Later,
>> Benjamin
>
> Benjamin,
>
> This is quite a simple example. Should the rewriter still be based on
> path, i.e. on slashes as separator (as it currently is), or would
> things like this also be possible:
>
> {
>   "from": "/page/:x/:y/:z",
>   "to": "/_show/post/:x-:y-:z/something",
>   "params": {
>     "x": {
>       "match": "\\d"
>     },
>     "y": {
>       "match": "\\d"
>     },
>     "z": {
>       "match": "\\d"
>     }
>   }
> }
>
> Cheers,
>   Volker
We definitely need to open up URL construction beyond just slashes. We
may want to consider using a non-reserved character for our variable
names as well. URI Templates use {var}, and past URI-related RFCs have
used <var> around non-path/query-related pieces to denote them as samples.
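
To make the idea concrete, here is a minimal Python sketch of how a rule with a "params" section could be matched without treating "/" as the only separator. This is not CouchDB's rewriter (which is written in Erlang). The function names compile_rule and rewrite are hypothetical, the default pattern [^/]+ for variables without a "match" is an assumption (the proposal above mentions (.*)), and the "type" field is ignored here.

```python
import re

# Matches a colon-style variable such as ":page" inside a template.
VAR = re.compile(r":([A-Za-z_]\w*)")


def compile_rule(rule):
    """Build a regex from rule["from"], using each param's "match" if given."""
    params = rule.get("params", {})
    template = rule["from"]
    parts = []
    pos = 0
    for m in VAR.finditer(template):
        # Literal text between variables is escaped, so "-" or "." in the
        # template can act as a separator just like "/".
        parts.append(re.escape(template[pos:m.start()]))
        name = m.group(1)
        # Assumption: variables without a "match" stop at the next slash.
        pattern = params.get(name, {}).get("match", "[^/]+")
        parts.append("(?P<%s>%s)" % (name, pattern))
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def rewrite(rule, path):
    """Return the rewritten path, or None when the rule does not match."""
    m = compile_rule(rule).match(path)
    if m is None:
        return None
    values = m.groupdict()
    # Substitute captured values into the "to" template.
    return VAR.sub(lambda v: values[v.group(1)], rule["to"])


# Volker's example: three single-digit variables joined with "-" in "to".
rule = {
    "from": "/page/:x/:y/:z",
    "to": "/_show/post/:x-:y-:z/something",
    "params": {
        "x": {"match": "\\d"},
        "y": {"match": "\\d"},
        "z": {"match": "\\d"},
    },
}
print(rewrite(rule, "/page/1/2/3"))
print(rewrite(rule, "/page/12/2/3"))
```

With this approach the variable names still use the colon notation, but the separators come from the literal text of the template, so a "from" such as "/page/:x-:y" works the same way. Switching to {var} or <var> would only mean changing the VAR expression.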


