couchdb-dev mailing list archives

From Volker Mische <volker.mis...@gmail.com>
Subject Re: rewriter needed changes
Date Thu, 20 Jan 2011 15:40:39 GMT
On 20.01.2011 16:29, Benjamin Young wrote:
> On 1/18/11 5:47 PM, Benoit Chesneau wrote:
>> On Mon, Jan 10, 2011 at 1:32 AM, Benoit Chesneau<bchesneau@gmail.com>
>> wrote:
>>> There are 2 tickets open for the rewriter :
>>>
>>> https://issues.apache.org/jira/browse/COUCHDB-1017
>>> https://issues.apache.org/jira/browse/COUCHDB-1005
>>>
>>> The first one is about testing the types of values to possibly encode
>>> (or decode) them from the path or query string. 1017 speaks about strings,
>>> but it could be integers as well. This isn't currently possible.
>>>
>>> The second is to have a more enhanced rewriter. The first intention of
>>> _rewriter was to offer a simple way to dispatch urls to a resource
>>> (_show, _update, _list, _view, doc, attachment) based on path terms
>>> (string, ":var", "*"). Path specifications are obtained by breaking
>>> the url into tokens via the "/" separator, then we match them against path
>>> terms. That's how we find urls. There is also the possibility to use
>>> query arguments as a path term. A rewriter like this is the easiest
>>> implementation we found, and so far the only one that obtained a consensus.
>>>
>>> The feature requested in 1005 needs more power than simple pattern matching.
>>>
>>> The more people will use CouchApps with CouchDB facing directly to the
>>> web (without any proxy), the more people will ask for such features.
>>>
>>> I see 2 alternative, easy pattern-matching approaches we can use to solve
>>> this problem:
>>>
>>>
>>> 1.
>>>
>>> Put the variable between "<>", like this: <key>.
>>> Then optionally state the type of the variable: <int:key> for
>>> integer.
>>>
>>> Ex:
>>>
>>> {
>>> "from": "/a/b/<key>/<int:id>",
>>> "to":"/c/<key>",
>>> "query": {
>>> "key": "<int:key>"
>>> }
>>> }
>>>
>>> /a/b/c/13 -> /c/c?key=13
>>>
>>>
>>> This solves 1017 and potentially 1005.
>>>
>>> 2. Use mongrel2 pattern matching:
>>>
>>> <snip>
>>> URL patterns always match from the start, routes are broken into
>>> prefix and pattern part. We use the routes to find the longest
>>> matching prefix and then tests the pattern. If the pattern matches,
>>> then the route works. If the route doesn't have a pattern, then it's
>>> assumed to match, and you're done.
>>>
>>> The only caveat is you have to wrap your pattern parts in parenthesis,
>>> but these don't mean anything other than to delimit where a pattern
>>> starts. So instead of /images/.*.jpg, write /images/(.*.jpg) for it to
>>> work.
>>>
>>> Here's the list of characters you can use in your patterns:
>>>
>>> . (period) All characters.
>>> \a Letters.
>>> \c Control characters.
>>> \d Digits.
>>> \l Lowercase letters.
>>> \p Punctuation characters.
>>> \s Space characters.
>>> \u Uppercase letters.
>>> \w Alphanumeric characters.
>>> \x Hexadecimal digits.
>>> \z The 0 character (null terminator).
>>> [set] Just like a regex [] where set is a set of chars, like [0-9] for
>>> all digits.
>>> [^set] Inverse character set, so [^0-9] is anything but digits.
>>> * Longest match of 0 or more of the preceding character.
>>> + Longest match of 1 or more of the preceding character.
>>> - Shortest match of 0 or more of the preceding character.
>>> ? 0 or 1 match of the preceding character.
>>> \bxy Balanced match a substring starting with x and ending in y. So
>>> \b() will match balanced parentheses.
>>> $ End of the string.
>>> Using the uppercase version of an escaped character makes it work the
>>> opposite way (i.e., \A matches any character that isn't a letter). The
>>> backslash can be used to escape the following character, disabling its
>>> special abilities (i.e., \\ will match a backslash).
>>>
>>> Anything that's not listed here is matched literally.
>>>
>>> </snip>
>>>
>>> This solution is really simple: it removes the useless things you have in
>>> regexps and gives complete power to the users. Also, this kind of parsing
>>> is relatively easy to do in erlang.
>>>
>>>
>>> There may be a third solution. If we use something like emonk, erlv8,
>>> ... we could have the rewriter in a js function. But it won't happen
>>> in the next 6 months. I'm quite a supporter of the second solution though,
>>> and quite ready to start a new parser.
>>>
>>> Any thoughts ?
>>>
>>>
>>> - benoît
>>>
>> Since then I started couchapp_legacy :
>>
>> https://github.com/benoitc/couchapp_legacy
>>
>> It embeds a new rewriter doing both reversed and regexp based
>> dispatching with some other features like :
>>
>> - Resource handlers plugin system, currently a rewriter and a proxy
>> handler.
>> - Route caching: rules are built only on first access or when the
>> design doc is changed.
>>
>> TODO:
>> - variable transformations : string -> int for ex
>>
>>
>> There will be other features in couchapp_legacy plugin (current name)
>> soon. Hope it helps to push the conversation further.
>>
>> - benoit
> Benoit,
>
> Thanks for starting this conversation! :) I'd played with building a
> RegEx-based rewriter for CouchDB, but I'm new to Erlang, so it's nowhere
> near production ready. It's great to see someone else has an
> interest in this piece of the puzzle as well.
>
> In the legacy couchapp there's a route that uses an options section to
> define patterns. It seems like a promising direction for extending the
> rewriter. I'd like to propose we build something like this:
>
> {
> "method":"GET",
> "from": "/page/:page",
> "to": "/_show/post/:page",
> "params": {
> "page": {
> "match": "\\w*",
> "type": "string"
> }
> }
> }
>
> If the parameter appears in the params section, we should use its
> "match" rather than the standard (.*) pattern. "type" in that section
> would refer to the output type. Variables would continue to be
> represented with the colon notation to keep the URL space clean (vs.
> using RegEx in the URL as I'd planned to do).
>
> One other helpful addition might be an "engine" option to set the
> matching system to use. I'd prefer using PCRE, you've mentioned Mongrel,
> someone else might want grep. :)
>
> Thanks for starting this discussion, Benoit. I look forward to your
> thoughts.
>
> Later,
> Benjamin

Benjamin,

this is quite a simple example. Should the rewriter still be based on the
path, i.e. on slashes as separators (as it currently is), or would things
like this also be possible:

{
   "from": "/page/:x/:y/:z",
   "to": "/_show/post/:x-:y-:z/something",
   "params": {
     "x": {
       "match": "\\d"
     },
     "y": {
       "match": "\\d"
     },
     "z": {
       "match": "\\d"
     }
   }
}
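
To make the question concrete, here is a rough Python sketch (not CouchDB code, and not how the current Erlang rewriter works) of a rewriter that treats "from" and "to" as plain string templates instead of slash-separated tokens. The names RULE, compile_from and rewrite are made up for illustration, and the default of "[^/]+" for a variable without a "match" entry is my assumption:

```python
import re

# Hypothetical rule in the shape discussed in this thread;
# each "match" value is a regex fragment for one variable.
RULE = {
    "from": "/page/:x/:y/:z",
    "to": "/_show/post/:x-:y-:z/something",
    "params": {
        "x": {"match": r"\d"},
        "y": {"match": r"\d"},
        "z": {"match": r"\d"},
    },
}

# A variable is a colon followed by an identifier, e.g. ":page".
VAR = re.compile(r":([A-Za-z_]\w*)")


def compile_from(rule):
    """Turn the "from" template into an anchored regex with named groups."""
    template = rule["from"]
    params = rule.get("params", {})
    parts, pos = [], 0
    for var in VAR.finditer(template):
        # Literal text between variables is matched verbatim.
        parts.append(re.escape(template[pos:var.start()]))
        name = var.group(1)
        # Assumption: without a "match", a variable stays inside one segment.
        fragment = params.get(name, {}).get("match", "[^/]+")
        parts.append("(?P<%s>%s)" % (name, fragment))
        pos = var.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def rewrite(rule, path):
    """Return the rewritten path, or None if the rule does not match."""
    found = compile_from(rule).match(path)
    if found is None:
        return None
    # Substitute every :var in "to" with the value captured from the path.
    return VAR.sub(lambda var: found.group(var.group(1)), rule["to"])


print(rewrite(RULE, "/page/1/2/3"))
print(rewrite({**RULE, "from": "/page/:x-:y-:z"}, "/page/1-2-3"))
```

With templates handled this way, variables need not be bounded by slashes on either side, so both ":x-:y-:z" in "to" and a "from" like "/page/:x-:y-:z" would work. The price is that the rewriter no longer matches token by token, which is exactly the trade-off I am asking about.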

Cheers,
   Volker
