axis-c-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dimuthu Gamage" <>
Subject REST URL Mapping - The changes done in pattern matching Algorithm
Date Thu, 20 Nov 2008 04:31:45 GMT
Hi devs,

As  you may have already noticed, I changed the REST URL pattern matching
algorithm while fixing the
issue and the WSF/PHP issue, Since
Apache's principle is to 'Commit first' I committed the code first (after
making sure everything works), decided to start a discussion on that :)

Basically What I have done is changing the code
1. Initializing the REST Mapping per operation  -
axis2_svc_get_rest_op_list_with_method_and_location ( in
2. REST dispaching algorithm to find the operation and the matching
patterns  -  axis2_rest_disp_find_op (in src/core/engine/rest_disp.c )

At the initializing face It store the url pattern in a internal recursive
structure call 'axutil_core_utils_map_internal_t' whcih is only used inside
the core_utils.c function. And it store this structure for each operation in
the svc->op_rest_map hash. Here is the structure of the struct.

/* internal structure to keep the rest map in a multi level hash */
typedef struct {
    /* the structure will keep as many as following fields */

    /* if the mapped value is directly the operation */
    axis2_op_t *op_desc;

    /* if the mapped value is a constant, this keeps a hash map of
    possible constants => corrosponding map_internal structure */
    axutil_hash_t *consts_map;

    /* if the mapped value is a param, this keeps a hash map of
    possible param_values => corrosponding_map_internal structre */
    axutil_hash_t *params_map;

} axutil_core_utils_map_internal_t;

For an example think you have  operations with following REST mapping (Using
the example attached in the and )

GET students
GET students/{stduent_id}
GET students/{student_id}/marks/{subject_id}

Then thes svc->op_rest_map will be like

svc->op_rest_map  (hash)
            "GET:students" --------- axutil_core_utils_map_internal_t
                                                        op_desc (axis2_op_t*
for "GET students" op)
                                                        consts_map (empty
                                                        params_map (hash)

"{student_id}" ------------- axutil_core_utils_map_internal_t (instance)


                    op_desc (axis2_op_t* for "GET students/{student_id}" op)


                    parms_map (empty hash)


                     const_map (hash)


                              "marks" -------------------
axutil_core_utils_map_internal_t (instance)


                                                        op_desc (NULL)


                                                       consts_map (empty


                                                       params_map (hash)


----------- axutil_core_utils_map_internal_t (instance)


                                                        op_desc (axis2_op_t*
for "GET students/{student_id}/marks/{subject_id}" op)


consts_map / params_map (Both NULL)


So at the Dispatching URL we can clearly sort out the operation and the
parameter values.

For matching patterns like {student_id}, {subject_id} and more complex
matching like xx{p}yy{q}zz{r} the logic uses the function
axis2_core_utils_match_url_component_with_pattern ( in
This is O(n^2) order (in worse case) algorithm.

And what I want to discuss is the point whether the above described logic is
better than the early logic which sequentilly checks all the mapping.

Calculating the approximate time complexity takng n =numbe of operations, p
=  average URL mapping length
The early logic  = n/2 (<--go through all the operation in average)  *
O(p^2) (<--the complexity of
axis2_core_utils_match_url_component_with_pattern) =>O (n*p^2)

For the second logic it is really complex to do a methematical analysis of
the tiime complexity. Assuming (being optimistic:) average length of the url
component is d (d <= p) and the average parameters in a url is k (k <=p/d)
and taking hash search is O(1)

The new logic = log(n) (<-- long n number of hash search) *(k*O(d^2)) (<--
assuming k parameters have to execute the function
axis2_core_utils_match_url_component_with_pattern)  => O(logn * k* d^2)

Clearly O(logn *k *d ^2)  < O(n*p^2)   (taking logn < n and d <=p and k

Anyway the new logic contain some recursive functions and hash functions
heavily which may slow down things at little n (number of operations),
littld k (little number of url components) and little d (small length urls).

So my question is do we need to favor on the smal number of operation so go
back to old logic(after correcting bugs) or do we stay with the current
logic?. Your opinions?

Dimuthu Gamage

View raw message