Github user njayaram2 commented on a diff in the pull request:
https://github.com/apache/incubator-madlib/pull/113#discussion_r110784071
--- Diff: src/ports/postgres/modules/graph/sssp.py_in ---
@@ -137,45 +231,131 @@ def graph_sssp(schema_madlib, vertex_table, vertex_id, edge_table,
<!''!>)
plpy.execute(sql_index)
- # The source can be reached with 0 cost and it has itself as the parent.
- plpy.execute(
- """ INSERT INTO {oldupdate}
- VALUES({source_vertex},0,{source_vertex})
- """.format(**locals()))
+ # The initialization step is quite different when grouping is involved
+ # since not every group (subgraph) will have the same set of vertices.
+
+ # Example:
+ # Assume there are two grouping columns g1 and g2
+ # g1 values are 0 or 1. g2 values are 5 or 6
+ if grouping_cols is not None:
+
+ # Find every distinct group
+ # Example: This will give us (0,5), (0,6), (1,5) and (1,6)
+ groups = plpy.execute(
+ """SELECT array_agg(DISTINCT ({grouping_cols})) AS gsql
+ FROM {edge_table} """.format(**locals()))[0]['gsql']
+
+ # For each group initialize the output table.
+ for i in range(0,len(groups)):
+ gslist = []
+
+ # If we have more than 1 grouping columns,
+ # we have to prepare them induvidually
+ if len(glist) > 1:
+ gi = groups[i][1:-1]
+ gvals = _string_to_array(gi)
+ for j in range(0,len(glist)):
+ gslist.append(glist[j] + " = " + gvals[j])
+
+ # Prepare the sql check for a particular group
+ # Example: gcomp will be "g1 = 0 AND g2 = 5"
+ gcomp = " AND ".join(gslist)
+ else:
+ gi = groups[i]
+ gcomp = glist[0] + " = " + str(gi)
+
+ # Find the vertices that are involved in this group
+ plpy.execute(
+ """ INSERT INTO {out_table}
+ SELECT DISTINCT ON ({vertex_id})
+ {grp_comma} {vertex_id} AS {vertex_id},
+ {init_w} AS {weight},
+ NULL::INT AS parent
+ FROM (
+ SELECT {src} AS {vertex_id} {comma_grp}
+ FROM {edge_table} WHERE {gcomp}
+ UNION
+ SELECT {dest} AS {vertex_id} {comma_grp}
+ FROM {edge_table} WHERE {gcomp}
+ ) x
+ WHERE {vertex_id} IS NOT NULL
--- End diff --
I think it's better to use INNER JOINS instead of a `WHERE` clause.
This query does not work if the grouping columns are not integers.
For instance, if the grouping column is a `VARCHAR`, we will have to use
quoted values. It could get messy to construct the where clause based
on types of the grouping columns.
I created the following example for grouping, based off the user docs
example in SSSP (not sure, but I presume this is a valid input table):
```
DROP TABLE IF EXISTS edge_gr,out_gr,out_gr_summary_out_gr_path;
CREATE TABLE edge_gr AS
(
SELECT *, 'a' AS grp FROM edge
UNION
SELECT *, 'b' AS grp FROM edge WHERE src < 6 AND dest < 6
);
INSERT INTO edge_gr VALUES
(4,5,-20,'b');
```
Running the SSSP algorithm on this table results in an error.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
|