manifoldcf-user mailing list archives

From Anupam Bhattacharya <anupam...@gmail.com>
Subject Generic DB Query Formation for ManifoldCF Framework
Date Mon, 09 Apr 2012 09:37:26 GMT
I have a relational database query that indexes the table results properly
using DIH. However, when I try to form the seed query in ManifoldCF, I can
see in Simple History that it fetches no results, so the job terminates
within a few minutes.

SELECT id AS lcf__id FROM table1 WHERE deletiondate IS NULL;  {arguments =
()}
SELECT T1.id AS lcf__id,T1.uri AS lcf__data, T1.inserted AS lcf__data,
T1.summary AS lcf__data, T1.text AS lcf__data, T1.abstract AS lcf__data,
T1.deletiondate AS lcf__data, T2.category AS lcf__data, T3.source FROM
table1 T1 INNER JOIN table2 T2 ON T1.id = T2.docid INNER JOIN table3 T3 ON
T2.docid = T3.docid WHERE T1.deletiondate IS NOT NULL AND T1.id IN (?);
{arguments = ('id')}
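
The logged queries suggest how the template variables are expanded:
$(IDCOLUMN) appears as lcf__id, $(DATACOLUMN) as lcf__data, and $(IDLIST) as
a parenthesized placeholder list with the ids passed as arguments. A minimal
Python sketch of that expansion, assuming plain textual substitution. The
expand helper and the mapping are hypothetical, inferred only from the log
above, and are not taken from ManifoldCF's code:

```python
# Hypothetical mapping inferred from the Simple History log above;
# this is not ManifoldCF's actual implementation.
SUBSTITUTIONS = {
    "$(IDCOLUMN)": "lcf__id",
    "$(DATACOLUMN)": "lcf__data",
}


def expand(template, ids):
    """Expand the template variables and return (sql, arguments)."""
    sql = template
    for variable, alias in SUBSTITUTIONS.items():
        sql = sql.replace(variable, alias)
    args = ()
    if "$(IDLIST)" in sql:
        # One "?" placeholder per document id, bound as query arguments.
        placeholders = ",".join("?" for _ in ids)
        sql = sql.replace("$(IDLIST)", "(" + placeholders + ")")
        args = tuple(ids)
    return sql, args


seed_sql, seed_args = expand(
    "SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL", [])
data_sql, data_args = expand(
    "SELECT T1.id AS $(IDCOLUMN), T1.text AS $(DATACOLUMN) "
    "FROM table1 T1 WHERE T1.id IN $(IDLIST)", ["id"])

print(seed_sql, seed_args)
print(data_sql, data_args)
```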

What should the corresponding seeding query be?
Sample: SELECT idfield AS $(IDCOLUMN) FROM documenttable WHERE
modifydatefield > $(STARTTIME) AND modifydatefield <= $(ENDTIME)
Current: SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL

Here, id is a unique column.

Data Query:
Sample : SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN), datafield
AS $(DATACOLUMN) FROM documenttable
WHERE idfield IN $(IDLIST)
Current:
SELECT T1.id AS $(IDCOLUMN),T1.uri AS $(DATACOLUMN), T1.inserted AS
$(DATACOLUMN), T1.url AS $(URLCOLUMN), T1.summary AS $(DATACOLUMN), T1.text
AS $(DATACOLUMN), T1.abstract AS $(DATACOLUMN), T1.deletiondate AS
$(DATACOLUMN),
T2.category, T3.source FROM table1 T1
INNER JOIN  table2 T2 ON T1.id = T2.docid
INNER JOIN  table3 T3 ON T2.docid = T3.docid
WHERE T1.deletiondate IS NOT NULL AND T1.id IN $(IDLIST)
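
One thing that may be worth checking: the seed query selects rows where
deletiondate IS NULL, while the data query keeps only rows where
deletiondate IS NOT NULL, so no seeded id can satisfy the data query. A
minimal sketch using Python's sqlite3 module with invented sample rows (the
table contents are hypothetical, and table2 and table3 are left out for
brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, text TEXT, deletiondate TEXT)")
# Hypothetical sample data: two live rows and one deleted row.
conn.executemany(
    "INSERT INTO table1 (id, text, deletiondate) VALUES (?, ?, ?)",
    [(1, "live doc", None),
     (2, "another live doc", None),
     (3, "deleted doc", "2012-04-01")],
)

# Seed query as logged: only rows with no deletion date are seeded.
seeded_ids = [row[0] for row in conn.execute(
    "SELECT id AS lcf__id FROM table1 WHERE deletiondate IS NULL ORDER BY id")]

# Data query filter as logged: only rows that DO have a deletion date,
# restricted to the ids that were just seeded.
placeholders = ",".join("?" for _ in seeded_ids)
data_rows = conn.execute(
    "SELECT T1.id AS lcf__id, T1.text AS lcf__data FROM table1 T1 "
    "WHERE T1.deletiondate IS NOT NULL AND T1.id IN (" + placeholders + ")",
    seeded_ids,
).fetchall()

print(seeded_ids)  # ids returned by the seed query
print(data_rows)   # rows returned by the data query for those ids
```

In this sketch the two conditions select disjoint sets of rows, so the data
query comes back empty for every seeded id. If both queries used the same
deletiondate condition, each seeded id would match a row.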

For my scenario:
IDCOLUMN = T1.id
VERSIONCOLUMN = The records are not versionable.
URLCOLUMN = ? How do I refer to a record in a database table with a URL?
DATACOLUMN = Many data columns are present (T1.id, T1.uri, T1.inserted,
T1.url, T1.summary, T1.text, T1.abstract,
T1.deletiondate, T2.category, T3.source)
STARTTIME = We don't maintain a modification date.
ENDTIME = We don't maintain a modification date.
IDLIST = This should be the list of all IDs from the seed query.
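
Because several columns are aliased to the same $(DATACOLUMN) name, the
result set carries duplicate column labels, and a lookup by label can reach
only one of them. A small sketch with Python's sqlite3 module illustrates the
general issue. The literal values are invented, and whether ManifoldCF's JDBC
connector reads columns by label in this way is an assumption, not something
confirmed here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Three values share one alias, as in the data query's repeated lcf__data.
cursor = conn.execute(
    "SELECT 'a summary' AS lcf__data, "
    "'the full text' AS lcf__data, "
    "'an abstract' AS lcf__data")

labels = [column[0] for column in cursor.description]
row = cursor.fetchone()

# Keying the row by label collapses the duplicates to a single entry.
by_label = dict(zip(labels, row))

print(labels)
print(by_label)
```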

What should the seed and data queries be for my overall query, shown below?
SELECT T1.id, T1.uri, T1.inserted, T1.url, T1.summary, T1.text,
T1.abstract, T1.deletiondate, T2.category, T3.source FROM table1 T1
INNER JOIN table2 T2 ON T1.id = T2.docid
INNER JOIN table3 T3 ON T2.docid = T3.docid
WHERE T1.deletiondate IS NOT NULL

Thanks in advance for any help on this.

Regards
Anupam
