manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Generic DB Query Formation for ManifoldCF Framework
Date Mon, 09 Apr 2012 12:31:00 GMT
Ok.

I don't have details of your schema or your intent, so my advice is
going to be limited.  However, your seeding query looks reasonable
given what you provided:

SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL

If there is no versioning ability, and every document is to be fetched
on every crawl, you should leave the versioning query blank.  This is,
of course, not very efficient.

For the fetch query, remember that you need to construct the required
columns from your sources.  Those will be $(IDCOLUMN), $(URLCOLUMN),
and $(DATACOLUMN).  For the data column, you will likely need to
concatenate several of your database result columns together using
whatever concat-like operation your database supports.  Also, given
that you seem to be attempting a three-way join for this query, it's
going to look something like this:

SELECT t1.id AS $(IDCOLUMN), t1.uri AS $(URLCOLUMN), concat(..., ...,
..., ...) AS $(DATACOLUMN) FROM table1 t1, ..., ... WHERE t1.id IN
$(IDLIST)

You MUST include the IN $(IDLIST) clause, and return all the required
columns, or the query will not work.  Also, there is no support in the
JDBC connector for metadata at this time, so if you are attempting to
set up separate metadata fields you will need an enhancement to the
connector.
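
To make the flow above concrete, here is a minimal sketch that imitates it
outside ManifoldCF, using Python and an in-memory SQLite database. It is not
the connector's code. The expand() helper is hypothetical, the lcf__url alias
is an assumption by analogy with the lcf__id and lcf__data aliases visible in
the logged queries quoted below, and the choice of columns to concatenate,
the sample rows, and SQLite's || operator (standing in for whatever concat
operation your database supports) are illustrative only.

```python
import sqlite3

# Assumed aliases. lcf__id and lcf__data appear in the logged queries in this
# thread; lcf__url is a guess by analogy, not taken from the connector.
SUBSTITUTIONS = {
    "$(IDCOLUMN)": "lcf__id",
    "$(URLCOLUMN)": "lcf__url",
    "$(DATACOLUMN)": "lcf__data",
}

SEED_TEMPLATE = "SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL"

# One ID column, one URL column, and ONE data column built by concatenation.
FETCH_TEMPLATE = """
SELECT t1.id AS $(IDCOLUMN),
       t1.url AS $(URLCOLUMN),
       COALESCE(t1.summary, '') || ' ' || COALESCE(t1.text, '') || ' ' ||
       COALESCE(t2.category, '') || ' ' || COALESCE(t3.source, '') AS $(DATACOLUMN)
FROM table1 t1
INNER JOIN table2 t2 ON t1.id = t2.docid
INNER JOIN table3 t3 ON t2.docid = t3.docid
WHERE t1.id IN $(IDLIST)
"""


def expand(template, ids=()):
    """Hypothetical helper: replace the $(...) variables the way the logged
    queries suggest, turning $(IDLIST) into a parenthesized list of '?'."""
    sql = template
    for name, alias in SUBSTITUTIONS.items():
        sql = sql.replace(name, alias)
    placeholders = "(" + ",".join("?" for _ in ids) + ")"
    return sql.replace("$(IDLIST)", placeholders), list(ids)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id TEXT PRIMARY KEY, url TEXT, summary TEXT,
                     text TEXT, deletiondate TEXT);
CREATE TABLE table2 (docid TEXT, category TEXT);
CREATE TABLE table3 (docid TEXT, source TEXT);
INSERT INTO table1 VALUES ('1', 'http://example.com/doc/1',
                           'A summary', 'Body text', NULL);
INSERT INTO table1 VALUES ('2', 'http://example.com/doc/2',
                           'Old summary', 'Old text', '2012-04-01');
INSERT INTO table2 VALUES ('1', 'news');
INSERT INTO table3 VALUES ('1', 'wire');
""")

# Seeding: collect the IDs of the documents to crawl.
seed_sql, seed_args = expand(SEED_TEMPLATE)
ids = [row[0] for row in conn.execute(seed_sql, seed_args)]

# Fetch: look up the three required columns for exactly those IDs.
fetch_sql, fetch_args = expand(FETCH_TEMPLATE, ids)
cursor = conn.execute(fetch_sql, fetch_args)
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()

print(ids)
print(columns)
print(rows)
```

The point of the sketch is that the result set carries each required alias
exactly once, with every content column folded into the single data column,
rather than several columns all aliased to $(DATACOLUMN).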

Thanks,
Karl



On Mon, Apr 9, 2012 at 7:53 AM, Anupam Bhattacharya <anupamb82@gmail.com> wrote:
> I have read the documentation at
> http://incubator.apache.org/connectors/en_US/end-user-documentation.html#jdbcrepository
>
> But I feel that there is no specific example given.
>
> On Mon, Apr 9, 2012 at 4:16 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> I'd start by reading the jdbc connector portion of the end user manual.
>> There is quite a bit of help in there for writing connector queries.
>>
>> Let me know if that answers your questions.
>>
>> Thanks,
>> Karl
>>
>> Sent from my Windows Phone
>> ________________________________
>> From: Anupam Bhattacharya
>> Sent: 4/9/2012 5:37 AM
>> To: connectors-user@incubator.apache.org; Karl Wright
>> Subject: Generic DB Query Formation for ManifoldCF Framework
>>
>> I have a relational database query that indexes the table results
>> properly using DIH. However, when I try to form the seed query in
>> ManifoldCF, I can see in Simple History that it is not fetching results,
>> and the job terminates within a few minutes.
>>
>> SELECT id AS lcf__id FROM table1 WHERE deletiondate IS NULL;  {arguments =
>> ()}
>> SELECT T1.id AS lcf__id,T1.uri AS lcf__data, T1.inserted AS lcf__data,
>> T1.summary AS lcf__data, T1.text AS lcf__data, T1.abstract AS lcf__data,
>> T1.deletiondate AS lcf__data, T2.category AS lcf__data, T3.source FROM
>> table1 T1 INNER JOIN table2 T2 ON T1.id = T2.docid INNER JOIN table3 T3 ON
>> T2.docid = T3.docid WHERE T1.deletiondate IS NOT NULL AND T1.id IN (?);
>> {arguments = ('id')}
>>
>> What should be the corresponding query for seeding ?
>> Sample : SELECT idfield AS $(IDCOLUMN) FROM documenttable WHERE
>> modifydatefield > $(STARTTIME) AND modifydatefield <= $(ENDTIME)
>> Current : SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL
>>
>> Here id is unique column value.
>>
>> Data Query:
>> Sample : SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN),
>> datafield AS $(DATACOLUMN) FROM documenttable
>> WHERE idfield IN $(IDLIST)
>> Current:
>> SELECT T1.id AS $(IDCOLUMN),T1.uri AS $(DATACOLUMN), T1.inserted AS
>> $(DATACOLUMN), T1.url AS $(URLCOLUMN), T1.summary AS $(DATACOLUMN), T1.text
>> AS $(DATACOLUMN), T1.abstract AS $(DATACOLUMN), T1.deletiondate AS
>> $(DATACOLUMN),
>> T2.category, T3.source FROM table2 T1
>> INNER JOIN  table2 T2 ON T1.id = T2.docid
>> INNER JOIN  table3 T3 ON T2.docid = T3.docid
>> WHERE T1.deletiondate IS NOT NULL AND T1.id IN $(IDLIST)
>>
>> For my scenario
>> IDCOLUMN = T1.id
>> VERSIONCOLUMN = The records are not versionable.
>> URLCOLUMN = ? How do I refer to a record in a database table with a URL?
>> DATACOLUMN = Many data columns are present (T1.id, T1.uri, T1.inserted,
>> T1.url, T1.summary, T1.text, T1.abstract,
>> T1.deletiondate, T2.category, T3.source)
>> STARTTIME = Don't Maintain modification date.
>> ENDTIME =  Don't Maintain modification date.
>> IDLIST = This should be the list of all IDs from the seed query.
>>
>> What should the seed and data queries be for my overall query below?
>> SELECT T1.id, T1.uri, T1.inserted, T1.url, T1.summary, T1.text,
>> T1.abstract, T1.deletiondate, T2.category, T3.source FROM table1 T1
>> INNER JOIN table2 T2 ON T1.id = T2.docid
>> INNER JOIN table3 T3 ON T2.docid = T3.docid
>> WHERE T1.deletiondate IS NOT NULL
>>
>> Thanks in advance for any help on this.
>>
>> Regards
>> Anupam
>>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>
