manifoldcf-user mailing list archives

From Anupam Bhattacharya <anupam...@gmail.com>
Subject Re: Generic DB Query Formation for ManifoldCF Framework
Date Mon, 09 Apr 2012 12:56:45 GMT
Karl,

I get this error when I run my job; it seems to be caused by the seed query.

Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes
around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".
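
To make the hint in that error concrete, here is a minimal Python sketch of
how a $(NAME) template might expand, with and without quotes around
$(IDCOLUMN). It is only an illustration, not the connector's code: the expand
helper is hypothetical, lcf__id and lcf__data are taken from the logged
queries quoted further down, and lcf__url is an assumed name. The right quote
character also depends on the database (MySQL, for example, uses backticks for
identifiers unless ANSI_QUOTES mode is enabled).

```python
import re

# lcf__id and lcf__data appear in the logged queries in this thread;
# lcf__url is an assumed name used only for illustration.
VARIABLES = {
    "IDCOLUMN": "lcf__id",
    "URLCOLUMN": "lcf__url",
    "DATACOLUMN": "lcf__data",
}


def expand(template, variables):
    """Replace each $(NAME) with its value; unknown names are left untouched."""
    def repl(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\$\(([A-Z]+)\)", repl, template)


# Seeding query as written in the thread.
unquoted = expand(
    "SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL",
    VARIABLES,
)

# Same query with the variable quoted, as the error message suggests.
quoted = expand(
    'SELECT id AS "$(IDCOLUMN)" FROM table1 WHERE deletiondate IS NULL',
    VARIABLES,
)

print(unquoted)
print(quoted)
```

Quoting the alias asks the database to return the column label exactly as
written instead of applying its own identifier case folding.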

And my new data query is:

SELECT T1.id AS $(IDCOLUMN), T1.url AS $(URLCOLUMN),
CONCAT(T1.uri, T1.inserted, T1.summary, T1.text, T1.abstract,
T1.deletiondate, T3.source) AS $(DATACOLUMN) FROM table1 T1
INNER JOIN  table2 T2 ON T1.id = T2.docid
INNER JOIN  table3 T3 ON T2.docid = T3.docid
WHERE T1.deletiondate IS NOT NULL AND T1.id IN $(IDLIST)
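
One thing that may be worth checking, independent of the error above: the
seeding query selects rows where deletiondate IS NULL, while this data query
keeps only rows where deletiondate IS NOT NULL. The sketch below reproduces
that combination against an in-memory SQLite database. The table contents and
column subset are invented for illustration, lcf__url is an assumed name,
SQLite's || operator stands in for CONCAT, and $(IDLIST) is assumed to expand
to a parenthesised list of bind parameters, as the logged "IN (?)" in the
quoted message suggests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema and rows, only to exercise the two queries.
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, url TEXT,
                         summary TEXT, deletiondate TEXT);
    CREATE TABLE table2 (docid INTEGER, category TEXT);
    CREATE TABLE table3 (docid INTEGER, source TEXT);
    INSERT INTO table1 VALUES (1, 'http://example.com/1', 'live doc', NULL);
    INSERT INTO table1 VALUES (2, 'http://example.com/2', 'deleted doc', '2012-04-01');
    INSERT INTO table2 VALUES (1, 'a');
    INSERT INTO table2 VALUES (2, 'b');
    INSERT INTO table3 VALUES (1, 'src1');
    INSERT INTO table3 VALUES (2, 'src2');
""")

# Seeding query with $(IDCOLUMN) already expanded to lcf__id.
seed_ids = [row[0] for row in conn.execute(
    "SELECT id AS lcf__id FROM table1 WHERE deletiondate IS NULL")]

# Assumed expansion of $(IDLIST): one bind parameter per seeded id.
id_list = "(" + ",".join("?" for _ in seed_ids) + ")"


def fetch(deletion_condition):
    """Run the data query with the given condition on T1.deletiondate."""
    sql = (
        "SELECT T1.id AS lcf__id, T1.url AS lcf__url, "
        "T1.summary || ' ' || T3.source AS lcf__data "
        "FROM table1 T1 "
        "INNER JOIN table2 T2 ON T1.id = T2.docid "
        "INNER JOIN table3 T3 ON T2.docid = T3.docid "
        "WHERE T1.deletiondate " + deletion_condition +
        " AND T1.id IN " + id_list
    )
    return conn.execute(sql, seed_ids).fetchall()


as_posted = fetch("IS NOT NULL")   # condition used in the data query above
matching_seed = fetch("IS NULL")   # condition used in the seeding query

print(seed_ids, as_posted, matching_seed)
```

In this toy setup the data query as posted returns no rows for the seeded
ids, while the variant that matches the seeding condition returns the live
document.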

Also, is there a way to integrate Solr DIH with ManifoldCF?

Regards
Anupam

On Mon, Apr 9, 2012 at 6:01 PM, Karl Wright <daddywri@gmail.com> wrote:

> Ok.
>
> I don't have details of your schema or your intent, so my advice is
> going to be limited.  However, your seeding query looks reasonable
> given what you provided:
>
> SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL
>
> If there is no versioning ability, and every document is to be fetched
> on every crawl, you should leave the versioning query blank.  This is,
> of course, not very efficient.
>
> For the fetch query, remember that you need to construct the required
> columns from your sources.  Those will be $(IDCOLUMN), $(URLCOLUMN),
> and $(DATACOLUMN).  For the data column, you will likely need to
> concatenate several of your database result columns together using
> whatever concat-like operation your database supports.  Also, given
> that you seem to be attempting a three-way join for this query, it's
> going to look something like this:
>
> SELECT t1.id AS $(IDCOLUMN), t1.uri AS $(URLCOLUMN), concat(..., ...,
> ..., ...) AS $(DATACOLUMN) FROM table1 t1, ..., ... WHERE t1.id IN
> $(IDLIST)
>
> You MUST include the IN $(IDLIST) clause, and return all the required
> columns, or the query will not work.  Also, there is no support in the
> JDBC connector for metadata at this time, so if you are attempting to
> set up separate metadata fields you will need an enhancement to the
> connector.
>
> Thanks,
> Karl
>
>
>
> On Mon, Apr 9, 2012 at 7:53 AM, Anupam Bhattacharya <anupamb82@gmail.com>
> wrote:
> > I have read the documentation at
> >
> > http://incubator.apache.org/connectors/en_US/end-user-documentation.html#jdbcrepository
> >
> > But I feel that there is no specific example given.
> >
> > On Mon, Apr 9, 2012 at 4:16 PM, Karl Wright <daddywri@gmail.com> wrote:
> >>
> >> I'd start by reading the jdbc connector portion of the end user manual.
> >> There is quite a bit of help in there for writing connector queries.
> >>
> >> Let me know if that answers your questions.
> >>
> >> Thanks,
> >> Karl
> >>
> >> Sent from my Windows Phone
> >> ________________________________
> >> From: Anupam Bhattacharya
> >> Sent: 4/9/2012 5:37 AM
> >> To: connectors-user@incubator.apache.org; Karl Wright
> >> Subject: Generic DB Query Formation for ManifoldCF Framework
> >>
> >> I have a relational database query that indexes the table results
> >> properly using DIH. However, when I try to form the seed query in
> >> ManifoldCF, I can see in the Simple History that it is not fetching
> >> results, and the job terminates within a few minutes.
> >>
> >> SELECT id AS lcf__id FROM table1 WHERE deletiondate IS NULL;
> >> {arguments = ()}
> >> SELECT T1.id AS lcf__id,T1.uri AS lcf__data, T1.inserted AS lcf__data,
> >> T1.summary AS lcf__data, T1.text AS lcf__data, T1.abstract AS lcf__data,
> >> T1.deletiondate AS lcf__data, T2.category AS lcf__data, T3.source FROM
> >> table1 T1 INNER JOIN table2 T2 ON T1.id = T2.docid INNER JOIN table3 T3
> >> ON T2.docid = T3.docid WHERE T1.deletiondate IS NOT NULL AND T1.id IN (?);
> >> {arguments = ('id')}
> >>
> >> What should the corresponding seeding query be?
> >> Sample : SELECT idfield AS $(IDCOLUMN) FROM documenttable WHERE
> >> modifydatefield > $(STARTTIME) AND modifydatefield <= $(ENDTIME)
> >> Current : SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL
> >>
> >> Here id is a unique column value.
> >>
> >> Data Query:
> >> Sample : SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN),
> >> datafield AS $(DATACOLUMN) FROM documenttable
> >> WHERE idfield IN $(IDLIST)
> >> Current:
> >> SELECT T1.id AS $(IDCOLUMN), T1.uri AS $(DATACOLUMN), T1.inserted AS
> >> $(DATACOLUMN), T1.url AS $(URLCOLUMN), T1.summary AS $(DATACOLUMN),
> >> T1.text AS $(DATACOLUMN), T1.abstract AS $(DATACOLUMN), T1.deletiondate AS
> >> $(DATACOLUMN),
> >> T2.category, T3.source FROM table1 T1
> >> INNER JOIN  table2 T2 ON T1.id = T2.docid
> >> INNER JOIN  table3 T3 ON T2.docid = T3.docid
> >> WHERE T1.deletiondate IS NOT NULL AND T1.id IN $(IDLIST)
> >>
> >> For my scenario
> >> IDCOLUMN = T1.id
> >> VERSIONCOLUMN = The records are not versionable.
> >> URLCOLUMN = ? How do I refer to a record in a database table with a URL?
> >> DATACOLUMN = Many data columns are present (T1.id, T1.uri, T1.inserted,
> >> T1.url, T1.summary, T1.text, T1.abstract,
> >> T1.deletiondate, T2.category, T3.source)
> >> STARTTIME = We don't maintain a modification date.
> >> ENDTIME = We don't maintain a modification date.
> >> IDLIST = This should be the list of all IDs from the seed query.
> >>
> >> What should the seed and data queries be for my overall query below?
> >> SELECT T1.id, T1.uri, T1.inserted, T1.url, T1.summary, T1.text,
> >> T1.abstract, T1.deletiondate, T2.category, T3.source FROM table1 T1
> >> INNER JOIN table2 T2 ON T1.id = T2.docid
> >> INNER JOIN table3 T3 ON T2.docid = T3.docid
> >> WHERE T1.deletiondate IS NOT NULL
> >>
> >> Thanks in advance for any help on this.
> >>
> >> Regards
> >> Anupam
> >>
