manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Generic DB Query Formation for ManifoldCF Framework
Date Mon, 09 Apr 2012 13:00:45 GMT
Hi Anupam,

Try doing what the error message suggests.
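
As a minimal sketch of that suggestion, the seed query quoted further down this thread could be written with the variable in quotes. This assumes the database in use accepts double-quoted column aliases, which the thread does not state; identifier quoting varies between databases.

```sql
-- Seed query from this thread, with the $(IDCOLUMN) variable quoted
-- as the error message suggests.
-- Assumption: double quotes are valid around a column alias in this database.
SELECT id AS "$(IDCOLUMN)" FROM table1 WHERE deletiondate IS NULL
```

If a similar error is reported for the data query, the same quoting can presumably be applied there to $(IDCOLUMN), $(URLCOLUMN), and $(DATACOLUMN); $(IDLIST) is not a column alias and stays unquoted.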

There are no current plans that I am aware of to integrate Solr DIH
with ManifoldCF.  Open source software development is usually driven
by need and all of us are volunteers.  If there is a major feature
that you want to see, then I suggest you consider contributing that
feature.  We are generally very open to contributions of all kinds.

Thanks,
Karl

On Mon, Apr 9, 2012 at 8:56 AM, Anupam Bhattacharya <anupamb82@gmail.com> wrote:
> Karl,
>
> I get this error when running my job, caused by the seed query.
>
> Error: Bad seed query; doesn't return $(IDCOLUMN) column. Try using quotes
> around $(IDCOLUMN) variable, e.g. "$(IDCOLUMN)".
>
> And my new data query is:
>
> SELECT T1.id AS $(IDCOLUMN),T1.url AS
> $(URLCOLUMN),CONCAT(T1.uri,T1.inserted, T1.summary , T1.text , T1.abstract ,
> T1.deletiondate, T3.source) AS $(DATACOLUMN) FROM table1 T1
> INNER JOIN  table2 T2 ON T1.id = T2.docid
> INNER JOIN  table3 T3 ON T2.docid = T3.docid
> WHERE T1.deletiondate IS NOT NULL AND T1.id IN $(IDLIST)
>
> Is there a way to integrate Solr DIH with ManifoldCF?
>
> Regards
> Anupam
>
> On Mon, Apr 9, 2012 at 6:01 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>> Ok.
>>
>> I don't have details of your schema or your intent, so my advice is
>> going to be limited.  However, your seeding query looks reasonable
>> given what you provided:
>>
>> SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS NULL
>>
>> If there is no versioning ability, and every document is to be fetched
>> on every crawl, you should leave the versioning query blank.  This is,
>> of course, not very efficient.
>>
>> For the fetch query, remember that you need to construct the required
>> columns from your sources.  Those will be $(IDCOLUMN), $(URLCOLUMN),
>> and $(DATACOLUMN).  For the data column, you will likely need to
>> concatenate several of your database result columns together using
>> whatever concat-like operation your database supports.  Also, given
>> that you seem to be attempting a three-way join for this query, it's
>> going to look something like this:
>>
>> SELECT t1.id AS $(IDCOLUMN), t1.uri AS $(URLCOLUMN), concat(..., ...,
>> ..., ...) AS $(DATACOLUMN) FROM table1 t1, ..., ... WHERE t1.id IN
>> $(IDLIST)
>>
>> You MUST include the IN $(IDLIST) clause, and return all the required
>> columns, or the query will not work.  Also, there is no support in the
>> JDBC connector for metadata at this time, so if you are attempting to
>> set up separate metadata fields you will need an enhancement to the
>> connector.
>>
>> Thanks,
>> Karl
>>
>>
>>
>> On Mon, Apr 9, 2012 at 7:53 AM, Anupam Bhattacharya <anupamb82@gmail.com>
>> wrote:
>> > I have read the documentation at
>> >
>> > http://incubator.apache.org/connectors/en_US/end-user-documentation.html#jdbcrepository
>> >
>> > But I feel that there is no specific example given.
>> >
>> > On Mon, Apr 9, 2012 at 4:16 PM, Karl Wright <daddywri@gmail.com> wrote:
>> >>
>> >> I'd start by reading the jdbc connector portion of the end user manual.
>> >> There is quite a bit of help in there for writing connector queries.
>> >>
>> >> Let me know if that answers your questions.
>> >>
>> >> Thanks,
>> >> Karl
>> >>
>> >> Sent from my Windows Phone
>> >> ________________________________
>> >> From: Anupam Bhattacharya
>> >> Sent: 4/9/2012 5:37 AM
>> >> To: connectors-user@incubator.apache.org; Karl Wright
>> >> Subject: Generic DB Query Formation for ManifoldCF Framework
>> >>
>> >> I have a relational database query that indexes the table results
>> >> properly using DIH. However, when I try to form the seed query in
>> >> ManifoldCF, I can see in the Simple History that it is not fetching
>> >> results, and the job terminates within a few minutes.
>> >>
>> >> SELECT id AS lcf__id FROM table1 WHERE deletiondate IS NULL;
>> >>  {arguments =
>> >> ()}
>> >> SELECT T1.id AS lcf__id,T1.uri AS lcf__data, T1.inserted AS lcf__data,
>> >> T1.summary AS lcf__data, T1.text AS lcf__data, T1.abstract AS
>> >> lcf__data,
>> >> T1.deletiondate AS lcf__data, T2.category AS lcf__data, T3.source FROM
>> >> table1 T1 INNER JOIN table2 T2 ON T1.id = T2.docid INNER JOIN table3 T3
>> >> ON
>> >> T2.docid = T3.docid WHERE T1.deletiondate IS NOT NULL AND T1.id IN (?);
>> >> {arguments = ('id')}
>> >>
>> >> What should the corresponding seeding query be?
>> >> Sample : SELECT idfield AS $(IDCOLUMN) FROM documenttable WHERE
>> >> modifydatefield > $(STARTTIME) AND modifydatefield <= $(ENDTIME)
>> >> Current : SELECT id AS $(IDCOLUMN) FROM table1 WHERE deletiondate IS
>> >> NULL
>> >>
>> >> Here id is a unique column value.
>> >>
>> >> Data Query:
>> >> Sample : SELECT idfield AS $(IDCOLUMN), urlfield AS $(URLCOLUMN),
>> >> datafield AS $(DATACOLUMN) FROM documenttable
>> >> WHERE idfield IN $(IDLIST)
>> >> Current:
>> >> SELECT T1.id AS $(IDCOLUMN),T1.uri AS $(DATACOLUMN), T1.inserted AS
>> >> $(DATACOLUMN), T1.url AS $(URLCOLUMN), T1.summary AS $(DATACOLUMN),
>> >> T1.text
>> >> AS $(DATACOLUMN), T1.abstract AS $(DATACOLUMN), T1.deletiondate AS
>> >> $(DATACOLUMN),
>> >> T2.category, T3.source FROM table2 T1
>> >> INNER JOIN  table2 T2 ON T1.id = T2.docid
>> >> INNER JOIN  table3 T3 ON T2.docid = T3.docid
>> >> WHERE T1.deletiondate IS NOT NULL AND T1.id IN $(IDLIST)
>> >>
>> >> For my scenario
>> >> IDCOLUMN = T1.id
>> >> VERSIONCOLUMN = The records are not versionable.
>> >> URLCOLUMN = ? How do I refer to a record in a database table with a URL?
>> >> DATACOLUMN = Many data columns are present (T1.id, T1.uri, T1.inserted,
>> >> T1.url, T1.summary, T1.text, T1.abstract,
>> >> T1.deletiondate, T2.category, T3.source)
>> >> STARTTIME = No modification date is maintained.
>> >> ENDTIME = No modification date is maintained.
>> >> IDLIST = This should be the list of all IDs from the seed query.
>> >>
>> >> What should the seed and data queries be for my overall query, shown
>> >> below?
>> >> SELECT T1.id, T1.uri, T1.inserted, T1.url, T1.summary, T1.text,
>> >> T1.abstract, T1.deletiondate, T2.category, T3.source FROM table1 T1
>> >> INNER JOIN table2 T2 ON T1.id = T2.docid
>> >> INNER JOIN table3 T3 ON T2.docid = T3.docid
>> >> WHERE T1.deletiondate IS NOT NULL
>> >>
>> >> Thanks in advance for any help on this.
>> >>
>> >> Regards
>> >> Anupam
>> >>
