manifoldcf-user mailing list archives

From <karl.wri...@nokia.com>
Subject RE: Other document data.
Date Tue, 15 Jun 2010 10:32:35 GMT
That's hard to determine without looking at the solr logs.  I am not familiar with the log
options available, but unless I'm mistaken the default configuration should be dumping every
request to standard out.

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 5:23 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

Yes I get it. Thanks for the clarification.

I was doing a similar thing before and it used to run, but this time it didn't, so I got confused.

Is there any way to check whether the metadata is actually sent to Solr? I am experiencing
a problem there and I can't figure out where it is going wrong.


Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Tuesday, June 15, 2010 2:13 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

LCF is an incremental crawler.  The version query is used to determine whether data needs
to be refetched and reindexed.  If it returns the same thing each time the document is examined,
the data query will not be run the second time.  I therefore suggest one of the following:

(1) Supply no version query at all.  That signals to the connector that there is no version
information and the data must be reindexed on every job run.
(2) Supply a version query that properly reflects changes to the data.  For instance, if there's
a timestamp in each record, you can use that by itself ONLY if any metadata changes also are
associated with a change in that timestamp.  If not, you will need to glom the metadata into
the version string as well as the timestamp.  Is this understood?
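
The two options above can be sketched in a few lines of Python. This is only an illustration
of the idea, not LCF's actual code: the function names build_version_string and needs_reindex,
and the "|" separator, are hypothetical choices made for this example.

```python
from typing import Dict, Optional


def build_version_string(timestamp: str, metadata: Dict[str, str]) -> str:
    """Combine the timestamp with every metadata value (hypothetical helper).

    Any change to the timestamp or to a metadata value yields a different string.
    """
    parts = [timestamp]
    for key in sorted(metadata):  # sorted so the result does not depend on dict order
        parts.append(f"{key}={metadata[key]}")
    return "|".join(parts)


def needs_reindex(stored: Optional[str], current: Optional[str]) -> bool:
    """Decide whether the data query has to run again (hypothetical helper)."""
    if current is None:
        # Option (1): no version information, so reindex on every job run.
        return True
    # Option (2): reindex only when the version string has changed.
    return stored != current


old = build_version_string("2010-06-15 10:00", {"author": "rohan"})
new = build_version_string("2010-06-15 10:00", {"author": "karl"})
print(needs_reindex(old, old))   # unchanged version: data query is skipped
print(needs_reindex(old, new))   # metadata changed: data query runs again
print(needs_reindex(old, None))  # no version query: always reindexed
```

With a timestamp-only version string, the second case would go unnoticed, which is why the
metadata has to be part of the version string when it can change independently.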

If you want to FORCE a reindex, there is a link in the crawler-ui for the output connection
which allows you to force reindexing of all data associated with that connection.

If this still doesn't seem to describe what you are seeing, please clarify further.

Thanks,
Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Tuesday, June 15, 2010 12:51 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

When we specify the metadata content, it runs fine the first time, but the second time it
doesn't run the data query at all. What could the problem be?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Sunday, June 13, 2010 7:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

No.  The data query (the same one that returns the blob info) can now include additional
columns.  These columns will be sent to Solr as metadata fields.
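
As a rough illustration of that behaviour, the sketch below runs one query against an
in-memory SQLite table and treats every column beyond the core ones as metadata. The table,
the column names (id, url, content, checkin_date, author) and the helper fetch_documents are
all assumptions made for this example; they are not the JDBC connector's real conventions.

```python
import sqlite3

# Hypothetical schema: a BLOB content column plus extra columns holding document data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents ("
    "id INTEGER PRIMARY KEY, url TEXT, content BLOB, checkin_date TEXT, author TEXT)"
)
conn.execute(
    "INSERT INTO documents VALUES (1, 'http://example.com/doc/1', ?, '2010-06-10', 'rohan')",
    (b"binary document body",),
)

# Columns the sketch treats as the document itself; everything else becomes metadata.
CORE_COLUMNS = {"id", "url", "content"}


def fetch_documents(connection):
    """Run the data query and split each row into (url, content, metadata)."""
    cursor = connection.execute(
        "SELECT id, url, content, checkin_date, author FROM documents"
    )
    names = [description[0] for description in cursor.description]
    results = []
    for row in cursor:
        record = dict(zip(names, row))
        metadata = {k: v for k, v in record.items() if k not in CORE_COLUMNS}
        results.append((record["url"], record["content"], metadata))
    return results


docs = fetch_documents(conn)
for url, content, metadata in docs:
    print(url, len(content), metadata)
```

The point is only that no separate metadata query is involved: the extra columns ride along
with the row that carries the blob.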

Karl

________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Friday, June 11, 2010 2:28 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

Hi,

I see that the issue is resolved.

Is there now a new query in which we can specify the metadata fields?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com


-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 4:12 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

It is not possible to properly glom other fields onto a BLOB unless you know that the blob's
contents are always encoded text.  So I suggest you create a jira enhancement request in the
Lucene Connector Framework project to describe this enhancement (adding metadata support to
JDBC connector).

The url is: http://issues.apache.org/jira

You may need to create an account if you don't already have one.  Let me know if you have
any difficulties.

Thanks,
Karl


-----Original Message-----
From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 6:39 AM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.


Hi,

Using solution 1 was not a bad idea, but the problem is that the content is stored as a BLOB
in the database, and gluing other fields onto a BLOB is not possible (is it?).

Regarding 2: Yes, I guess I can make that modification, and anyway it all depends on how we
show it to the user.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com

-----Original Message-----
From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Thursday, June 10, 2010 3:19 PM
To: connectors-user@incubator.apache.org
Subject: RE: Other document data.

(1) The JDBC connector is currently relatively primitive and does not have any support for
"document metadata" at this time.  You can, of course, glom together multiple fields into
the content field with it, but that's pretty crude.
(2) The LCF convention for how to identify documents uniquely in the target index is to use
the URL of the document.  All documents indexed with LCF have such a URL and it is likely
to be both useful and unique.  This url is how LCF requests deletion of the document from
the index, if necessary, and also overwrites the document.  So it maps pretty precisely to
literal.id for the basic solr setup.  Now, it may be that this is too tied to the example,
and that the solr connector should have a configuration setting to allow the name of the id
field used to be changed - that sounds like a reasonable modification that would not be too
difficult to do.  Is this something you are looking for?
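
To make the mapping in (2) concrete, here is a small sketch of request parameters in the
literal.<fieldname> form that Solr's extracting request handler accepts. It is not the Solr
connector's code: the function build_extract_params and its id_field argument are
hypothetical, with id_field standing in for the configuration setting suggested above.

```python
from urllib.parse import parse_qs, urlencode


def build_extract_params(document_url, metadata, id_field="id"):
    """Build a query string with the document URL as the unique key (hypothetical helper).

    id_field is an assumed setting for the name of the id field in the index.
    """
    params = {f"literal.{id_field}": document_url}
    for name, value in metadata.items():
        # Each metadata value is passed as its own literal.<fieldname> parameter.
        params[f"literal.{name}"] = value
    return urlencode(params)


query = build_extract_params(
    "http://example.com/doc/1", {"checkin_date": "2010-06-10"}
)
print(parse_qs(query))

# With a differently named id field in the index:
print(parse_qs(build_extract_params("http://example.com/doc/1", {}, id_field="doc_url")))
```

Because the same URL is sent as the key every time, a later update or delete request can
address exactly the document that was indexed earlier.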

Karl
________________________________________
From: ext Rohan.GPatil@cognizant.com [Rohan.GPatil@cognizant.com]
Sent: Thursday, June 10, 2010 4:52 AM
To: connectors-user@incubator.apache.org
Subject: Other document data.

I am using a JDBC connection to search for documents in the database.

The issue is that some document data (check-in date, etc.) is present in other columns. How
can I send this data to Solr so that it is indexed?

Why is the URL of the file taken as the ID in Solr?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com<mailto:Rohan.GPatil@cognizant.com>

This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.



