manifoldcf-user mailing list archives

From Karl Wright
Subject Re: Determining document model passed to search engine
Date Mon, 11 Feb 2013 20:17:27 GMT
What emerges from the web connector is the following:

- metadata, which you define on the web connector's "Metadata" tab,
  and which is named however you want;
- forced ACLs, which get added to the document based on what you
  select on the "Security" tab;
- the document's content type;
- the document's URL;
- the document itself.

What the Elasticsearch output connector does is:
- Map the document's URL to Elasticsearch's document id field (which I
  guess shows up in Elasticsearch as the 'uri' field)
- Output all the metadata directly to Elasticsearch using the names
  provided by the repository connector
- Set the file value to "" (which seems wrong, since that could be
  helpful if available - let me know if you think a fix for this would
  be useful)
- Send the document itself. NONE of the other document fields (content
  type, ACLs, etc.) are communicated to Elasticsearch at all right now.
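
To make the mapping above concrete, here is a rough sketch in Python.
It is only a model of the behaviour described in this message, not
ManifoldCF code: the function name to_es_document, its parameters, and
the returned tuple are all made up for illustration.

```python
def to_es_document(url, metadata, content_type=None, acls=None, body=b""):
    """Hypothetical model of the mapping described above (not ManifoldCF code)."""
    # The document's URL becomes the Elasticsearch document id.
    doc_id = url
    # Metadata is passed through under the names the repository connector chose.
    fields = dict(metadata)
    # The file value is currently always sent as an empty string.
    fields["file"] = ""
    # content_type and acls are accepted here but deliberately dropped,
    # mirroring the fact that they are not sent to Elasticsearch.
    return doc_id, fields, body


doc_id, fields, body = to_es_document(
    "http://example.com/index.html",
    {"author": "someone"},
    content_type="text/html",
    acls=["group1"],
    body=b"<html></html>",
)
print(doc_id)
print(fields)
```

Note that the content type and ACLs passed in never appear in the
result, which is the point of the last item in the list.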


On Mon, Feb 11, 2013 at 2:55 PM, Tony Edgin wrote:
> Thanks for the speedy response!
> I eventually want to index the contents of our local website with Elastic
> Search.
> I would use the Web repository connector with the no authority connector and
> the Elasticsearch output connector.  Would you mind letting me know the
> names and meanings of the metadata that gets passed to Elastic Search?
> Thanks again.
> On Mon, Feb 11, 2013 at 12:45 PM, Karl Wright wrote:
>> So let me get this clear - you are looking to find out what the
>> names/meanings are of the metadata that gets passed to the output
>> connector, for a given repository connection?
>> If this is what you are looking for, I'm afraid that while at one
>> point the end-user documentation described this pretty accurately, it
>> is now significantly out of date.  While it's not terribly hard to
>> compile this information from source code etc., the work definitely
>> needs to be repeated by somebody.
>> If you want to ask this question about a specific connector, I can
>> certainly try to answer it, though.  If you want to contribute either
>> the information or a documentation patch, this would be great too.
>> Karl
>> On Mon, Feb 11, 2013 at 2:38 PM, Tony Edgin wrote:
>> > I'm sure this is documented somewhere, and I apologize in advance for
>> > not
>> > being able to find it.
>> >
>> > How do I determine the model or schema of the document passed to the
>> > search
>> > engine by a given job?
>> >
>> > For instance, I'm running a job that crawls a directory on my local file
>> > system and passes it to Elastic Search.  Interrogating Elastic Search, I
>> > can
>> > determine that the document has three fields, "file", "type" and "uri",
>> > all
>> > strings.  How would I have known that in advance?
>> >
>> > Thanks for any help.
