manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Custom Elasticsearch output connector
Date Tue, 17 Mar 2015 20:16:01 GMT
Hi Maggie,

First, if you want to post to this list, you should sign up for it.
Otherwise I have to moderate your posts.

Second, ManifoldCF has a pretty stringent idea of what a document is for
indexing purposes:

(1) It must have its own URL
(2) It must be indexable as an atomic unit

There is an abstraction on the repository connector side of the framework
which allows for document components.  It is meant for precisely this kind
of case, where there is a primary document but you want to index individual
component documents that are taken from the primary document.
It's done this way (rather than on the output connector side) because
otherwise there is no way to come up with the proper individual URL for the
document component.

It would seem to me, therefore, that you may want to consider changing
whichever repository connector you are using so that it can handle
compound CSV documents.
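
As a rough illustration of the component idea, here is a minimal,
standalone sketch of splitting one CSV into per-row components on the
repository side. It does not use any ManifoldCF API: the class and method
names, the "row-N" component identifiers, and the "#row-N" URL scheme are
all assumptions made up for this example, and the comma splitting is naive
(no quoted fields).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not ManifoldCF code: turns each CSV row into a
// component with its own identifier, its own URL, and header-named fields.
public class CsvComponentSketch {

    /** One CSV row treated as a separately indexable component document. */
    public static final class Component {
        public final String componentId;
        public final String url;
        public final Map<String, String> fields;

        Component(String componentId, String url, Map<String, String> fields) {
            this.componentId = componentId;
            this.url = url;
            this.fields = fields;
        }
    }

    /**
     * Splits CSV text into one component per data row.
     * The first line is taken as the header row. Assumption: fields never
     * contain commas or line breaks, so plain splitting is enough here.
     */
    public static List<Component> split(String primaryUrl, String csv) {
        List<Component> components = new ArrayList<>();
        String[] lines = csv.split("\\r?\\n");
        if (lines.length == 0 || lines[0].isEmpty()) {
            return components; // no header row, nothing to index
        }
        String[] headers = lines[0].split(",", -1);
        for (int row = 1; row < lines.length; row++) {
            if (lines[row].isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = lines[row].split(",", -1);
            Map<String, String> fields = new LinkedHashMap<>();
            for (int col = 0; col < headers.length; col++) {
                // Missing trailing values become empty strings.
                String value = col < values.length ? values[col].trim() : "";
                fields.put(headers[col].trim(), value);
            }
            // Made-up scheme: derive a stable per-row URL from the primary URL.
            String componentId = "row-" + row;
            components.add(new Component(componentId, primaryUrl + "#" + componentId, fields));
        }
        return components;
    }

    public static void main(String[] args) {
        String csv = "id,name\n1,alpha\n2,beta\n";
        for (Component c : split("file:///data/example.csv", csv)) {
            System.out.println(c.url + " " + c.fields);
        }
    }
}
```

The point of the sketch is only that the per-row URL is derived where the
primary document's URL is known, which is the repository connector side.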

Thanks,
Karl


On Tue, Mar 17, 2015 at 2:58 PM, Lagos, Maggie [USA] <Lagos_Maggie@bah.com>
wrote:

>
> Hello,
>
> I have a use-case for which the current Elasticsearch output connector
> would not work.
>
> The idea is to use MCF to import CSV files to Elasticsearch. However, each
> row should be indexed as a separate document into Elasticsearch.
>
> It seems this mostly means overriding ElasticSearchIndex.execute() to
> execute one HttpPut per row and changing IndexRequestEntity.writeTo()
> to write field names and values based on the CSV headers.
>
> I look forward to your response!
> Maggie
