manifoldcf-dev mailing list archives

From Matteo Grolla <>
Subject Re: Solr Extracting request handler
Date Mon, 16 Jun 2014 14:59:40 GMT
Since the Solr extracting request handler takes the binary and extracts the text,
what is the point of not using the Manifold extractor and sending text, rather than binaries, to Solr?
I mean, the end result is the same: Solr indexes text and stores text.
So if Manifold supports text extraction, it seems to me that this is the place where it should be done.

Matteo Grolla
Sourcesense - making sense of Open Source
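A minimal sketch of the alternative proposed above, assuming the text has already been extracted on the ManifoldCF side: the plain text and its metadata go to Solr's ordinary `/update` handler as a JSON document, so Solr never receives the binary. The helper name `build_text_update`, the core URL, and the `id`/`content` field names are hypothetical. This is not ManifoldCF's connector code, and the field names depend on your Solr schema. The function only builds the request; sending it (for example with `urllib.request.urlopen`) is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_text_update(solr_core_url, doc_id, text, metadata, commit=False):
    """Build (but do not send) a Solr JSON update request for a document whose
    text was already extracted on the crawler side.

    Hypothetical helper: the "id" and "content" field names are assumptions
    about the Solr schema, not ManifoldCF's actual field mapping.
    """
    doc = {"id": doc_id, "content": text}
    doc.update(metadata)  # metadata become ordinary fields of the same document
    body = json.dumps([doc]).encode("utf-8")  # Solr accepts a JSON array of docs
    url = solr_core_url.rstrip("/") + "/update"
    if commit:
        url += "?" + urlencode({"commit": "true"})
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


# Example: only the extracted text and the metadata leave the crawler.
req = build_text_update("http://localhost:8983/solr/collection1",
                        "doc1", "extracted plain text", {"author": "Matteo"})
print(req.full_url)  # prints http://localhost:8983/solr/collection1/update
```

The design point is that extraction runs once, on the crawler side, and Solr only indexes and stores the resulting text.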

On 16 Jun 2014, at 16:51, Antonio David Perez Morales wrote:

> Hi Matteo,
> Manifold already handles the extraction, but the only way to send binary
> content together with document metadata to Solr is via the update/extract
> handler: the metadata is sent as query parameters and the binary content is
> sent in the body of the request, which allows Solr to use Tika to obtain the
> raw content to be stored in Solr.
> Regards
> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <>
> wrote:
>> Hi, during my first indexing run I noticed that Manifold uses the Solr extracting
>> request handler to extract the content of an XML file.
>> For performance reasons it would be better if Manifold handled the
>> extraction and let Solr act as the search engine.
>> Is this because of the connector design, the framework design, or is it just
>> not done yet?
>> --
>> Matteo Grolla
>> Sourcesense - making sense of Open Source
> -- 
> ------------------------------
> This message should be regarded as confidential. If you have received this 
> email in error please notify the sender and destroy it immediately. 
> Statements of intent shall only become binding when confirmed in hard copy 
> by an authorised signatory.
> Zaizi Ltd is registered in England and Wales with the registration number 
> 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, 
> London W6 7AN. 
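A minimal sketch of the request shape Antonio describes for Solr's extracting request handler: metadata as Solr's `literal.<field>` query parameters, the raw binary as the POST body with its MIME type as `Content-Type`, and Tika running inside Solr. The helper name `build_extract_request` and the core URL are hypothetical, and this is not necessarily the exact request ManifoldCF's Solr connector sends. The function only builds the request; it does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(solr_core_url, doc_id, raw_bytes, mime_type,
                          metadata, commit=False):
    """Build (but do not send) a request for Solr's extracting request handler.

    Hypothetical helper; `metadata` maps field names to lists of values.
    """
    # Metadata travels as query parameters: literal.<field>=<value>.
    params = [("literal.id", doc_id)]
    for field, values in metadata.items():
        for value in values:  # a multi-valued field repeats the parameter
            params.append(("literal." + field, value))
    if commit:
        params.append(("commit", "true"))
    url = solr_core_url.rstrip("/") + "/update/extract?" + urlencode(params)
    # The binary travels untouched in the body; Solr hands it to Tika.
    return Request(url, data=raw_bytes,
                   headers={"Content-Type": mime_type},
                   method="POST")


req = build_extract_request("http://localhost:8983/solr/collection1",
                            "doc1", b"<note>hello</note>", "text/xml",
                            {"author": ["Matteo"]})
print(req.full_url)
```

Here the full binary crosses the network and text extraction runs inside Solr, which is the performance cost raised in the original question.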
