manifoldcf-dev mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Solr Extracting request handler
Date Mon, 16 Jun 2014 14:57:55 GMT
As the brilliant engineer who preceded me wrote, it was a design
choice.
In my opinion this is a serious limitation: I would prefer to delegate the
extraction task to an intermediate processor instead of relying on Solr.
Furthermore, I don't like having to send all the metadata in the request
header (if too much metadata is extracted, the request can also exceed the
header size the server accepts).
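
To make the size concern concrete, here is a minimal sketch (Python, standard library only) of how a request to the update/extract handler might be assembled, with each metadata field passed as a `literal.<field>` query parameter. The base URL, the field names, and the 8 KiB limit are assumptions for illustration; the real limit depends on how the servlet container in front of Solr is configured, and this is not ManifoldCF's actual connector code.

```python
from urllib.parse import urlencode

# Assumption: 8 KiB is used here as a stand-in for the server's maximum
# request line/header size; the real value is container configuration.
ASSUMED_HEADER_LIMIT = 8192


def build_extract_url(base_url, doc_id, metadata):
    """Build an update/extract URL that carries metadata as literal.* parameters.

    The binary document itself would travel in the request body, not here.
    """
    params = [("literal.id", doc_id)]
    for field, value in metadata.items():
        params.append(("literal." + field, value))
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


def exceeds_limit(url, limit=ASSUMED_HEADER_LIMIT):
    """Return True if the URL alone is already larger than the assumed limit."""
    return len(url.encode("utf-8")) > limit


if __name__ == "__main__":
    # Hypothetical host, core name, and metadata fields.
    base = "http://localhost:8983/solr/collection1"
    few = build_extract_url(base, "doc1", {"author": "a b"})
    many = build_extract_url(
        base, "doc2", {"field%d" % i: "x" * 100 for i in range(100)}
    )
    print(len(few), exceeds_limit(few))
    print(len(many), exceeds_limit(many))
```

With only a handful of fields the URL stays small, but a document with many or long extracted metadata values can push the request past the limit before the body is even read, which is the problem described above.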

Cheers




2014-06-16 15:51 GMT+01:00 Antonio David Perez Morales <aperez@zaizi.com>:

> Hi Matteo
>
> Manifold already handles the extraction, but the only way to send binary
> content and document metadata to Solr is using the update/extract handler,
> where the metadata is sent as query parameters and the binary content is
> sent in the body of the requests, allowing Solr to use Tika to obtain the
> raw content to be stored in Solr.
>
> Regards
>
>
> On Mon, Jun 16, 2014 at 4:35 PM, Matteo Grolla <m.grolla@sourcesense.com>
> wrote:
>
> > Hi. During my first indexing run I noticed that Manifold uses Solr's
> > extracting request handler to extract the content of an XML file.
> > For performance reasons it would be better if Manifold handled the
> > extraction, leaving Solr to act as the search engine.
> > Is this because of the connector design, the framework design, or is it
> > just still to be done?
> >
> > --
> > Matteo Grolla
> > Sourcesense - making sense of Open Source
> > http://www.sourcesense.com
> >
> >
>



-- 
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
