manifoldcf-dev mailing list archives

From chalitha udara Perera <chalithaud...@gmail.com>
Subject Re: Repository document stream empty after Tika Transformation
Date Fri, 17 Jul 2015 12:46:41 GMT
Hi Karl,

I'm using the 2.1 release, and I am using only the Solr output connector. If
you look at the input stream size (document.getBinaryLength()) after the
Tika connector, it is zero.

Thanks,
Chalitha

On Fri, Jul 17, 2015 at 6:08 PM, Karl Wright <daddywri@gmail.com> wrote:

> The document stream contains what Tika extracts.  If it can't extract
> anything then you will have an empty stream.
>
> It is also possible that if the stream is split, you are tripping over a
> bug that was fixed some time ago.  What MCF version is this, and do you
> have more than one output?
>
> Karl
>
> ------------------------------
> From: chalitha udara Perera
> Sent: 7/17/2015 7:25 AM
> To: dev@manifoldcf.apache.org
> Subject: Repository document stream empty after Tika Transformation
>
> Hi All,
>
> I'm writing a transformation connector to extract low-level features from
> images. At first I used the connector without the Tika extractor and it
> worked fine. But when I use it together with the Tika connector (placed
> after Tika), it fails to extract features. After debugging, I found that
> the stream is empty after the Tika transformation.
> Inside the Tika connector, a new in-memory or file-backed output stream is
> created, but the original input stream is never copied to it. The connector
> should reset the binary stream after using it to get metadata, so that the
> original input stream stays available from connector to connector.
>
> I have attached a simple stream copy-and-reset solution that worked for
> me.
>
> Thanks,
> Chalitha
>
> --
> J.M Chalitha Udara Perera
>
> *Department of Computer Science and Engineering,*
> *University of Moratuwa,*
> *Sri Lanka*
>



-- 
J.M Chalitha Udara Perera

*Department of Computer Science and Engineering,*
*University of Moratuwa,*
*Sri Lanka*
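
The attachment mentioned in the quoted message is not part of this archive
page. As a rough illustration of the "stream copy and reset" idea only, here
is a minimal sketch that uses nothing but java.io. It is not the attached
patch and not ManifoldCF code: the class StreamCopySketch and its methods
buffer() and drain() are hypothetical names, and the sketch assumes the
document is small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of ManifoldCF or Tika.
class StreamCopySketch {

  // Reads the whole stream into memory so its content can be replayed later.
  static byte[] buffer(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      out.write(chunk, 0, n);
    }
    return out.toByteArray();
  }

  // Stand-in for any consumer that reads a stream to the end
  // (for example a metadata extractor). Returns the number of bytes seen.
  static long drain(InputStream in) throws IOException {
    long total = 0;
    byte[] chunk = new byte[8192];
    int n;
    while ((n = in.read(chunk)) != -1) {
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    // Assumed sample content standing in for the original binary document.
    InputStream original = new ByteArrayInputStream(
        "fake image bytes".getBytes(StandardCharsets.UTF_8));

    // Copy once, then hand each consumer its own fresh stream over the copy.
    byte[] copy = buffer(original);
    long seenByExtractor = drain(new ByteArrayInputStream(copy));
    long seenDownstream = drain(new ByteArrayInputStream(copy));

    // Both consumers see the same number of bytes.
    System.out.println(seenByExtractor + " " + seenDownstream);
  }
}
```

Giving each consumer a new ByteArrayInputStream over the same byte array
avoids relying on mark()/reset() support in the source stream. For large
documents a temporary file would probably be a better backing store than a
byte array.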
