manifoldcf-dev mailing list archives

From chalitha udara Perera <chalithaud...@gmail.com>
Subject Re: Repository document stream empty after Tika Transformation
Date Fri, 17 Jul 2015 16:09:35 GMT
Hi Karl,

I have attached the result of running a File System -> Tika Transform ->
Null Output job.

Thank you,
Chalitha

On Fri, Jul 17, 2015 at 6:41 PM, Karl Wright <daddywri@gmail.com> wrote:

> I don't see this here.
>
> I set up the following:
> - file system repository connection
> - null output connection
> - tika extractor
> - a job using all three
>
> Running the job and looking at the simple history, I see null output
> connection ingestion records that have proper document sizes.
>
> Can you repeat the same setup there, and tell me what you get?
>
> Thanks,
> Karl
>
> Sent from my Windows Phone
> ------------------------------
> From: chalitha udara Perera
> Sent: 7/17/2015 8:46 AM
> To: Karl Wright
> Cc: dev@manifoldcf.apache.org
> Subject: Re: Repository document stream empty after Tika Transformation
>
> Hi Karl,
>
> I'm using the 2.1 release with only the Solr output connector. If you
> check the input stream size (document.getBinaryLength()) after the Tika
> connector, it is zero.
>
> Thanks,
> Chalitha
>
> On Fri, Jul 17, 2015 at 6:08 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> The document stream contains what Tika extracts.  If it can't extract
>> anything, then you will have an empty stream.
>>
>> It is also possible that, if the stream is split, you are tripping over a
>> bug that was fixed some time ago.  What MCF version is this, and do you
>> have more than one output?
>>
>> Karl
>>
>> Sent from my Windows Phone
>> ------------------------------
>> From: chalitha udara Perera
>> Sent: 7/17/2015 7:25 AM
>> To: dev@manifoldcf.apache.org
>> Subject: Repository document stream empty after Tika Transformation
>>
>> Hi All,
>>
>> I'm writing a transformation connector to extract low-level features from
>> images. I first used the connector without the Tika extractor, and it
>> worked fine. But when I place it after the Tika connector, it fails to
>> extract features. After debugging, I found that the stream is empty
>> after the Tika transformation.
>> Inside the Tika connector, a new in-memory or file-backed output stream is
>> created, but the original input stream is never copied to it. The connector
>> should reset the binary stream after using it to get metadata, so that the
>> original input stream remains available from connector to connector.
>>
>> I have attached a simple stream copy-and-reset solution that worked for
>> me.
>>
>> Thanks,
>> Chalitha
>>
>> --
>> J.M Chalitha Udara Perera
>>
>> *Department of Computer Science and Engineering,*
>> *University of Moratuwa,*
>> *Sri Lanka*
>>
>
>
>
> --
> J.M Chalitha Udara Perera
>
> *Department of Computer Science and Engineering,*
> *University of Moratuwa,*
> *Sri Lanka*
>



-- 
J.M Chalitha Udara Perera

*Department of Computer Science and Engineering,*
*University of Moratuwa,*
*Sri Lanka*
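
The stream copy-and-reset idea described in the original message of this
thread can be illustrated with a minimal sketch. It is not the attached patch
and does not use any ManifoldCF or Tika API: ReplayableStream, buffer() and
open() are hypothetical names, and only java.io classes are used. It assumes
the content fits in memory; a large document would need a temporary file
instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of ManifoldCF or Tika): reads a stream once, replays it many times. */
final class ReplayableStream {
    private final byte[] content;

    private ReplayableStream(byte[] content) {
        this.content = content;
    }

    /** Drains the source stream into memory so it can be re-read later. */
    static ReplayableStream buffer(InputStream source) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int read;
        while ((read = source.read(chunk)) != -1) {
            sink.write(chunk, 0, read);
        }
        return new ReplayableStream(sink.toByteArray());
    }

    /** Returns a fresh stream positioned at the first byte of the buffered content. */
    InputStream open() {
        return new ByteArrayInputStream(content);
    }

    /** Number of buffered bytes, independent of how many times open() was called. */
    long length() {
        return content.length;
    }

    public static void main(String[] args) throws IOException {
        InputStream original =
            new ByteArrayInputStream("fake image bytes".getBytes(StandardCharsets.UTF_8));
        ReplayableStream doc = buffer(original);

        // First consumer, standing in for metadata extraction, reads everything.
        try (InputStream in = doc.open()) {
            while (in.read() != -1) {
                // discard
            }
        }

        // A later consumer still gets the full content from a fresh stream.
        try (InputStream in = doc.open()) {
            System.out.println(doc.length() + " bytes buffered, " + in.available() + " readable");
        }
    }
}
```

Each consumer calls open() for its own stream, so one stage reading to the end
does not leave the next stage with an empty stream.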
