nutch-user mailing list archives

From Julien Nioche <>
Subject Re: Images, videos and audio
Date Tue, 10 May 2011 12:35:06 GMT
Hi Felipe,

Have you checked that the content is properly fetched?  What do you mean by
Tika is not working?

The metadata returned by Tika will be stored as parse metadata. You will
need to configure which metadata to copy to the crawldb using the param
''. You can then use the urlmeta plugin to index
these as fields.



On 9 May 2011 22:04, Felipe Barriga Richards <> wrote:

> Hi everyone,
> I'm trying to index images (JPEG, EXIF data), videos and audio (MP3,
> Ogg, ID3 data), but Tika is not working.
> How can I index those files and create the respective fields?
> Also, I haven't found how to store the MIME type of the indexed files.
> Basically I need to index sites with multimedia.
> Thanks,

Open Source Solutions for Text Engineering
