nutch-user mailing list archives

From Ayyanar Inbamohan <technical_ema...@yahoo.com>
Subject Re: nutch 7.0 not fetching powerpoint, plugin is present
Date Thu, 08 Sep 2005 05:23:30 GMT
Hi All,


As you all suggested:
1. I have added PowerPoint to the MIME types,
2. in nutch-default.xml I have also added the
PowerPoint plugin to the plugins list,
3. in plugin.xml I have also added the content type
application/powerpoint.

But I am still getting the problem:


050908 105407 fetching http://localhost:8080/search_sample/kmportal3.ppt
050908 105407 fetching http://localhost:8080/search_sample/testpdf.pdf
050908 105407 fetching http://localhost:8080/search_sample/kmportal10.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal2.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal4.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal6.ppt
050908 105407 fetching http://localhost:8080/search_sample/testexcel.xls
050908 105407 fetching http://localhost:8080/search_sample/javaCertStudyNotes.pdf
050908 105407 fetching http://localhost:8080/search_sample/kmportal7.ppt
050908 105408 fetching http://localhost:8080/search_sample/testdoc.doc
050908 105408 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal3.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105408 fetching http://localhost:8080/search_sample/kmportal8.ppt
050908 105409 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal8.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105409 fetching http://localhost:8080/search_sample/kmportal9.ppt
050908 105410 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal9.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105410 fetching http://localhost:8080/search_sample/kmportal11.ppt
050908 105411 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal10.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105411 fetching http://localhost:8080/search_sample/kmportal5.ppt
050908 105412 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal11.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105413 fetching http://localhost:8080/search_sample/kmportal1.ppt
050908 105413 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal1.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105415 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal5.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105416 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal2.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105417 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal4.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105418 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal7.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
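
Every failure in the log names application/msword as the expected type while the content arrives as application/powerpoint, which suggests the .ppt files are being handed to the Word parser rather than to a PowerPoint parser. Below is a minimal sketch, in plain Java, of the kind of content-type guard that would produce a message of this shape. The class name ContentTypeGuardSketch and the method reject are hypothetical and only for illustration; this is not Nutch's actual code.

```java
public class ContentTypeGuardSketch {

    /**
     * Hypothetical guard: returns null when the parser accepts the content
     * type, otherwise a failure reason shaped like the one in the log.
     */
    public static String reject(String accepted, String actual) {
        // A missing content type is let through; a mismatching one is refused.
        if (actual != null && !actual.startsWith(accepted)) {
            return "Content-Type not " + accepted + ": " + actual;
        }
        return null;
    }

    public static void main(String[] args) {
        // A Word-only guard refuses PowerPoint content.
        System.out.println(reject("application/msword", "application/powerpoint"));
        // The same guard accepts Word content (prints null).
        System.out.println(reject("application/msword", "application/msword"));
    }
}
```

If this reading is right, the thing to check would be which parser ends up associated with the .ppt content type, rather than whether the PowerPoint plugin is merely listed.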



thanks,
Ayyanar...

--- Jérôme Charron <jerome.charron@gmail.com> wrote:

> > 3. implement a catch-all plugin, which is equivalent to the Unix command
> > strings(1) (I have an implementation of that which I can contribute).
> > And turn it off/on in the config: if it's off, then the unknown content
> > is skipped and logged; if it's on, then make the best effort to extract
> > text.
> 
> Andrzej, I really like this solution... +1
> In such a case, the other parse plugins no longer need to check the
> content-type: if they get some content, they assume it is of the right
> content-type.
> 
> Regards
> 
> Jérôme
> 
> 
> -- 
> http://motrech.free.fr/
> http://www.frutch.org/
> 
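
For illustration, the strings(1)-style catch-all extraction proposed in the quoted text could look roughly like the sketch below. It is a standalone Java example, not Andrzej's implementation and not tied to the Nutch plugin API. The class name StringsSketch, the method extract, and the minimum run length of 4 are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class StringsSketch {

    /** Returns every run of printable ASCII bytes that is at least minLen long. */
    public static List<String> extract(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            // Java bytes are signed, so values above 0x7e are negative and fail this test.
            boolean printable = (b >= 0x20 && b <= 0x7e) || b == '\t';
            if (printable) {
                current.append((char) b);
            } else {
                // A non-printable byte ends the run; keep it only if long enough.
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        // Flush a run that reaches the end of the data.
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        // Fake binary content: control bytes around two text runs and one short run.
        byte[] sample = "\0\1Hello slides\2ab\3Quarterly report\0"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(extract(sample, 4)); // prints [Hello slides, Quarterly report]
    }
}
```

This sketch only recognizes single-byte ASCII text, so text stored in a multi-byte encoding such as UTF-16 would be missed. A real catch-all parser would probably need to handle that case as well.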

