tika-user mailing list archives

From Chris <SpamO...@freenet.de>
Subject Index only pages that match a certain regular expression
Date Mon, 24 Jan 2011 03:57:12 GMT
Hello everyone,

I have a question about Solr/Nutch, which use Tika.
As far as I can tell, this is the right list for it:

I'd like to index a number of websites (around 100).
But a page should *only* be indexed if it contains a certain word (or
better: matches a certain regular expression).
Is this possible with a custom parser?
If so, where and how do I deploy that parser?
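
To make the requirement concrete, here is a minimal sketch of the check I
have in mind, in plain Java. The class name RegexPageFilter and the method
shouldIndex are made up for illustration and are not part of Tika, Nutch, or
Solr. The sketch assumes the page text has already been extracted and only
decides whether the page should be kept; how to hook such a check into the
crawl is exactly what I am asking about.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Tika, Nutch, or Solr API.
final class RegexPageFilter {
    private final Pattern pattern;

    RegexPageFilter(String regex) {
        // Case-insensitive matching is an assumption made for this sketch.
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    // Returns true if the extracted page text contains a match anywhere.
    boolean shouldIndex(String extractedText) {
        return extractedText != null && pattern.matcher(extractedText).find();
    }
}
```

A page would then be indexed only when shouldIndex returns true for its
extracted text, and skipped otherwise.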

Thank you in advance
Bye, Chris
