lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chakra Yadavalli <chakra.yadava...@gmail.com>
Subject How do you make "protected content" searchable by Google?
Date Thu, 17 Mar 2005 04:43:52 GMT
Hello, I am not sure if this is the right question for this list but
it is in regards to search engines.

Suppose you have a website that hosts some protected content that is
accessible via registered users. How you make the content searchable
by Google and other popular websearch engines? The idea is not to
reveal the conent even via the "Google cache."

Here is what I am thinking... 
Using Lucene (or its derivatives), skim thru the "protected content"
and remove all the common stop words , stem the words and place the
resulting text files in a directory availabe for the search bots (via
robotstxt rules). That way, even if the content is cached by the
search engines, it does not make much sense to humans but it still
will enable them to search it. When they click on the link to the
skimmed files, we need to redirect them to the login/registe page and
upon successful login, they should  be redirected to the actual human
readable/understandable page that corresponds to that has the "skimmed
content." Note that the "protected content" may be living in a Content
Management System or a database.

Am I overthinking/engineering it? Any ideas are really appreciated.

Thanks in advance,
Chakra
-- 
Visit my weblog: http://www.jroller.com/page/cyblogue

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message