lucene-java-user mailing list archives

From Chakra Yadavalli <chakra.yadava...@gmail.com>
Subject Re: How do you make "protected content" searchable by Google?
Date Thu, 17 Mar 2005 14:06:02 GMT
I don't think this scheme misleads any users. We are not stuffing
"meaningless" keywords into the pages to boost page rank. That would
be against Google's policy, correct?


On Thu, 17 Mar 2005 07:55:31 -0500, Kevin L. Cobb
<kevin.cobb@emergint.com> wrote:
> I worked on a website that had the same issue. We made a "search engine"
> page listing all the documents we wanted indexed. Each entry linked to a
> summary of the document, and each summary linked to the full document on
> the limited-access site. Google won't be able to follow those last links
> because they require user sign-on, but the link will be there as
> enticement when Googlers find the page.
> 
> Love the idea about removing the stop words and stemming the rest for
> Google indexing. But Google is pretty picky, I am told, so I would not
> be surprised if they detected this sort of scheme and decided not to
> index your pages.
> 
> Good luck.
> 
> KLCobb
> 
> -----Original Message-----
> From: Chakra Yadavalli [mailto:chakra.yadavalli@gmail.com]
> Sent: Wednesday, March 16, 2005 11:44 PM
> To: java-user@lucene.apache.org
> Subject: How do you make "protected content" searchable by Google?
> 
> Hello, I am not sure if this is the right question for this list, but
> it concerns search engines.
> 
> Suppose you have a website that hosts some protected content that is
> accessible only to registered users. How do you make the content searchable
> by Google and other popular web search engines? The idea is not to
> reveal the content, even via the "Google cache."
> 
> Here is what I am thinking...
> Using Lucene (or its derivatives), skim through the "protected content,"
> remove all the common stop words, stem the remaining words, and place the
> resulting text files in a directory available to the search bots (via
> robots.txt rules). That way, even if the content is cached by the
> search engines, it does not make much sense to humans, but it still
> lets them search it. When they click on the link to the
> skimmed files, we need to redirect them to the login/register page, and
> upon successful login, they should be redirected to the actual
> human-readable page that corresponds to the "skimmed
> content." Note that the "protected content" may live in a Content
> Management System or a database.
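
To make the skimming idea above concrete, here is a minimal sketch in Python.
It does not use Lucene: a tiny hand-written stop list and a crude
suffix-stripper stand in for a real analyzer and stemmer, so treat it as an
illustration only. The names skim, crude_stem, login_redirect_url, and
STOP_WORDS, the "/login" and "/protected/" paths, and the "next" query
parameter are all assumptions made up for this example.

```python
import re
from urllib.parse import urlencode

# Tiny illustrative stop list (assumption); a real analyzer ships a longer one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
              "is", "it", "of", "on", "or", "that", "the", "to", "was", "with"}

# Suffixes tried by the crude stemmer, longest first.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word: str) -> str:
    """Strip at most one common suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def skim(text: str) -> str:
    """Lowercase, tokenize, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def login_redirect_url(doc_id: str, login_path: str = "/login") -> str:
    """Build the login URL for a visitor who clicked a skimmed file.

    The hypothetical "next" parameter records which protected page to
    show after a successful login.
    """
    return login_path + "?" + urlencode({"next": "/protected/" + doc_id})


if __name__ == "__main__":
    print(skim("The searching of indexed documents is protected by the login pages"))
    print(login_redirect_url("doc42"))
```

The skimmed output keeps the searchable terms but drops the connective words
and word endings, which is what makes a cached copy hard to read. A real
stemmer would handle many cases that this suffix-stripper gets wrong.
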
> 
> Am I overthinking/over-engineering it? Any ideas are really appreciated.
> 
> Thanks in advance,
> Chakra
> --
> Visit my weblog: http://www.jroller.com/page/cyblogue
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


-- 
Visit my weblog: http://www.jroller.com/page/cyblogue
