incubator-ooo-dev mailing list archives

From Dave Fisher <dave2w...@comcast.net>
Subject Re: investigation using Google Webmaster tools
Date Thu, 02 Aug 2012 00:03:03 GMT
Sorry to top post, but this week I am at my work HQ and am busy.

I think that we should create a 404 page and then ask infra to point to that.

Sent from my iPhone

On Aug 1, 2012, at 7:45 PM, Kay Schenk <kay.schenk@gmail.com> wrote:

> 
> 
> On 08/01/2012 04:29 PM, Rob Weir wrote:
>> On Wed, Aug 1, 2012 at 7:06 PM, Kay Schenk <kay.schenk@gmail.com> wrote:
>>> Hello all --
>>> 
>>> I am exploring the www.openoffice.org site using the Google Webmaster tool that
>>> Rob told us about on Jul 19.
>>> 
>>> I am ONLY getting started by looking at the 62,962 404 errors (!!!!!)
>>> 
>>> Many of these are links to VERY old docs which we no longer have -- like
>>> source trees for 1.0.1, 1.0.2, etc. -- or have to do with the OLD
>>> architecture -- servlet references, etc.
>>> 
>> 
>> If I understand this correctly, Google is looking at links on
>> webpages, not just our webpages, but also links from 3rd party
>> websites, and if they point to an openoffice.org page that doesn't
>> exist, it shows up on this list. This could happen for many reasons;
>> in some cases the original link might have had a typo.
> 
> yes, this is correct, and you are right about this too... some of the 404s reference pages we probably NEVER had.
> 
>> 
>>> Some of these issues could be solved with rather extensive use of symlinks
>>> (yes, you can actually use these in svn -- kind of), and of course some could
>>> not -- e.g. the many missing old security bulletins.
>>> 
>> 
>> For the security bulletins, I wonder if this is actually a redirection
>> error.  We have many of them here:
>> 
>> http://www.openoffice.org/security/bulletin.html
> 
> ah... yes, they are there... the problem is that we would need to construct a LOT of "redirect"-only pages to fix these, since they all seem to have the form
> 
> "/security/cvs-bulletin-number.html"
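If the "LOT of redirect pages" route were taken, the stubs would not need to be written by hand. The sketch below is a minimal illustration, not anything the project actually used: the function names `redirect_stub` and `write_stubs` are hypothetical, and the old-path to new-URL mapping is assumed to be supplied by whoever works out where each bulletin now lives. Each stub is a plain HTML page using a zero-delay meta refresh, plus a visible link for browsers that ignore it.

```python
from html import escape
from pathlib import Path


def redirect_stub(target_url: str) -> str:
    """Return a minimal HTML page that forwards the browser to target_url."""
    safe = escape(target_url, quote=True)  # guard against quotes/ampersands in the URL
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        f'<link rel="canonical" href="{safe}">\n'
        "<title>Moved</title>\n"
        "</head><body>\n"
        f'<p>This page has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body></html>\n"
    )


def write_stubs(mapping: dict, root: Path) -> list:
    """Write one stub per entry; `mapping` is a hypothetical {old relative path: new URL}."""
    written = []
    for old_path, new_url in mapping.items():
        out = root / old_path
        out.parent.mkdir(parents=True, exist_ok=True)  # recreate the old directory layout
        out.write_text(redirect_stub(new_url), encoding="utf-8")
        written.append(out)
    return written
```

A single server-side redirect rule would avoid committing thousands of files, but stubs like these work on any static host without touching the server configuration.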
> 
>> 
>> But we're redirecting security.openoffice.org to
>> http://incubator.apache.org/openofficeorg/security.html
>> 
>> So if there are outstanding URLs of the form
>> security.openoffice.org/foo.html, they might be broken now.
> 
> see above... it's the actual placement of the bulletins within the tree that's the problem, I think.
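The difference between the current host-level redirect and one that keeps the path can be shown in a few lines. This is a hedged sketch: `CURRENT_TARGET` is the URL quoted in the thread, but `NEW_BASE` is a hypothetical destination, and the same-file-name assumption in `path_preserving_redirect` is exactly what Kay doubts, since the bulletins sit at different places in the new tree. A real fix would likely need a lookup table instead of a straight path copy.

```python
from typing import Optional
from urllib.parse import urlsplit

OLD_HOST = "security.openoffice.org"
# The single page the thread says the whole old host now redirects to
CURRENT_TARGET = "http://incubator.apache.org/openofficeorg/security.html"
# Hypothetical: assumes old pages kept their file names under this directory
NEW_BASE = "http://www.openoffice.org/security"


def collapsing_redirect(url: str) -> Optional[str]:
    """Host-level redirect: every path on the old host lands on one page."""
    return CURRENT_TARGET if urlsplit(url).netloc == OLD_HOST else None


def path_preserving_redirect(url: str) -> Optional[str]:
    """Keep the path, so /foo.html on the old host maps to NEW_BASE/foo.html."""
    parts = urlsplit(url)
    if parts.netloc != OLD_HOST:
        return None  # not an old security-host URL; leave it alone
    return NEW_BASE + "/" + parts.path.lstrip("/")
```

With the collapsing version, a deep link such as security.openoffice.org/foo.html no longer reaches the specific page it pointed to, which is why such links may now count as broken.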
> 
> 
>> 
>>> So, to those of you using this tool, I may mark many of these as "fixed".
>>> Of course they are not -- and they may show up again. Some of them only
>>> show up in BZ issues!! (Google is amazingly thorough).
>>> 
>>> I don't know how long it will take for them to "show up" again. The problem
>>> is that some of these are very, very, very old references, and there is
>>> likely nothing we can do about them at this point.
>>> If you're not using this tool, you probably don't care about this. If you
>>> are using it, and have another opinion before I start chunking away at
>>> hiding these, please weigh in.
>>> 
>> 
>> The way I understand it the links at the top of the list are the ones
>> Google considers the most important.  I think this is based on the
>> number of links to that page.  Maybe they factor in other things as
>> well. So I'd recommend looking more at the top 100 or so broken
>> links, to make this a manageable task.
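Picking out "the top 100 or so" is easy once the report is exported. The sketch assumes a hypothetical shape, a dict mapping each broken URL to the number of pages that link to it; it is not Google's actual export format, and ranking purely by inbound-link count is only Rob's guess at how Google orders the list. The function name `top_broken` is illustrative.

```python
import heapq


def top_broken(links: dict, n: int = 100) -> list:
    """Return the n URLs with the most inbound links, highest count first.

    `links` maps broken URL -> number of linking pages (an assumed shape).
    Ties are broken alphabetically so repeated runs give the same order.
    """
    # nsmallest on (-count, url) == largest counts first, then A-Z
    return heapq.nsmallest(n, links.items(), key=lambda kv: (-kv[1], kv[0]))
```

Using `heapq.nsmallest` avoids sorting all 62,962 entries when only the first hundred are needed.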
> 
> Well the problem is "how" to make it manageable... :(
> 
>> 
>> Or -- and here is a challenge for the algorithm experts -- maybe there
>> is an easy way to take that entire list of 62,962 links and determine
>> what the top base paths are that are broken.
> 
> if only this were so :( They're all over the place.
> 
>> In other words, if the links are:
>> 
>> foo.openoffice.org/bar/baz1
>> foo.openoffice.org/bar/baz2
>> foo.openoffice.org/bar/baz2
>> foo.openoffice.org/bar2/baz1
>> foo2.openoffice.org/bar1/baz1
>> 
>> Then this would tell us that foo.openoffice.org/bar/* was a top source
>> of broken links.  This might indicate important patterns of where the
>> most broken links are.
>> 
>> It seems like this could be done via a prefix tree (a "trie"):
>> http://en.wikipedia.org/wiki/Trie
>> 
>> Maybe other (simpler) ways as well.
> 
> I'll look at this article. It's a daunting task any way you look at it.
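Rob's prefix-tree idea can be sketched compactly. This is an illustration under stated assumptions, not a tool the project built: each URL is split into its host plus path segments (query strings are ignored), every trie node counts how many broken URLs pass through it, and `top_prefixes` reports the busiest prefixes at a chosen depth. The names `PrefixNode`, `build_trie` and `top_prefixes` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class PrefixNode:
    """One host or path segment; `count` = broken URLs at or below this node."""

    def __init__(self) -> None:
        self.count = 0
        # defaultdict creates a child node the first time a segment is seen
        self.children = defaultdict(PrefixNode)


def split_url(url: str) -> list:
    """'foo.openoffice.org/bar/baz1' -> ['foo.openoffice.org', 'bar', 'baz1']."""
    if "://" not in url:
        url = "http://" + url  # urlsplit only finds the host after a scheme
    parts = urlsplit(url)
    # Query strings and fragments are dropped; empty segments are skipped
    return [parts.netloc] + [seg for seg in parts.path.split("/") if seg]


def build_trie(urls) -> PrefixNode:
    """Insert every URL, incrementing the count on each node it passes through."""
    root = PrefixNode()
    for url in urls:
        node = root
        node.count += 1
        for seg in split_url(url):
            node = node.children[seg]
            node.count += 1
    return root


def top_prefixes(root: PrefixNode, depth: int, n: int = 10) -> list:
    """Return up to n prefixes exactly `depth` segments long, most broken URLs first."""
    results = []

    def walk(node: PrefixNode, path: list) -> None:
        if len(path) == depth:
            results.append(("/".join(path), node.count))
            return
        for seg, child in node.children.items():
            walk(child, path + [seg])

    walk(root, [])
    results.sort(key=lambda kv: (-kv[1], kv[0]))  # highest count, then alphabetical
    return results[:n]
```

For Rob's five sample links, `top_prefixes(build_trie(links), depth=2)` puts `foo.openoffice.org/bar` first with 3 broken URLs. One of the "simpler ways" would be a `collections.Counter` keyed on the first k segments, which gives the same counts for a single fixed depth; the trie keeps every depth at once, so it can be walked down level by level. URLs shorter than the requested depth are not counted at that depth.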
> 
>> 
>> Regards,
> 
> That's what happens when things get moved a LOT with no regard for the end user. Don't get me started on the ways I've had to deal with this in the past.
> 
>> 
>> -Rob
>> 
>>> 
>>> 
>>> --
>>> ----------------------------------------------------------------------------------------
>>> MzK
>>> 
>>> "I'm just a normal jerk who happens to make music.
>>>  As long as my brain and fingers work, I'm cool."
>>>                               -- Eddie Van Halen
