Sorry to top post, but this week I am at my work HQ and am busy.
I think that we should create a 404 page and then ask infra to point to that.
On Aug 1, 2012, at 7:45 PM, Kay Schenk <kay.schenk@gmail.com> wrote:
>
>
> On 08/01/2012 04:29 PM, Rob Weir wrote:
>> On Wed, Aug 1, 2012 at 7:06 PM, Kay Schenk <kay.schenk@gmail.com> wrote:
>>> Hello all --
>>>
>>> I am exploring the www.openoffice.org site using the Google Webmaster tool that
>>> Rob told us about on Jul 19.
>>>
>>> I am ONLY getting started by looking at the 62,962 404 errors (!!!!!)
>>>
>>> Many of these are links to VERY old docs which we no longer have -- like
>>> source trees for 1.0.1, 1.0.2 etc.-- or have to do with the OLD
>>> architecture -- servlet references etc.
>>>
>>
>> If I understand this correctly, Google is looking at links on
>> webpages, not just our webpages, but also links from 3rd party
>> websites, and if they point to an openoffice.org page that doesn't
>> exist, it shows up on this list. This could happen for any reason.
>> In some cases the original link might have had a typo.
>
> yes, this is correct, and you are right about this too...some of the 404s reference pages
> we probably NEVER had.
>
>>
>>> Some of these issues could be solved with rather extensive use of symlinks
>>> (yes, you can actually use these in svn -- kind of) and of course some not
>>> -- many missing old security bulletins.
>>>
>>
>> For the security bulletins, I wonder if this is actually a redirection
>> error. We have many of them here:
>>
>> http://www.openoffice.org/security/bulletin.html
>
> ah...yes, they are there...the problem is we would need to construct a LOT of just "redirect"
> pages to right some of these since they all seem to have the form
>
> "/security/cvs-bulletin-number".html
>
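One way to avoid writing each "redirect" page by hand is to generate them from a list of moved paths. This is only a minimal sketch: the entry in `moves` is a made-up placeholder, not a real bulletin path, and the helper name `redirect_stub` is hypothetical. It builds a small HTML page with a meta refresh for each old path and leaves writing the files into the site tree to whoever runs it.

```python
import html


def redirect_stub(target_url):
    """Return a minimal HTML page that forwards the browser to target_url."""
    safe = html.escape(target_url, quote=True)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        f'<link rel="canonical" href="{safe}">\n'
        "</head><body>\n"
        f'<p>This page has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body></html>\n"
    )


# Hypothetical mapping of old path -> current location (placeholder entry).
moves = {
    "security/old-bulletin-example.html":
        "http://www.openoffice.org/security/bulletin.html",
}

# One stub page per old path, keyed by where the file would be placed.
stubs = {old_path: redirect_stub(new_url) for old_path, new_url in moves.items()}
```

A server-side redirect rule would probably be cleaner than stub pages if infra can add one; the stubs are just the option that needs no server configuration.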
>>
>> But we're redirecting security.openoffice.org to
>> http://incubator.apache.org/openofficeorg/security.html
>>
>> So if there are outstanding URLs that are of the form
>> security.openoffice.org/foo.html then they might be broken now.
>
> see above...it's the actual placement of the bulletins within the tree that's the problem
> I think
>
>
>>
>>> So, to those of you using this tool, I may mark many of these as "fixed".
>>> Of course they are not -- and they may show up again. Some of them only
>>> show up in BZ issues!! (Google is amazingly thorough).
>>>
>>> I don't know how long it will take for them to "show up" again. The problem
>>> is some of these are very very very old references, and it's not likely we can
>>> do anything about them at this point in time.
>>> If you're not using this tool, you probably don't care about this. If you
>>> are using it, and have another opinion before I start chunking away at
>>> hiding these, please weigh in.
>>>
>>
>> The way I understand it the links at the top of the list are the ones
>> Google considers the most important. I think this is based on the
>> number of links to that page. Maybe they factor in other things as
>> well. So I'd recommend looking more at the top 100 or so broken
>> links, make this a manageable task.
>
> Well the problem is "how" to make it manageable... :(
>
>>
>> Or -- and here is a challenge for the algorithm experts -- maybe there
>> is an easy way to take that entire list of 62,962 links and determine
>> what the top base paths are that are broken.
>
> if only this were so :( They're all over the place.
>
>> In other words, if the
>> links are:
>>
>> foo.openoffice.org/bar/baz1
>> foo.openoffice.org/bar/baz2
>> foo.openoffice.org/bar/baz2
>> foo.openoffice.org/bar2/baz1
>> foo2.openoffice.org/bar1/baz1
>>
>> Then this would tell us that foo.openoffice.org/bar/* was a top source
>> of broken links. This might indicate important patterns of where the
>> most broken links are.
>>
>> It seems like this could be done via a prefix tree (a "trie"):
>> http://en.wikipedia.org/wiki/Trie
>>
>> Maybe other (simpler) ways as well.
>
> I'll look at this article. It's a daunting task any way you look at it.
>
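A rough sketch of the "simpler way" to find the top broken base paths: rather than a real trie, it keeps a flat counter keyed by host and leading directories. It assumes the 404 list has been exported as plain URLs, one per entry; the function name `count_prefixes` and the `max_depth` cutoff are made up for illustration, and the sample data is Rob's example list from above.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_prefixes(urls, max_depth=2):
    """Count how many broken URLs fall under each host/directory prefix."""
    counts = Counter()
    for url in urls:
        # Accept bare "host/path" entries by adding a scheme for urlsplit.
        parts = urlsplit(url if "://" in url else "http://" + url)
        segments = [s for s in parts.path.split("/") if s]
        prefix = parts.netloc
        counts[prefix] += 1
        # Count each directory-level prefix, but not the final page itself.
        for segment in segments[:-1][:max_depth]:
            prefix = prefix + "/" + segment
            counts[prefix] += 1
    return counts


broken = [
    "foo.openoffice.org/bar/baz1",
    "foo.openoffice.org/bar/baz2",
    "foo.openoffice.org/bar/baz2",
    "foo.openoffice.org/bar2/baz1",
    "foo2.openoffice.org/bar1/baz1",
]

# Print the two prefixes with the most broken links under them.
for prefix, n in count_prefixes(broken).most_common(2):
    print(n, prefix)
```

On the sample list this ranks foo.openoffice.org first and foo.openoffice.org/bar second, which is the pattern Rob describes. On the full list of 62,962 links, a larger `max_depth` and a longer `most_common` cutoff would likely be needed.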
>>
>> Regards,
>
> What happens when things get moved a LOT with no regard for the end user. Don't get me
> started on the ways I've had to deal with this in the past.
>
>>
>> -Rob
>>
>>>
>>>
>>> --
>>> ----------------------------------------------------------------------------------------
>>> MzK
>>>
>>> "I'm just a normal jerk who happens to make music.
>>> As long as my brain and fingers work, I'm cool."
>>> -- Eddie Van Halen
>