incubator-ooo-dev mailing list archives

From Kay Schenk <>
Subject Re: investigation using Google Webmaster tools
Date Wed, 01 Aug 2012 23:45:43 GMT

On 08/01/2012 04:29 PM, Rob Weir wrote:
> On Wed, Aug 1, 2012 at 7:06 PM, Kay Schenk <> wrote:
>> Hello all --
>> I am exploring using the Google Webmaster tool that
>> Rob told us about on Jul 19.
>> I am ONLY getting started by looking at the 62,962 404 errors (!!!!!)
>> Many of these are links to VERY old docs which we no longer have -- like
>> source trees for 1.0.1, 1.0.2 etc.--  or have to do with the OLD
>> architecture -- servlet references etc.
> If I understand this correctly, Google is looking at links on
> webpages, not just our webpages, but also links from 3rd party
> websites, and if they point to a page that doesn't
> exist, it shows up on this list.   This could happen for any reason.
> In some cases the original link might have had a typo.

yes, this is correct, and you are right about this too...some of the 
404s reference pages we probably NEVER had.

>> Some of these issues could be solved with rather extensive use of sym links
>> (yes, you can actually use these in svn -- kind of) and of course some not
>> -- many missing old security bulletins.
> For the security bulletins, I wonder if this is actually a redirection
> error.  We have many of them here:

ah...yes, they are there...the problem is we would need to construct a 
LOT of just "redirect" pages to right some of these since they all seem 
to have the form


> But we're redirecting to
> So if there are outstanding URL's that are of the form
> then they might be broken now.

See, it's the actual placement of the bulletins within the tree
that's the problem, I think.
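
If the server can take redirect rules instead of one "redirect" page per bulletin, the rules could be generated from a list of old and new paths. A minimal sketch, assuming Apache mod_alias `Redirect` directives are an option for us (not confirmed in this thread); the function name and the sample paths in the usage lines are hypothetical placeholders, not our real bulletin URLs:

```python
def redirect_rules(mapping):
    """Build one 'Redirect permanent' line per old -> new path pair.

    mapping: dict of old URL-path -> new URL-path (both hypothetical here).
    Paths containing spaces would need quoting, which this sketch skips.
    """
    lines = []
    for old, new in sorted(mapping.items()):
        lines.append(f"Redirect permanent {old} {new}")
    return lines


# Placeholder paths for illustration only.
sample = {
    "/old-area/bulletin-b.html": "/new-area/bulletin-b.html",
    "/old-area/bulletin-a.html": "/new-area/bulletin-a.html",
}
for rule in redirect_rules(sample):
    print(rule)
```

The output would be pasted into a server config or .htaccess file, so the old-to-new list is the only thing that has to be maintained by hand.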

>> So, to those of you using this tool, I may mark many of these as "fixed".
>> Of course they are not -- and they may show up again. Some of them only
>> show up in BZ issues!! (Google is amazingly thorough).
>> I don't know how long it will take for them to "show up" again. The problem
>> is some of these are very very very old references, and not likely we can
>> do anything about at this point in time.
>> If you're not using this tool, you probably don't care about this. If you
>> are using it, and have another opinion before I start chunking away at
>> hiding these, please weigh in.
> The way I understand it the links at the top of the list are the ones
> Google considers the most important.  I think this is based on the
> number of links to that page.  Maybe they factor in other things as
> well.  So I'd recommend looking more at the top 100 or so broken
> links, make this a manageable task.

Well the problem is "how" to make it manageable... :(

> Or -- and here is a challenge for the algorithm experts -- maybe there
> is an easy way to take that entire list of 62,962 links and determine
> what the top base paths are that are broken.

if only this were so :( They're all over the place.

> In other words, if the
> links are:
> Then this would tell us that* was a top source
> of broken links.  This might indicate important patterns of where the
> most broken links are.
> It seems like this could be done via a prefix tree (a "trie"):
> Maybe other (simpler) ways as well.

I'll look at this article. It's a daunting task any way you look at it.
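
One of the simpler ways might be to skip the trie and just count leading path prefixes. A rough sketch, assuming the 404 list can be exported as plain URLs, one per entry; `count_prefixes`, `max_depth`, and the example.org URLs are made up for illustration and are not from the tool:

```python
from collections import Counter
from urllib.parse import urlsplit


def count_prefixes(urls, max_depth=3):
    """Count how many URLs fall under each host + directory prefix."""
    counts = Counter()
    for url in urls:
        parts = urlsplit(url)
        segments = [s for s in parts.path.split("/") if s]
        # Treat the last segment as the page unless the path ends in "/".
        dirs = segments if parts.path.endswith("/") else segments[:-1]
        prefix = parts.netloc
        for seg in dirs[:max_depth]:
            prefix += "/" + seg
            counts[prefix] += 1
    return counts


# Hypothetical sample; the real input would be the exported 404 list.
broken = [
    "http://example.org/docs/old/a.html",
    "http://example.org/docs/old/b.html",
    "http://example.org/docs/new/c.html",
    "http://example.org/security/x.html",
]
for prefix, n in count_prefixes(broken).most_common(5):
    print(n, prefix)
```

This gives the same per-prefix totals a trie would, just held in a flat counter, and sorting by count shows which base paths account for the most broken links.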

> Regards,

What happens when things get moved a LOT with no regard for the end 
user. Don't get me started on the ways I've had to deal with this in the 

> -Rob
>> --
>> ----------------------------------------------------------------------------------------
>> MzK
>> "I'm just a normal jerk who happens to make music.
>>   As long as my brain and fingers work, I'm cool."
>>                                -- Eddie Van Halen


