incubator-ooo-dev mailing list archives

From Kay Schenk <kay.sch...@gmail.com>
Subject Re: investigation using Google Webmaster tools
Date Wed, 01 Aug 2012 23:45:43 GMT


On 08/01/2012 04:29 PM, Rob Weir wrote:
> On Wed, Aug 1, 2012 at 7:06 PM, Kay Schenk <kay.schenk@gmail.com> wrote:
>> Hello all --
>>
>> I am exploring the www.openoffice.org site using the Google Webmaster tool that
>> Rob told us about on Jul 19.
>>
>> I am ONLY getting started by looking at the 62,962 404 errors (!!!!!)
>>
>> Many of these are links to VERY old docs which we no longer have -- like
>> source trees for 1.0.1, 1.0.2 etc.--  or have to do with the OLD
>> architecture -- servlet references etc.
>>
>
> If I understand this correctly, Google is looking at links on
> webpages, not just our webpages, but also links from 3rd party
> websites, and if they point to an openoffice.org page that doesn't
> exist, it shows up on this list.   This could happen for any reason.
> In some cases the original link might have had a typo.

yes, this is correct, and you are right about this too...some of the 
404s reference pages we probably NEVER had.

>
>> Some of these issues could be solved with rather extensive use of sym links
>> (yes, you can actually use these in svn -- kind of) and of course some not
>> -- many missing old security bulletins.
>>
>
> For the security bulletins, I wonder if this is actually a redirection
> error.  We have many of them here:
>
> http://www.openoffice.org/security/bulletin.html

ah...yes, they are there...the problem is we would need to construct a 
LOT of pages that do nothing but "redirect" to fix some of these, since 
they all seem to have the form

"/security/cvs-bulletin-number".html

>
> But we're redirecting security.openoffice.org to
> http://incubator.apache.org/openofficeorg/security.html
>
> So if there are outstanding URL's that are of the form
> security.openoffice.org/foo.html then they might be broken now.

see above...it's the actual placement of the bulletins within the tree 
that's the problem I think


>
>> So, to those of you using this tool, I may mark many of these as "fixed".
>> Of course they are not -- and they may show up again. Some of them only
>> show up in BZ issues!! (Google is amazingly thorough).
>>
>> I don't know how long it will take for them to "show up" again. The problem
>> is some of these are very very very old references, and it's not likely we
>> can do anything about them at this point in time.
>> If you're not using this tool, you probably don't care about this. If you
>> are using it, and have another opinion before I start chunking away at
>> hiding these, please weigh in.
>>
>
> The way I understand it the links at the top of the list are the ones
> Google considers the most important.  I think this is based on the
> number of links to that page.  Maybe they factor in other things as
> well.  So I'd recommend looking more at the top 100 or so broken
> links, make this a manageable task.

Well the problem is "how" to make it manageable... :(

>
> Or -- and here is a challenge for the algorithm experts -- maybe there
> is an easy way to take that entire list of 62,962 links and determine
> what the top base paths are that are broken.

if only this were so :( They're all over the place.

> In other words, if the
> links are:
>
> foo.openoffice.org/bar/baz1
> foo.openoffice.org/bar/baz2
> foo.openoffice.org/bar/baz2
> foo.openoffice.org/bar2/baz1
> foo2.openoffice.org/bar1/baz1
>
> Then this would tell us that foo.openoffice.org/bar/* was a top source
> of broken links.  This might indicate important patterns of where the
> most broken links are.
>
> It seems like this could be done via a prefix tree (a "trie"):
> http://en.wikipedia.org/wiki/Trie
>
> Maybe other (simpler) ways as well.

I'll look at this article. It's a daunting task any way you look at it.
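
A minimal sketch of the prefix-tree idea Rob describes above, in Python. It assumes the broken links have been exported as a plain list of "host/path" strings with no scheme. The names PrefixNode, build_trie and top_prefixes and the sample list are illustrative only; nothing here comes from the Webmaster tool itself.

```python
from collections import defaultdict


class PrefixNode:
    """One node per host or path segment; count = broken links at or below it."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(PrefixNode)


def build_trie(urls):
    """Insert every URL segment by segment, counting how often each prefix occurs."""
    root = PrefixNode()
    for url in urls:
        node = root
        node.count += 1
        # Split "host/dir/page" into segments; skip empty ones from stray slashes.
        for segment in (s for s in url.split("/") if s):
            node = node.children[segment]
            node.count += 1
    return root


def top_prefixes(root, depth, limit=10):
    """Return the `limit` most common prefixes that are exactly `depth` segments long."""
    results = []

    def walk(node, parts):
        if len(parts) == depth:
            results.append(("/".join(parts), node.count))
            return
        for segment, child in node.children.items():
            walk(child, parts + [segment])

    walk(root, [])
    # Most frequent first; ties are ordered by prefix so the output is stable.
    results.sort(key=lambda item: (-item[1], item[0]))
    return results[:limit]


# Hypothetical input: the five example links from the message above.
broken = [
    "foo.openoffice.org/bar/baz1",
    "foo.openoffice.org/bar/baz2",
    "foo.openoffice.org/bar/baz2",
    "foo.openoffice.org/bar2/baz1",
    "foo2.openoffice.org/bar1/baz1",
]
trie = build_trie(broken)
for prefix, count in top_prefixes(trie, depth=2):
    print(count, prefix)
```

With depth=2 the first entry is foo.openoffice.org/bar with a count of 3, which matches the "top source of broken links" in Rob's example. Links with fewer than `depth` segments are not reported at that depth, so it may be worth running this at depth 1, 2 and 3 and comparing the results.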

>
> Regards,

This is what happens when things get moved a LOT with no regard for the 
end user. Don't get me started on the ways I've had to deal with this in 
the past.

>
> -Rob
>
>>
>>
>> --
>> ----------------------------------------------------------------------------------------
>> MzK
>>
>> "I'm just a normal jerk who happens to make music.
>>   As long as my brain and fingers work, I'm cool."
>>                                -- Eddie Van Halen

-- 
------------------------------------------------------------------------
MzK

"I'm just a normal jerk who happens to make music.
  As long as my brain and fingers work, I'm cool."
                               -- Eddie Van Halen


