From: "Dennis E. Hamilton"
To: ooo-dev@incubator.apache.org
Cc: "'Kay Schenk'", "'Matt Richards'"
Subject: [wiki] Migration - A TerryE Clipping Collection [LONG]
Date: Tue, 6 Sep 2011 19:56:10 -0700
Organization: NuovoDoc

I compiled a LONG chronological clipping set of notes from Terry, from conception to when he stopped posting on the issues and thinking around wiki migration.

I'm breaking a thread here. There is no need to comment on any of this; it is for reference and for mining for things not to overlook, collating onto the wiki, etc.

The selections are mine, the list is long, and it probably works better distilled onto a wiki. This is the collection phase.

I am staying as close to the architectural and technical considerations as I can, and not following into areas where there were policy spats of one kind or another. I thought I might have to look at others' posts too, but generally I could find enough in what Terry was responding to to provide continuity.

It is interesting that we are now re-hashing some of the things that Terry covered, at least once starting over a month ago.

NOTE: Terry also provided OOOUSER material.

*** 2011-07-31-13:53 All times PDT (UTC-0700), TerryE - Wiki Size

The OOo wiki contains 10,521 content pages and 11,338 uploaded files. These form a critical service to the end-user community.
Note that the cwiki markup format doesn't support many of the MediaWiki-specific markups in this content.

*** 2011-07-31-14:36 TerryE - Concerns for Conversion to CWiki

[T]he challenge here is that the cwiki markup grammar is different to that of MediaWiki, and largely a functional subset. Worse, MediaWiki supports extensions to allow you to extend this. The OOo implementation currently uses *42* such extensions. One of these was custom-developed by the German OOo team. With over 10,000 pages of content to sentence, even if we establish a migration ruleset to map 90% of the non-conformities, we are still looking at a MAJOR project.

*** 2011-07-31-15:07 Terry Ellison - Formulating the Hardware/Software Stack

Both the OOo forums and the OOo wiki can run on a standard LAMP stack. [ ... ] The wiki is a bit more of a hog and needs 4 cores, but it currently doesn't use a PHP opcode cache, and adding one would halve this load. Most of the access is guest, so using a Squid or Varnish front end will drop this significantly.

The simplest way to provide this service would be to use VMs, and AFAIK this is a model that the a.o infrastructure guys understand.

*** 2011-07-31-16:02 TerryE - Migrating the Wiki, Concerns not to Overlook

As far as the wiki goes, this template has been more heavily customised again using standard MediaWiki hooks and facilities. Switching logos and banners would be reasonably straightforward, but alas the guys that did the bulk of the content management are history.

> (3) Legal - IP, Copyright and Licensing
> We will need to make sure that we have clear documentation of the copyright and license for the content on the wiki and user forums. We'll need to make sure that the migration does not infringe on anyone's copyright and/or inadvertently change any licenses.
If we are clear about what we are planning for these sites on an openoffice.org domain then it *might* be possible to take everything.

The forum content was all under the Sun, then Oracle, terms, which roughly mirror CCA but with OOo retaining full rights to use. All bar a few dozen of the wiki pages likewise.

*** 2011-08-01-07:42 Terry Ellison - Server Capacity Considerations, Confirming Operation

> [C]an you give me some Ubuntu VM specs so I can create them for the Forums and the Wiki -- we can then get them set up and working as a test, and ready for a final dump/load when we switch over; in the meantime we can check the loads of the VM hosts etc. to be sure all will be OK.
>
> Just the amount of RAM and disk space required should be all I need -- and 1 or 2 CPUs -- then I'll get an Ubuntu VM up for each. We try and stick to LTS releases, so will get it to the 10.04.3 LTS version unless you have a reason we should use a later Ubuntu release.

I track Ubuntu current for my laptop and home server, and use Ubuntu LTS by preference for my LAMP VMs. I use a 2-disk split with a common immutable system image and an app-specific /var (with callback hooks in the startup so the app can tailor the system). I've just rebaselined my VMs from 10.04-1 LTS to 10.04-3 LTS. If you guys have already developed an Ubuntu VM template then I can always pick that up. If not, then I'll write up my template and make sure the licensing of my IPR / content is OK, so you have the option to reuse it.

I would prefer to stick with Ubuntu VMs for now because I like to keep local mirrors of any prod system for release development/rehearsal and hunting down live problems. I currently use VirtualBox, but I can reinstall VMware Server on my local server so that I can switch my VMs from VBox Guest Utils to VMware Tools. (I like to exactly mirror any prod systems locally.)
Installing and getting to grips with the bowels of a FreeBSD server is just another learning curve that I would like to avoid at the moment. I've already got a learning curve on compliance with your own infrastructure standards on Identification & Access Control, Backup, Logging, Intrusion Detection, Status Reporting, ....

*** 2011-08-01-09:53 TerryE - More on Confluence Migration

*I* have looked at [migration to Confluence] and it *will* be hard. I didn't say impossible. ... Just look at

* http://wiki.services.openoffice.org/wiki/Special:Statistics
* http://wiki.services.openoffice.org/wiki/Special:Version

for the volumetrics and MediaWiki extensions used. If you google "mediawiki confluence migration" then you will see that this isn't a trivial exercise even for a wiki using standard MW with no extensions: it would involve person-years of effort. I have suggested ... rehosting, but on a sustain basis, for the wiki. I can set all of this up, if agreeable to the project. It doesn't need further material resources from the Apache team. This would at least buy us the time to do a proper migration plan and resource it.

*** 2011-08-01-20:03 TerryE - Working to Have Confirmable Configurations, Trial Operation

> I'd like to get going, so am going to create 2 x VMs of 50GB space and 2GB RAM each, both with Ubuntu 10.04.3. One for the MediaWiki and one for the forums.

The wiki might struggle without tuning, since the current config isn't tuned and the MediaWiki engine is a D/B hog unless you stick a cache such as Squid or Varnish in front of it. ... I'll want to mirror them exactly for my dev, so can you mail me the dpkg -l listing together with the etc tarball and details of any non-Debian installs. I'll shout if there's anything missing. We'll need a PHP accelerator -- Xcache or APC, and php5-cli.

*** 2011-08-03-08:07 Terry Ellison - Reaching the Wiki Users, Legal Concerns

> [...]
> Is there an announcement list or some other mechanism to send an email to every registered wiki user?

At a technical level, it's simple to run a query dumping all of the mail addresses of contributors to the wiki. I've just done a few on my local VM, which has a snapshot of the prod wiki as at Thursday/Friday night IIRC.

* There are 34,969 registered users. Of which:
* 3,675 have made contributions. There is no need to contact those who haven't.
* 3,623 have registered email addresses and have made 182,677 contributions.
* 52 have no registered email addresses and have made 153 contributions (probably dating back to the early days when email registration and confirmation wasn't mandatory).

It is trivial to dump this list of user / email addr / post count. However, giving this to Apache, and the project making use of it, is a more complex issue. The server is currently located in Oracle's Hamburg facility under German / EU legislation. We have data protection legislation and anti-spam guidelines / legislation to bear in mind here. Moving email addresses across national and organisational boundaries might trigger these. Also, one can't send out mailshot emails in the EU unless the recipients have first agreed in principle to accept these.

What I can do is to provide this data to Andrew via the internal Oracle email, and let him figure out the legal / compliance issues and terms of use before making it available to the project. ...

*** 2011-08-04-01:36 Terry Ellison - Moving the Wiki "As-Is" for Now

I am working with the rest of the project to migrate the forums and the wiki "as-is" for now. As far as the wiki goes, in the medium to longer term the project may decide to move some material to cwiki, but this is work in progress.

*** 2011-08-04-02:05 Terry E - Registering and Authorizing Users and Their Edits

> 1) People must ask for an account; they can't self-subscribe.
> Nothing is required except a few words about who you are and why you want an account. Any one of several people authorised to approve or reject these requests can deal with them expeditiously. Very few spammers, in my experience, take the trouble to actually request accounts.

We need to implement this in a way which sits within MediaWiki functionality and complies with the goals. One way would be:

* to allow the normal self-registration and optional email address with email verification;
* and to have a new wiki role, say "contributor" (or is this "contributer" in US-speak?):
* guests have no write access;
* registered users can write to the User and User_talk namespaces but to no others;
* registered users can request to become a Contributor, but they must have completed their User page, verified their email address and confirmed that all future edits to the Main or Talk namespaces are made under licence (CCA, AL2 or whatever we decide);
* the granting of Contributor is done by the bureaucrats.
* The Main and Talk pages contain "reference" content.
* There is a standard disclaimer that User/User_talk content is user content.
* We would still need Main and User namespace guidelines and TOUs.

This might seem a little convolved, but it can be configured with standard MW/extension functionality.

> 2) Alternatively, or in addition, the first X edits/ contributions/ whatever are moderated by a group of people, any one of whom can approve or reject the items. After X acceptable contributions, the person is then allowed to edit the wiki without further supervision [ ... ]

We could add another committer layer so that contributor (but not committer) edits are moderated. However, I suspect that a trust-but-verify attitude is easier for everyone. When we catch contributors deliberately abusing the rules, then we can always back out their changes and remove contributor status. This is similar to our forum model and works well there.
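A scheme like this maps onto ordinary MediaWiki configuration. A minimal LocalSettings.php sketch, where the "contributor" group and the "edit-main" right are illustrative names rather than anything specified above; the $wg* settings themselves are standard MediaWiki:

```php
# Sketch only: "contributor" and "edit-main" are made-up names;
# the configuration variables are stock MediaWiki.

# Anonymous guests get no write access anywhere.
$wgGroupPermissions['*']['edit'] = false;

# A custom right gates the reference namespaces.
$wgGroupPermissions['*']['edit-main']           = false;
$wgGroupPermissions['user']['edit-main']        = false;
$wgGroupPermissions['contributor']['edit-main'] = true;

# Require the custom right in Main and Talk; User and User_talk are left
# unprotected, so ordinary registered users can still edit there.
$wgNamespaceProtection[NS_MAIN] = array( 'edit-main' );
$wgNamespaceProtection[NS_TALK] = array( 'edit-main' );

# Registered users must verify their email address before editing at all.
$wgEmailConfirmToEdit = true;

# Only bureaucrats may grant or revoke contributor status.
$wgAddGroups['bureaucrat'][]    = 'contributor';
$wgRemoveGroups['bureaucrat'][] = 'contributor';
```

The "confirm your User page and licence terms first" step would still need a small extension or manual bureaucrat check; it has no single core setting.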
*** 2011-08-04-04:30 TerryE - Wiki Extension to Require Review-then-Commit

> You probably know more about this than I do, but my understanding is that the current OOo wiki has an extension installed that does what I was suggesting in option 2, but the extension has not been implemented. See:
> http://www.mediawiki.org/wiki/Extension:FlaggedRevs and specifically:
> http://www.mediawiki.org/wiki/Extension:FlaggedRevs#Automatic_user_promotion

Yes, you are correct. This extension can do this and more, but with a grey issue like this I feel that a DL-based dialogue isn't the best way to work out what to do here. Better we work up a position paper/page within the OOOUSERS cwiki laying down the options and their pros and cons, and then agree a consensus or vote on the paper itself. Use the DL to note the consensus and get wider feedback.

What concerns me is the moderation load involved with such an active intervention of review-before-publish. Perhaps others with moderator experience might care to comment?

My worry is that review-before-publish also ignores the reality of how people edit wikis. In general they don't prepare and proof a draft offline, then paste their best-and-final into the article. Most do it section by section, or end up correcting / rewording when they see the final version, so one logical edit can comprise half a dozen posts. I am not sure how this would work if you've got to wait for approval before the next edit.

*** 2011-08-05-15:18 Terry Ellison - Migration Systems Setting Up

> Gav, the Admin from the ASF, has set up the VM for the MediaWiki. It's an Ubuntu 10.04.3 LTS VM. At the moment Gav and I have admin access to this machine. First we have to install all the needed software, then we will make a test migration with an older dump, then we can test, and finally do the final migration.
>
> Greetings, Raphael

Raphael, great news.
I've my dev shadow VM up as well, and just logged onto minotaur for the first time with my new Apache ID. I'll ping the config detail over to you and Gavin and take it from there. Regards, Terry

*** 2011-08-07-10:49 TerryE - Questions About the Wiki and Customizing Pages

> It should be possible to alter the login page to show any important information we want to share about the migrated site, correct? With possibly a survey?
>
> Also, what is the purpose of the Bots group?

The WikiEditor group is the default group for a MediaWiki extension, FlaggedRevisions, which facilitates some of the control functionality that Rob, you and others have been asking for. If you look at Clayton's status page, this extension isn't yet put to use. A job for post-transition, I think. But that's why Clayton is the only current member.

Yes, the Main_Page is a restricted page, but anyone can take a copy and work on it in their User page hierarchy. When we have a consensus that it's OK, then I can move it back into Main.

The Bots group is for a function supported by MediaWiki where it provides an [XMLRPC] to allow batch processes to work on the wiki. In this case Clayton and co used the PyWikipediaBot (I think) to carry out bulk transactions. This needs an account with special privileges.

*** 2011-08-07-15:38 Terry Ellison - WORK ITEMS AND ISSUES - FIRST CUT

I've just finished a 1st cut of outstanding tasks and issues for the Wiki. See
https://cwiki.apache.org/confluence/display/OOOUSERS/Community+Wiki+Services
Comments gratefully received on this DL and/or on the page itself.

*** 2011-08-08-02:22 Terry Ellison - Derisking Migration/Staging - Separating Infrastructure/Platform and Content

Thank-you all for your replies so far.
I will work through and respond to individual points in-thread at the appropriate branch, but here I just wanted to propose that we divide the Wiki migration into two separate task areas:

* Migration of the infrastructure service. What drives this is that if Oracle decides, say, that the current hardware is being turned off on the XXth of Aug, then we need to move the service to a box in Apache.org. The issues here are largely technical and heavily involve the infrastructure team. We need to be able to move quickly on this, and few of the DL contributors have strong opinions here. They just want the 'magic to happen'. Yes, there are a few decisions to be made, but the questions are a little esoteric for most DL members. I offer to continue to lead this work area.

* Migration of the wiki content / policy. Here we have a range of widely and strongly held views, and the DL seems to be acting as a debating shop without convergence to consensus on some points. This debate could still be continuing in 3 months' time as far as I can see. I don't want it to get in the way of infrastructure migration -- except when /absolutely/ necessary. It also makes sense to break this task area down further if this means that we can make progress in some areas, even if stalled in others. From an infrastructure viewpoint I don't really care, as this branding process can be encapsulated and detached:

* We make the pre-prod wiki available to the "branding team".
* The branding team agree and implement the branding changes that the project wants to have in place for cut-over day. In practice this could involve the creation or modification of content in the Main, File, MediaWiki, OpenOffice.org Wiki and Template namespaces.
* I can use standard MW audit functionality to capture a list of all such pages, and then standard MW export functionality to create an XML.GZ dump of this work.
* We then import the "live" wiki as part of cut-over.
* I then reimport the above XML.GZ to reapply the branding changes to the migrated wiki.

... [M]y suggestion is to keep the infrastructure and content issues separate. I can lead on the first. We need someone who can lead on the second. His or her strengths should be experience of delivery -- that is, being able to make things happen -- and a good working knowledge of the more advanced MW features such as templates, etc. I would think that either Drew or TJ would be well qualified, but that is only my personal suggestion.

*** 2011-08-08-01:41 Terry Ellison - Getting Help on the Content Side

> I can submit an iCLA if it would help (it's on my WIGATI list).

[W]e need someone with sound MediaWiki skills to lead on doing these [content changes]. I've suggested that you are qualified. I don't know if anyone on the DL other than you or I knows enough about MW content editing, so could you consider my suggestion?

*** 2011-08-08-02:39 TerryE - More on Content Migration

> I assume that under the name "branding changes" we include:
>
> 1) OpenOffice.org -> Apache OpenOffice.org and associated logo changes
>
> 2) Removal of Oracle logos and name
>
> 3) Replacement of privacy policy and disclaimer
>
> 4) Update of page-edit license text to require Apache 2.0 for new contributions

Yes, good points, though such content changes are functionally separate from the (infrastructure) migration. I have suggested in a separate branch that we set up a separate task area with its own leader to work these issues.

*** 2011-08-08-03:16 Terry Ellison - Wanting to Walk through It At Home

To be honest, I'm not comfortable with any solution unless I've used it myself. ... This [??] option is simple to implement and try in pre-prod, unlike a migration to PostgreSQL, which will involve quite a bit of work. (An example of where I am comfortable is with the use of PHP-APC.)
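For reference, an opcode cache like APC on the PHP 5.2/5.3 stacks discussed here was typically a two-line ini change; a sketch, with the path and the shm_size value as illustrative placeholders rather than tuned figures:

```ini
; e.g. /etc/php5/conf.d/apc.ini -- sketch only, values illustrative
extension=apc.so
apc.enabled=1
; shared-memory segment that holds the compiled opcodes, in MB
apc.shm_size=64
```

The win Terry describes comes from skipping the parse/compile step on every MediaWiki request, which is why he expects it to roughly halve CPU load.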
What I had hoped to get feedback on from the wider ooo-dev DL was on the option with the least technical risk, but which also has a business impact, and that is to stop the service for ~30 mins a day to do the backup. Is this acceptable? Historically, the lowest transaction rates occur at ~04:00 UTC, so this is when I'd do it if we go this way.

*** 2011-08-08-03:33 Terry Ellison - Risks of Changing DBMS along with Migration

There are two factors against an early move to PostgreSQL: (i) the MediaWiki caveats on its use (here, and, as all experienced Wikipedians do, look at the associated talk page here; this has a couple of interesting references which cookbook the conversion); (ii) the extra work involved. This is not only the D/B migration, but also a MW version upgrade, 1.15.1 -> 1.17.

To be honest, I am uncomfortable doing this as part of this immediate "continuity of service" migration. My suggestion is that if the wiki looks as if it is going to have a long-term place within the project, then we should revisit this as part of an in-service improvement program in, say, 6-12 months.

*** 2011-08-08-04:14 Terry Ellison - Migrating Static Web Projects to Wiki

> I strongly suggest that all the native language projects' front-facing pages be migrated to the "new" wiki. [ ... ] This would of course give them URLs of the form somewikiname.openoffice.org/"nl_code" rather than "nl_code".openoffice.org [ ... ] Naturally, we should confirm acceptance with them for this.

... I see this as part of the content migration / transformation task area. What I want to do is to get the current content over without loss of content, formatting or change history. ... I assume by your reference to the "new" wiki ... you mean the ooo-dev cwiki.
As we've discussed before, the current migration from MediaWiki to Confluence can be highly lossy unless there is per-page content-editor intervention: we lose all change history, some content formatting (1) and even some content (2). It therefore makes sense, even if we migrate the publicly viewed copy to cwiki, to keep a master copy of the old content online in ... this wiki, albeit moving the pages into an "Archive" namespace which is private and read-only to committers.

*** 2011-08-08-05:03 Terry Ellison - Migration Isn't Just Moving a System Dump

> If Oracle decides, say, that the current hardware is being turned off on the XXth of Aug, then we need to grab a backup of the content to a box in Apache.org.

It's a bit more complicated than just grabbing the content :) I already routinely off-site the current forum and wiki system content for dev support and [Disaster Recovery], but that's within current working practices that date back to Sun days. It's putting the content to new use with a.o that requires clearing the hurdles.

*** 2011-08-09-02:16 Terry Ellison - LOOKING AT WORK-ITEM ISSUES FEEDBACK, STEPS BEING TAKEN

[Re 2011-08-07-15:38, https://cwiki.apache.org/confluence/display/OOOUSERS/Community+Wiki+Services]

Thanks to everyone for this constructive feedback. I hope that I have given replies in the thread forks where you needed them, but I will integrate them, update the document and move forward to complete task 6 in the next couple of days. There are two further points that I would like to raise for comment / feedback.

(1) The bulk of this page content relates to technical details and issues, rather than content / branding / licensing, etc. As I mentioned in a reply to RobW, I would like to decouple the infrastructure aspects from the content-related aspects as much as practical, to keep the momentum on the platform move.
I therefore propose to move the bulk of the infrastructure content into a supplementary page which focuses on this content.

(2) I will be making the snapshot version accessible to the project, however:

* I will be overwriting all passwords (except mine and Drew's), thus disabling existing account use. I will also replace all email addresses with dummies, to prevent existing users resetting their passwords, and remove user page-watch information, to stop bad emails being generated.

* Anyone who wishes to work on the wiki will need to create a new account and apply to Drew or myself for temporary elevated privileges if needed. Just remember that new content will be blown away at cut-over, so individual new users must use the standard export functionality prior to cut-over to save it if they want to restore it post live-transfer.

* As step one removes the private data elements that could be viewed as falling within the purview of EU data protection legislation, this means that we could in principle make a copy of this D/B available to any committer for data analysis. All they need is MySQL installed and 3Gb of space for the D/B. The data model is pretty straightforward :-) (*)

(*) A quick example here is that the edit profile is a classic near log-lin relationship: 33% of edits were made by 10 editors, 73% by 100, 90% by 315.

*** 2011-08-10-15:12 Terry Ellison - TEST/PRE-PRODUCTION WIKI INSTANCE PROGRESS

Just a quick update on progress on the wiki instance. This is now up in pre-prod mode on the subdomain ooo-wiki. This is based on a snapshot taken at 4am yesterday morning. The server needs further tuning, but functionally it's there. We do have some outstanding code issues (due to a change in the functionality of the PHP routine *call_user_func* between versions 5.2 and 5.3 which is not backwards compatible; I'll sort this out tomorrow).
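The "log-lin" edit profile quoted above (33% of edits by 10 editors, 73% by 100, 90% by 315) is just the cumulative share of edits attributed to the top-N editors. A small sketch with made-up counts, standing in for a dump of the revision table:

```python
def cumulative_share(edit_counts, top_n):
    """Fraction of all edits made by the top_n most active editors."""
    counts = sorted(edit_counts, reverse=True)  # most active first
    total = sum(counts)
    return sum(counts[:top_n]) / total

# Made-up per-editor edit counts (illustrative data, not the real wiki's).
edits = [500, 200, 120, 80, 50, 20, 15, 10, 3, 2]
print(cumulative_share(edits, 2))  # -> 0.7: the top 2 editors made 70% of edits
```

Run over the real per-editor counts, a table of (top_n, share) pairs reproduces exactly the kind of concentration figures Terry cites.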
Also note, as per my previous email, all existing accounts have had their passwords mangled, so access is guest-mode only at the moment. Feel free to have a look, but no updates and no service commitments yet.

*** 2011-08-11-01:18 Terry Ellison - Rig for Silent Running (robots.txt)

> Hmmm... if this is just a temporary URL, I wonder if we should try to exclude search engine indexing via robots.txt? Otherwise, now that the URL is known, we're going to get spidered, but then all those links will soon be dead.

Thanks Rob for pointing this out, and to Raphael and Gavin for fixing it. This was incidentally on my bucket list for today. I didn't think it that urgent, as it would take a day or so for Google et al. to acquire the site, but you are correct. Better sooner.

*** 2011-08-11-01:38 Terry Ellison - BREAKAGE IN SEPARATING WIKI FROM OO.O

> I found one bug: on the wiki's left side, the lowest frame, which holds the links to the other languages' main pages, points to http://wiki.services.openoffice.org, not to http://ooo-wiki.apache.org/.

You are correct, and in fact there are a lot of references to *.OpenOffice.org in the content. However, this isn't a production instance, and there is little point in changing these until (1) we agree what such "dirty" references are going to be changed to, and (2) we cut over for real. Remember that any changes that we make on this instance will be lost when we bring over the live [database].

*** 2011-08-12-04:53 Terry Ellison - OO.O WIKI ERROR LOGS BUMP

As part of my takeover of SysAdmin on wiki.services.openoffice.org, I've been reviewing the error logs and noticed that in the last week there's been quite a rise in errors reported from clients trying to enumerate the wiki incorrectly. This may just be a coincidence and nothing to do with our work, but if anyone on this DL is trying to suck material down, there are right ways to do this.
I also understand the MediaWiki web API, if any of you want to write a bot to do this. ... Please contact me to discuss.

*** 2011-08-23-20:30 Terry Ellison - SPEEDING UP WIKI ACCESS, REDUCING LOAD

I've now finished the upgrade to add the Apache Traffic Server front end to the community MediaWiki service at http://ooo-wiki.apache.org/ and the service is back online.

We need to do some further tuning of the system cache optimisation, but even with the first-cut settings that I prepared on my own test-bed VM, the system already looks as if it is hitting the performance targets needed to sustain a full production transaction rate. It certainly feels extremely snappy compared to the existing OOo community wiki or the Apache cwiki, and the Google PageSpeed benchmarks are significantly better than both of these systems. So:

* We are good to go for production migration of the community wiki.
* We are the first Apache project adopter of another Apache project, Apache Traffic Server (ATS).
* We have laid the foundation for an ATS template for MediaWiki hosting that can be used to promote the use of ATS to the wider MediaWiki systems community.

It's been a hard few days' work, and there is still some finishing to do, but my thanks to the Infrastructure and ATS guys that have supported me in making this happen.

*** 2011-08-25-08:59 Terry Ellison - REQUEST FOR BASELINE AGREEMENTS TO MOVE AHEAD

I can't execute any plan without a baseline requirement and set of assumptions, so what this note attempts is to lay down such a set, and the decisions that need to be made to go forward. So PLEASE, I don't want any flames about my use of DECISION below. What I simply mean is that if the PPMC as a body accepts these, then I will try my best to move this work forward.
Of course you are free to challenge / change any of this if that is a PPMC-voted decision, but in that case I need to move into a different mode: to suspend work and stop the clock until we have a PPMC-endorsed baseline to replan on. ...

* The infrastructure stack is based on a standard Ubuntu server LAMP stack as at the current LTS (Ubuntu 10.04-3 LTS), which includes PHP 5.3.2.

* The prod wiki is v1.15.1; that is at an N-3 major release level (that's 30 months old: two major and 10 minor revisions behind the current supported release). This also runs on PHP 5.2.0.

* We need a reverse-proxy HTTP cache for performance reasons on the wiki. One of the four market leaders in this niche is another Apache project: Apache Traffic Server (ATS). It makes sense to stay "in-house" here for both support and referenceability reasons.

* *DECISION*: Adopt ATS v3.0.1 as the HTTP cache for the wiki. (BTW, this work has been done and the product is excellent.)

PHP 5.3 introduced extra checking to remove an area of tolerance that PHP 5.x<3 allowed. This was to do with when and how parameters can be passed by reference under certain circumstances. So moving a code base from 5.2 to 5.3 involves a lot of work identifying and eliminating these mis-codings. This was done by the MediaWiki team in MW v1.16. I had planned to move to MW v1.15.5 (the last stable 1.15.x) as our baseline, and I've done this work integrating it with Apache Traffic Server (ATS) and our LAMP stack. This is stable and performant enough to show that we are good. However, I have only identified and bug-fixed the main-path 5.2->5.3 coding issues. During my testing I have subsequently discovered others, and there are undoubtedly more to find. I've also discussed this with the MW devs on the MW IRC channel. Given this, the consensus in the @infra team (me included) is that we should bite the bullet now and move to the current MW 1.17.0, even given the extra work.
There are some performance risks associated with MW 1.17.0 which we need to mitigate. However, given that we've already got a complete LAMP+ATS+MW stack in an ESX-hosted VM performing like a dingbat, we really only face the 1.15.5 -> 1.17.0 issues in this step.

* *DECISION*: Upgrade the ooo-wiki MediaWiki (MW) + all extensions to MW v1.17.0.

* *DECISION*: I have agreed with infrastructure that we will keep 1 core on "standby" so we can up the VM to a 2-core VM if we are seeing unacceptable performance problems with one core.

* *DECISION*: We will cut over the wiki and the forums with the content as-is, and implement branding and access-control changes within the a.o infrastructure when volunteers come on-stream to resource this. This is the standard "transfer then clean up" approach adopted when a migration is time-critical.

*CUT-OVER*

There are two facets to cut-over: content move and DNS-based IP reassignment. Clearly we need to freeze update access to the services prior to the start of the content move, and continue the update-freeze on the legacy service. Bringing the content across involves a backup, copy and restore, which can be rehearsed and scripted, but in the case of the wiki this will be a few hours even if fully automated.

* There are many ways "to skin the cat" of the migration process. All will involve some service loss, but the complexity of the rehearsal and planning can explode as we reduce this outage towards zero. Complex plans can also go wrong, so my instinct is to keep it simple: halt the service at a pre-notified time, transfer, and start the new service at a pre-notified time.

* *DECISION*: Halt the wiki service for a notified (24hr) window during cut-over. The migration uses fixed IPs, so DNS IP reassignment is co-incident with service stop.

* *GOAL*: Cut over [wiki] within 14 days from today. Date TBD by PM. I can do the content move.
* We have some further caching tweaks to make to the interaction of the MediaWiki [application] with the ATS HTTP reverse-proxy cache, but these are probably more nice-to-have than essential. More to the point, these need to be done on a system with production load patterns.

* *DECISION*: We will defer such tuning until post go-live.

*** END OF THE WIKI NOTES ***