incubator-allura-dev mailing list archives

From: Dave Brondsema <>
Subject: Re: developing a bulk export / backup feature
Date: Wed, 26 Jun 2013 13:53:35 GMT
On 6/21/13 11:27 AM, Dave Brondsema wrote:
> I found an nginx module for custom authentication that we could play around
> with and see if it works:  I'm sure Apache has similar modules too.

Our ops team wasn't keen on recompiling nginx to be able to add a 3rd-party
module.  They suggested using non-http methods to provide the file.  At
SourceForge, we have ssh/scp/sftp access for projects, so that would be a good
delivery mechanism for the backup zip.  Other Allura instances that use this
might have to figure out what works well for them, but we could make it flexible
& configurable: the zip file could get created in any directory (specified by a
path pattern in the .ini file), and an email notification upon completion could
provide access instructions (text configurable via the .ini file).
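
To illustrate the kind of flexibility I mean, something like this could expand
the pattern (the config key and pattern variables are placeholders, not a
settled API):

    import os
    from datetime import datetime

    # hypothetical .ini entry:
    #   bulk_export_path = /var/allura-exports/{project}/{date}
    def export_dir(config, project_shortname):
        pattern = config.get('bulk_export_path',
                             '/tmp/allura-exports/{project}/{date}')
        path = pattern.format(project=project_shortname,
                              date=datetime.utcnow().strftime('%Y-%m-%d'))
        if not os.path.exists(path):
            os.makedirs(path)
        return path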

> If all we do is send an email, and don't show the status on the admin page
> anywhere, a very long running backup could cause the admin to think it got stuck
> or died, and thus request another backup.  I suppose we should have
> anti-dogpiling logic to avoid that.
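
The anti-dogpiling check could be as simple as refusing to queue a second
export while one is pending.  Rough sketch; ExportTask and its states are
hypothetical names, just to show the idea:

    # refuse to queue a second export while one is already pending
    # for the same project
    def schedule_export(project, mount_points):
        existing = ExportTask.query.find(dict(
            project_id=project._id,
            state={'$in': ['queued', 'running']})).first()
        if existing:
            return existing  # tell the admin an export is already in progress
        return ExportTask.post(project._id, mount_points)
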
> On 6/21/13 10:43 AM, Cory Johns wrote:
>> Have we even tested serving large files through the app stack?  I strongly
>> suspect they'd hit the long-request timeout.  I know I've hit it before
>> when testing uploads of large-ish attachments.
>> And on the subject of attachments, the API end-points already (or will with
>> the next push) include attachment metadata, including the URL to download
>> them from.  I definitely think that's good enough for now, as the admin can
>> parse the URLs out and download them, if needed.  If that proves to be too
>> onerous for doing project exports, then we can address it at that time.
>> Going back to serving up the exports, is there any way we could serve them
>> outside of the app stack but still with authentication?  Such as a
>> standalone, light-weight service that just serves files with authentication
>> (could be useful for the screenshots and icons for private projects), or
>> via authenticated SFTP?  This is verging on an infrastructure question at
>> this point, but I definitely agree that we should have some auth in front
>> of it, even though it's not going to be easy.
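
One known approach along those lines: authenticate in a tiny app, but let
nginx do the actual transfer via its X-Accel-Redirect header, so the long
download never ties up an app worker.  A minimal sketch, assuming an nginx
internal location mapped to the export directory (check_permission() and
/protected-exports/ are assumptions, not existing code):

    # minimal WSGI sketch of auth-fronted static file delivery
    def serve_export(environ, start_response):
        filename = environ['PATH_INFO'].lstrip('/')
        if not check_permission(environ, filename):
            start_response('403 Forbidden', [('Content-Type', 'text/plain')])
            return ['forbidden']
        # nginx intercepts this header and streams the file itself
        # from the internal location, outside the app stack
        start_response('200 OK', [('X-Accel-Redirect',
                                   '/protected-exports/' + filename)])
        return ['']
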
>> On Tue, Jun 18, 2013 at 10:26 AM, Dave Brondsema <> wrote:
>>> For us at SourceForge, we have a need to build a feature that lets project
>>> admins download a backup/export of all their project data.  Since this is a
>>> pretty big feature, I wanted to propose here how we might do it and get
>>> feedback & ideas before we proceed.
>>> Add a bulk_export() method to Application which would be responsible for
>>> generating json for all the artifacts in the tool.  The format should match
>>> the API format for artifacts so that we're consistent.  Thus any tool that
>>> implements bulk_export() would typically loop through all the artifacts for
>>> this instance (matching app_config_id) and convert them to json the same
>>> way the API json is generated (e.g. call the __json__ method or
>>> RestController method; some refactoring might be needed).  Multiple types
>>> of artifacts/objects could be listed out in groups, e.g. the Tracker app
>>> could have a list of tickets, a list of saved search bins, a list of
>>> milestones, and the tracker config data.  Discussion threads would need to
>>> be included too, ideally inline with the artifact they go with.  No
>>> permission checks would be done since this export would only be available
>>> to admins (makes it faster & simpler).
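
To make that concrete, here's a rough sketch of what the Tracker's version
might look like (class and model names are illustrative, not a final API); it
also shows how artifacts can be written out one at a time rather than building
the whole document in memory:

    import json

    class ForgeTrackerApp(Application):
        def bulk_export(self, f):
            f.write('{"tickets": [')
            tickets = Ticket.query.find(dict(app_config_id=self.config._id))
            for i, ticket in enumerate(tickets):
                if i:
                    f.write(',')
                # same json the API produces; discussion thread inline
                json.dump(ticket.__json__(), f)
            f.write(']}')
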
>>> Provide a page on the Admin sidebar to generate a bulk export.  Project
>>> admins could choose individual tool instances, or all tools in the project
>>> (that support it).  That form would kick off a background task which goes
>>> through the selected tools and runs their bulk_export() methods.  Save each
>>> tool's data as mount_point.json and zip them all together.
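
i.e., roughly this (the task plumbing and error handling are glossed over,
and app_instance() is shorthand for looking a tool up by mount point):

    import os
    import zipfile

    # sketch of the background task body
    def bulk_export_task(project, mount_points, out_dir):
        zip_path = os.path.join(out_dir, '%s.zip' % project.shortname)
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as z:
            for mp in mount_points:
                app = project.app_instance(mp)
                if app is None or not hasattr(app, 'bulk_export'):
                    continue  # tool doesn't support export (yet)
                json_path = os.path.join(out_dir, '%s.json' % mp)
                with open(json_path, 'w') as f:
                    app.bulk_export(f)  # each app streams its own json
                z.write(json_path, '%s.json' % mp)
        return zip_path
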
>>> It'd be easiest to store & deliver the zip files similarly to the code
>>> snapshots (static files not served through allura), but that won't be
>>> secure.  We'll need to either serve it through allura with authentication,
>>> or maybe name the zip file with a random name that can't be guessed (and
>>> then serve it directly through apache or nginx).  Other ideas?
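
If we go the random-name route, something like this would make the name
effectively unguessable:

    import binascii
    import os

    # 16 random bytes -> 32 hex chars; not guessable or enumerable
    def random_zip_name(shortname):
        return '%s-%s.zip' % (shortname, binascii.hexlify(os.urandom(16)))

We'd also have to make sure directory indexes are disabled wherever those
files land, or the random name buys us nothing.
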
>>> When the task is complete, notify the user.  What way is best?  Send an
>>> email?  Probably would be good to show a listing of available completed
>>> extracts on the extract page, so if any older ones are still sitting
>>> around they can be retrieved (it would be up to server admins to have a
>>> cron job to delete old files).
>>> We could make this something that can be triggered automatically via the
>>> API, with status checked through the API as well, but that seems like a
>>> good thing to add on later.
>>> Should we include attachments?  These would be important in some cases but
>>> not in others.  It could also increase the export size immensely in some
>>> cases.  Maybe leave them out for now, and add them in later when needed,
>>> possibly as an option.
>>> Further thoughts on implementation details:
>>> So that a giant json string doesn't have to be held in memory for each
>>> tool, the export task should open a file handle for mount_point.json and
>>> call bulk_export() with that open file handle, so each App can append to
>>> its file incrementally.
>>> If mongo performance is slow, some refactoring may be needed to avoid
>>> lots of individual mongo calls and be more batch oriented.  We can see
>>> how it goes.
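
For example, instead of dereferencing each ticket's discussion thread one at
a time, we could fetch them in batches with $in.  Sketch only; 'artifact_id'
is a stand-in for however threads actually reference their artifact:

    # one query per batch instead of one query per artifact
    def threads_by_ticket(ticket_ids, batch_size=100):
        threads = {}
        for i in range(0, len(ticket_ids), batch_size):
            batch = ticket_ids[i:i + batch_size]
            for t in Thread.query.find(dict(artifact_id={'$in': batch})):
                threads[t.artifact_id] = t
        return threads
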
>>> Could parallelize bulk_export() later, to do multiple tools at once.
>>> Sound reasonable?  Any suggestions or other ideas?
>>> --
>>> Dave Brondsema :
>>> : personal
>>> : programming
>>>               <><

Dave Brondsema : : personal : programming
