incubator-allura-dev mailing list archives

From Dave Brondsema <d...@brondsema.net>
Subject developing a bulk export / backup feature
Date Tue, 18 Jun 2013 14:26:48 GMT
For us at SourceForge, we have a need to build a feature that lets project
admins download a backup/export of all their project data.  Since this is a
pretty big feature, I wanted to propose here how we might do it and get feedback
& ideas before we proceed.

Add a bulk_export() method to Application which would be responsible for
generating json for all the artifacts in the tool.  The format should match the
API format for artifacts so that we're consistent.  Thus any tool that
implements bulk_export() would typically loop through all the artifacts for this
instance (matching app_config_id) and convert to json the same way the API json
is generated (e.g. call the __json__ method or RestController method; some
refactoring might be needed).  Multiple types of artifacts/objects could be
listed out in groups, e.g. Tracker app could have a list of tickets, list of
saved search bins, list of milestones, and the tracker config data.  Discussion
threads would need to be included too, ideally inline with the artifact they go
with.  No permission checks would be done since this export would only be
available to admins (makes it faster & simpler).
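
To make that concrete, here is a rough sketch of what a tool's bulk_export()
could look like.  The Ticket and TrackerApp classes, their fields, and the
output keys are all made up for illustration (they are not Allura's real
models), and the in-memory list stands in for a mongo query.

```python
import json


class Ticket(object):
    """Hypothetical stand-in for an artifact (not Allura's actual model)."""

    def __init__(self, app_config_id, ticket_num, summary, posts=()):
        self.app_config_id = app_config_id
        self.ticket_num = ticket_num
        self.summary = summary
        self.posts = list(posts)

    def __json__(self):
        # Same dict the API would serialize, with the discussion inline.
        return {
            "ticket_num": self.ticket_num,
            "summary": self.summary,
            "discussion_thread": {"posts": self.posts},
        }


class TrackerApp(object):
    """Hypothetical tool; all_tickets stands in for a mongo query."""

    def __init__(self, app_config_id, all_tickets, milestones, config):
        self.app_config_id = app_config_id
        self.all_tickets = all_tickets
        self.milestones = milestones
        self.config = config

    def bulk_export(self):
        # Only artifacts belonging to this tool instance; no permission checks.
        tickets = [t.__json__() for t in self.all_tickets
                   if t.app_config_id == self.app_config_id]
        # Group the different kinds of objects under their own keys.
        return json.dumps({
            "tickets": tickets,
            "milestones": self.milestones,
            "tracker_config": self.config,
        })
```

Returning one string keeps the sketch short; it does not address the memory
concern for large tools.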

Provide a page on the Admin sidebar to generate a bulk export.  Project admins
could choose individual tool instances, or all tools in the project (that
support it).  That form would kick off a background task which goes through the
selected tools and runs their bulk_export() methods.  Save each tool's data as
mount_point.json and zip them all together.
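
The zipping step could be as small as the sketch below.  It assumes the task
has already written each tool's mount_point.json into one directory; the
function name and arguments are hypothetical.

```python
import os
import zipfile


def zip_exports(export_dir, mount_points, zip_path):
    """Bundle <mount_point>.json for each selected tool into one zip file."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for mount_point in mount_points:
            filename = mount_point + ".json"
            # arcname keeps the server's directory layout out of the archive.
            zf.write(os.path.join(export_dir, filename), arcname=filename)
    return zip_path
```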

It'd be easiest to store & deliver the zip files similarly to the code snapshots
(static files not served through allura), but that won't be secure.  We'll need
to either serve it through allura with authentication, or maybe name the zip
file with a random name that can't be guessed (and then serve it directly
through apache or nginx).  Other ideas?
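
For the random-name option, something along these lines might do.  This is
only a sketch: the naming scheme and the helper name are assumptions, and it
relies on Python's secrets module (3.6+) for the unguessable token.

```python
import secrets


def random_zip_name(project_shortname):
    """Return a zip filename with an unguessable URL-safe token in it."""
    # 32 random bytes from the OS CSPRNG, base64url-encoded.
    token = secrets.token_urlsafe(32)
    return "{}-backup-{}.zip".format(project_shortname, token)
```

Anyone who obtains the URL can still download the file, so this only helps if
the link is shared with admins alone and directory listings are disabled.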

When the task is complete, notify the user.  Which way is best?  Send an email?
It would probably be good to show a listing of completed exports on the export
page, so that any older ones still sitting around can be retrieved (it would be
up to server admins to have a cron job that deletes old files).
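
The cleanup a server admin runs from cron could look roughly like this.  The
directory layout, the .zip suffix check, and the function name are
assumptions, not anything Allura provides.

```python
import os
import time


def delete_old_exports(export_dir, max_age_days, now=None):
    """Remove export zips whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for name in sorted(os.listdir(export_dir)):
        path = os.path.join(export_dir, name)
        is_export = name.endswith(".zip") and os.path.isfile(path)
        if is_export and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed
```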

We could make this something that can be triggered automatically via the API and
check status through the API, but that seems like a good thing to add on later.

Should we include attachments?  These would be important in some cases but not
in others.  It could also increase the export size immensely in some cases.
Maybe leave out for now, and add in later when needed, possibly as an option.

Further thoughts on implementation details:

So that a giant json string doesn't have to be held in memory for each tool, the
export task should open a file handle for mount_point.json and call
bulk_export() with that open file handle, so each App can append to its file
incrementally.
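
A minimal sketch of the incremental approach, assuming artifacts arrive as an
iterable of already-converted dicts (e.g. from a generator over a mongo
cursor).  The standalone function and the "tickets" key are illustrative only.

```python
import json


def bulk_export(artifacts, f):
    """Write artifacts to the open file f one at a time as a JSON list."""
    f.write('{"tickets": [')
    for i, artifact in enumerate(artifacts):
        if i:
            f.write(", ")
        # Only one artifact is serialized in memory at any moment.
        json.dump(artifact, f)
    f.write("]}")


def export_tool(artifacts, path):
    """What the background task would do for a single tool."""
    with open(path, "w") as f:
        bulk_export(artifacts, f)
```

The separators are written by hand because json.dump() has no mode for
appending items to an open list; the result is still one valid json document.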

If mongo performance is slow, some refactoring may be needed to avoid lots of
individual mongo calls and be more batch oriented.  We can see how it goes.

Could parallelize bulk_export() later, to do multiple tools at once.


Sound reasonable?  Any suggestions or other ideas?


-- 
Dave Brondsema : dave@brondsema.net
http://www.brondsema.net : personal
http://www.splike.com : programming
              <><
