SharePoint Document Library Mirroring

UPDATE: This project has now (finally) been ported to SharePoint 2010 (the source can still be found at http://datagilitydocmirror.codeplex.com/). Functionality is largely as described below, with differences including:
 
Mirroring is now enabled on a per-library basis, with a Ribbon command and dialog that let the user toggle the setting.

Site-collection settings now take better advantage of the SharePoint 2010 UI.
 
(Note: I really like how easy it is to leverage pre-built functionality, such as templated sections like the OK/Cancel buttons.)
 
Other changes:
  • All settings are now stored in a SharePoint list rather than as a PersistedObject, mainly to support a lightweight/stealth implementation that could still be driven without the UI elements.
  • Microsoft’s Enterprise Library has been removed and logging is now to the ULS.
  • A Site Data Query now gets the GUIDs of the parent List and Web for each audited item, meaning we’re not relying on parsing URLs (or similar) to get this information.
  • Obviously, everything is now wrapped up in a nice VS2010 solution 🙂 which no longer includes the console app, as it never really got used.

OVERVIEW
 
SharePoint Document Library Mirroring is an implementation of something akin to SourceSafe’s Shadow Folders (i.e. it ensures that the most recent state of a site collection’s document libraries is replicated at a designated location on the file system).
The source can be found at http://datagilitydocmirror.codeplex.com/ and this post gives an overview of the solution. It should be noted that Document Library Mirroring (DocMirror) is somewhat odd in that it has been built as a Visual Studio / WSS 3.0 solution with the aim being to immediately migrate it to SharePoint 2010 (which will be the subject of another post in the near future). Consequently no further enhancements will be made on this branch of the code – these developments will occur in the SharePoint 2010 project.
So exactly what does DocMirror do? It allows an administrator to set a root folder for document mirroring and then replicates changes to documents within the site collection to folders at the same path relative to this root. For example, if (as shown below) the shadow root is set to C:\Code\Datagility.Shpt.DocMirror\ShadowFolder with mirroring enabled, and a document is added to the site collection’s ‘Shared Documents’ folder…
 

A Document in a Document Library

… then this document will be written to the folder C:\Code\Datagility.Shpt.DocMirror\ShadowFolder\Shared Documents.
 

The Document Mirrored to the File System (Advanced Stuff!)

If this document is subsequently updated (either the document itself or its properties; well, its name anyway), the document on the file system is replaced by the newer version. If the document is moved or deleted from this location, it is deleted from the file system. Subsites under a site collection and subfolders in a document library simply become folders under the shadow root, e.g. C:\Code\Datagility.Shpt.DocMirror\ShadowFolder\subsite1\Documents\folder1
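
The path mapping described above can be sketched in a few lines. This is only an illustration in Python, not the DocMirror code (which is a .NET solution): the function name shadow_path, the SHADOW_ROOT constant and the assumption that document locations arrive as site-collection-relative, URL-encoded paths are all mine.

```python
from pathlib import PureWindowsPath
from urllib.parse import unquote

# Example shadow root taken from the post; any folder would do.
SHADOW_ROOT = r"C:\Code\Datagility.Shpt.DocMirror\ShadowFolder"


def shadow_path(shadow_root: str, doc_url: str) -> PureWindowsPath:
    """Map a site-collection-relative document URL to a path under the shadow root.

    Hypothetical helper: assumes doc_url looks like
    'subsite1/Documents/folder1/report.docx' (optionally URL-encoded).
    """
    # Split on '/', drop empty segments (leading/trailing slashes), decode %20 etc.
    segments = [unquote(s) for s in doc_url.split("/") if s]
    for segment in segments:
        # Refuse anything that could escape the shadow root or change drive.
        if segment in (".", "..") or "\\" in segment or ":" in segment:
            raise ValueError(f"unsafe path segment: {segment!r}")
    return PureWindowsPath(shadow_root, *segments)


# Subsites and subfolders simply become folders under the root.
print(shadow_path(SHADOW_ROOT, "subsite1/Documents/folder1/report.docx"))
```

Using PureWindowsPath keeps the sketch runnable on any operating system, since it only manipulates the path as text and never touches the file system.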
 
Why would you want to do this?
 
Well, the initial requirement was for a client new to SharePoint, but more importantly also new to SQL Server. The conversation went along the lines of “You want us to put all our documents where? What are we going to do when it stops working?”. So the shadow folder became their safety net in case they ever had to wait to restore a broken content database. However, I’ve also started to use it in conjunction with Live Mesh (which is called something else by now): changes to SharePoint documents get played out to the shadow folder, which also happens to be a ‘Mesh-ed’ folder, and so these changes are further synchronised to my Live Desktop and every device that I’ve added to my Mesh. I’m sure I’ll think of other uses as well.
 
How does it work?
 
DocMirror works by reading the WSS Audit Log and replaying any captured changes to documents (it’s a log miner). This means that auditing must be enabled from the Central Admin site, and the relevant actions must be captured, before DocMirror has any useful work to do. See http://msdn.microsoft.com/en-us/library/bb397403(office.12).aspx for more details on enabling auditing. Once the correct events are being written to the log, DocMirror uses a custom SharePoint timer job to periodically query the log and process any changes.
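
To make the log-mining idea concrete, here is a minimal Python sketch of the replay step. It is not the DocMirror implementation: the AuditEntry type, the event labels ("Add", "Update", "Delete", "Move") and the plan_replay function are hypothetical stand-ins for whatever the real audit log returns, and a move is treated only as a removal from the old location, as described earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical, simplified view of one audit log row."""
    occurred: datetime
    event: str      # assumed labels: "Add", "Update", "Delete", "Move"
    doc_url: str    # location of the document the event refers to


def plan_replay(entries, since):
    """Return {doc_url: "write" | "delete"} for entries newer than `since`.

    Entries are replayed in chronological order, so the last event seen
    for a document decides what happens to its mirrored copy.
    """
    plan = {}
    for entry in sorted(entries, key=lambda e: e.occurred):
        if entry.occurred <= since:
            continue  # already handled by an earlier run of the job
        if entry.event in ("Add", "Update"):
            plan[entry.doc_url] = "write"
        elif entry.event in ("Delete", "Move"):
            plan[entry.doc_url] = "delete"
    return plan


# Example run: one document updated then deleted, another just added.
last_run = datetime(2010, 1, 1, 12, 0)
log = [
    AuditEntry(last_run + timedelta(minutes=3), "Delete", "Shared Documents/a.docx"),
    AuditEntry(last_run + timedelta(minutes=1), "Update", "Shared Documents/a.docx"),
    AuditEntry(last_run + timedelta(minutes=2), "Add", "Shared Documents/b.docx"),
]
print(plan_replay(log, since=last_run))
```

Collapsing the log to one action per document means a file that was updated many times between two runs of the job is only written once.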
 

The Custom Document Mirroring Timer Job

For now, the available configuration settings are pretty limited. An administrator can enable or disable mirroring and can set the root folder. A planned enhancement is to allow individual libraries to be selected or deselected for mirroring. These settings are accessed from an admin page that is reachable from the Site Settings page.

The Admin Option in Site Settings

The admin options are saved to SPPersistedObject objects, ensuring that they’re available to all WFE servers in a farm. It’s worth noting, though, that the timer job is scoped so that it only runs on one server (I wanted to avoid any potential conflicts arising from more than one instance of the job processing the log and accessing the file system at the same time). See http://blogs.pointbridge.com/Blogs/morse_matt/Pages/Post.aspx?_ID=55 for a discussion of how this is achieved (and how the MSDN documentation seems to be incorrect in this case).

Document Mirroring Options

 As a slight aside, the solution also contains a console app from which you can access the same functionality as the timer job. It’s been useful in testing and development and I can imagine using it to carry out a ‘controlled’ one-off processing of a very large log as mirroring is established.

The Mirroring Console App

I’m not going to go into too much detail here as to exactly how the solution is built, as the code is freely available for download (see above) and I think it’s pretty easy to follow, but I will highlight a few key points. The solution is a Visual Studio 2008 solution that uses VSeWSS 1.3 to build the deployable WSP. All elements (the timer job, the admin page etc…) are deployed as features by this solution, although I did run into issues with deploying from Visual Studio. I ended up packaging within VS and then running the generated Setup.bat from a separate console process.

The Mirroring Visual Studio Solution

Unit tests are included as part of the solution and I must confess they are perhaps not quite as comprehensive as they could be (I haven’t done the code coverage analysis), but they are still pretty thorough and do allow each of the different elements of the logic to be tested independently.

One aim I originally had when starting this project was for deployment to be as simple as possible, even in a multiple WFE farm. All was going well until I looked at how DocMirror would log its activity. I considered logging to SharePoint’s internal log, but couldn’t get past the fact that the documentation tells you that you’re not allowed to do that, so I fell back on the Enterprise Library. DocMirror uses Enterprise Library 4.1 Logging, which makes it very simple to have processing activity logged as below…

A Log File!

… but it does mean:

  1. A lot of information now needs to be written to the config file (on each WFE server) and I didn’t fancy trying to build it up using SPWebConfigModification objects because the limitations of this are well documented and…
  2. The config needs to be accessible to the custom timer job, which runs under the Windows SharePoint Services Timer service (OWSTIMER.EXE), so putting it in Web.config wouldn’t help anyway. It either needs to go in OWSTIMER.EXE.config or Machine.config on whichever WFE has been designated to run the timer job, and I haven’t yet figured out a way of neatly automating this deployment.
  3. We now have dependencies on the Enterprise Library binaries (3 of them, anyway) which need to be GAC’ed. What if another solution has already deployed another set of these binaries (signed with a different key)? Do we want to just keep adding multiple side-by-side assemblies?

I haven’t solved these problems in this version so some post-installation modification of config files is currently necessary.

All of which means that the planned enhancements for the next version are:

  • Look again at the admin page (it currently inherits from WebPartPage which means that it looks like a content page and not a settings page).
  • Create a robust deployment package.
  • Allow greater flexibility with regard to what gets mirrored.

[UPDATE: Planned enhancements deployed with 2010 version!]
