Sunday, June 26, 2011

Lessons Learnt From Implementing SharePoint Word Automation Services

While implementing Word to PDF conversion in SharePoint 2010, I came across a whole lot of hurdles before getting the solution on track. I started off with an Event Receiver in VS.NET 2010 that hooks the "Item Being Added" event on a "Document Library". But once the solution was deployed, nothing seemed to work as programmed. The uploaded document wasn't getting converted to PDF, and the toughest part was that I couldn't find a way to debug the solution.

If you are new to "SharePoint Word Automation Services", let me give you a sneak preview of what it is all about. If you have worked with VBA, you might be aware that implementing "Office Automation" requires Microsoft Office to be installed on the machine where the application is deployed; without it, the application fails. In most cases no administrator will allow you to install Microsoft Office on a server, and even if you somehow get it installed, the server is doomed because of the number of Office application instances the VBA application is going to create. Microsoft has published an article on this. So the crux with Microsoft Office is that it is targeted only at the workstation/desktop segment, not the server segment.

Now the million-dollar question is: how can a product like SharePoint survive without Microsoft Office functionality on the server side? The answer is that Microsoft released a solution for this very problem under the name "Application Services" in SharePoint. In effect, Microsoft exposed the functionality of all the major Microsoft Office applications, like Word, Excel, Access, Visio and PowerPoint, through libraries (aka the Office Server Object Model) that an application can reference to create and manipulate these Office documents. The downside of "Application Services" is that they aren't available as a standalone product; instead, they are packaged only with the SharePoint Server editions. This doesn't mean you need to purchase the expensive SharePoint Server just for your application to generate Word or Excel documents on the server; you could instead opt for commercial solutions like Aspose or Syncfusion, which do the job of "Office Automation" pretty well.

That was a pretty long preview of what "Word Automation Services" is all about. Coming back to the point, here is what I learnt the hard way while trying to get the Word to PDF functionality on track.
  1. Make sure "Microsoft Word Automation Services" is running by going to Central Admin -> Manage services in this farm.

  2. You need to reference the assembly "Microsoft.Office.Word.Server.dll", which can be found at "\ISAPI\Microsoft.Office.Word.Server.dll".

  3. Don't forget to set your project's platform target to 64-bit (x64) in the project properties and the target framework to .NET 3.5.

  4. When passing the path of the Word document uploaded to the list to AddFile, you need to pass the absolute URL:
    objJob.AddFile(FileToConvert, PDFFilePath);
    I had just passed properties.AfterUrl, which only contains the item's path relative to the site, not a full URL. Instead, you have to build the absolute URL by prefixing it with SPItemEventProperties.WebUrl (i.e. properties.WebUrl).

  5. When creating an instance of the ConversionJob class, make sure you pass the name of the WordServiceApplicationProxy exactly as seen on the Central Admin -> "Manage services in this farm" page. Passing a wrong name as the first parameter raises the exception "No Word Automation Service found". The job settings also need to request PDF as the output format.
    ConversionJobSettings objJobSettings = new ConversionJobSettings { OutputFormat = SaveFormat.PDF };
    ConversionJob objJob = new ConversionJob("Word Automation Services", objJobSettings);

  6. Don't forget to assign the user token. The assigned user token should have access to the documents in the list.
    objJob.UserToken = objWeb.CurrentUser.UserToken;

  7. I was expecting the PDF document to show up in the document library immediately after adding/uploading the Word document, but the PDF didn't show up even after waiting a few minutes. The reason is that calling objJob.Start(); doesn't convert the document right away; instead, the DOC-to-PDF job is added to a queue that is processed later. The Word Automation Services timer job is run by the scheduler every 15 minutes by default, so you will only see the PDF after its next run. If you want the output immediately, go to Central Admin, select "Job Status", find "Word Automation Services", open the job and click the "Run Now" button. You should then see the output within 10 or 20 seconds.

  8. Finally, the most embarrassing situation I faced after deploying the EventReceiver solution was figuring out how to debug the code. Thinking back on how I had solved similar situations in the past, the solution that crossed my mind was the following snippet. I added it as the first statement in the event handler, then built and deployed the code.
    System.Diagnostics.Debugger.Break();

    Now, after uploading the Word document, a message box showed up with the text "Unhandled exception in w3wp.exe do you want to debug?". I clicked Yes, the debugger kicked in, I selected the already open EventReceiver project, and finally I was inside VS.NET with control (yellow highlighting) at the Debugger.Break() statement. Yes, the old trick worked. I had previously used this statement in some of my console applications to attach the debugger at run time, and thankfully it helped me debug my EventReceiver project too.
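Since point 4 was where the conversion quietly failed, here is a minimal sketch of the URL handling pulled out into a plain C# helper that needs no SharePoint assemblies. It assumes, as described in point 4, that properties.AfterUrl is relative to properties.WebUrl, and it also assumes the PDF should sit next to the source document with the same name and a .pdf extension. The class PdfConversionUrls and its two methods are hypothetical names for illustration only; they are not part of the SharePoint or Word Automation Services API.

```csharp
using System;

// Hypothetical helper for illustration; not part of the SharePoint or
// Word Automation Services API.
public static class PdfConversionUrls
{
    // Builds an absolute URL by joining the web URL (e.g. properties.WebUrl)
    // and a web-relative item URL (e.g. properties.AfterUrl) with exactly one "/".
    public static string BuildAbsoluteUrl(string webUrl, string relativeUrl)
    {
        if (webUrl == null) throw new ArgumentNullException("webUrl");
        if (relativeUrl == null) throw new ArgumentNullException("relativeUrl");
        return webUrl.TrimEnd('/') + "/" + relativeUrl.TrimStart('/');
    }

    // Assumption: the PDF goes next to the source document with the same name.
    // Replaces the extension of the last path segment with ".pdf", or appends
    // ".pdf" when that segment has no extension.
    public static string ToPdfUrl(string documentUrl)
    {
        if (documentUrl == null) throw new ArgumentNullException("documentUrl");
        int lastSlash = documentUrl.LastIndexOf('/');
        int lastDot = documentUrl.LastIndexOf('.');
        string withoutExtension = lastDot > lastSlash
            ? documentUrl.Substring(0, lastDot)
            : documentUrl;
        return withoutExtension + ".pdf";
    }
}
```

Inside the event receiver this would be wired up roughly as string source = PdfConversionUrls.BuildAbsoluteUrl(properties.WebUrl, properties.AfterUrl); followed by objJob.AddFile(source, PdfConversionUrls.ToPdfUrl(source)); before calling objJob.Start();. Keeping the string handling out of the receiver means it can be checked in a console application without redeploying to SharePoint.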
