Thursday 3 February 2011

Appwrap, HAT and Appwrap

aka how UAG does inline replacement of links on pages and how you can use this to rewrite pages...
 

HAT
When you publish a site through UAG, links on that site are rewritten so that they pass through UAG: the client browser is given a link that works externally instead of one that would only work internally.  This is what Microsoft call Host Address Translation, or HAT, and it is integral to UAG as a product.


It works quite simply on the face of it: if it sees a link that matches a published application (NB if you use non-standard ports for your sites, see the post 'Add the default port to the host'), it rewrites the link so that it works for external clients.


Appwrap
I have posted before about Appwrap and its ability to rewrite the pages that you display, and there is also quite a lot about it on Google.  In a nutshell, go to your UAG directory \von\conf\WebSites\[name of your trunk]\conf, create a folder called CustomUpdate, and copy WhlFiltAppWrap_HTTP.xml (or the HTTPS version if your site is HTTPS) into it.  Edit the copy, deleting the contents between the second tag <MANIPULATION> and the second to last tag </MANIPULATION>.


You can then write two types of replacement text.  The first is manipulation per application:


<MANIPULATION_PER_APPLICATION>
      <APPLICATION_TYPE></APPLICATION_TYPE>
      <DATA_CHANGE>
            <URL case_sensitive="false">.*</URL>
            <SAR>
                  <SEARCH>text I want to find</SEARCH>
                  <REPLACE>what I want to replace it with</REPLACE>
            </SAR>
      </DATA_CHANGE>
</MANIPULATION_PER_APPLICATION>


where


<APPLICATION_TYPE></APPLICATION_TYPE> surrounds the type of application.  This is primarily used by Microsoft in their own files to identify SharePoint, OWA etc.  If you leave it with nothing between the tags, it works on all sites.


<URL case_sensitive="false">.*</URL> says which pages to inspect, as a regular expression.  The pattern .* checks all pages, which will work but will hit the CPU of the box, because the search and replace then runs across every page that is loaded.  Once it works, try to tie down the expression, e.g. .*/mysite/somepages/.*\.asp


The search and replace is as shown.  If you want to do multiple replacements, have multiple <SAR></SAR> sections.
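

As a rough sketch of the two points above, a block that is tied down to one part of a site and carries two replacements might look like the following.  The URL pattern and the search and replace strings are made-up placeholders, not values from a real deployment, so swap in your own.


<MANIPULATION_PER_APPLICATION>
      <APPLICATION_TYPE></APPLICATION_TYPE>
      <DATA_CHANGE>
            <!-- placeholder pattern: only .asp pages under /mysite/somepages/ -->
            <URL case_sensitive="false">.*/mysite/somepages/.*\.asp</URL>
            <!-- one SAR section per replacement; the strings below are placeholders -->
            <SAR>
                  <SEARCH>first text I want to find</SEARCH>
                  <REPLACE>first replacement</REPLACE>
            </SAR>
            <SAR>
                  <SEARCH>second text I want to find</SEARCH>
                  <REPLACE>second replacement</REPLACE>
            </SAR>
      </DATA_CHANGE>
</MANIPULATION_PER_APPLICATION>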


However, this Appwrap runs after HAT has done its work.  So you could use it to rewrite a bit of text, or to rewrite a link to an internal site so that external users go to the external URL of that site.


But what if you have an internal link that you want to rewrite to a different internal URL, in the hope that HAT will then see that URL and rewrite it again to the external URL?  (You could probably predict the external URL UAG will generate and write that directly, but it's open to a lot of problems.)  Or what if you want to catch an internal URL that HAT would otherwise rewrite, and change it before HAT touches it?


Appwrap (before HAT)
There is a second file called WhlFiltSecureRemote_HTTP.xml (or HTTPS if your site is HTTPS) that is rarely documented by Microsoft – there are a handful of results for this file from Google right now – one of them is actually the Technet forum article I used to work this out.


Here you can do many different types of replacement, and this is a really core part of how UAG works.  You can insert your own rules into this by adding a WhlFiltSecureRemote_HTTP.xml file into CustomUpdate in the same path as above, with the following contents:


<WHLFILTSECUREREMOTE ver="2.2">
      <DATA_CHANGE>
            <SERVER>
                  <SERVER_NAME mask="">.*</SERVER_NAME>
                  <PORT>8080</PORT>
                  <URL>
                        <NAME></NAME>
                        <SEARCH encoding="">text I want to find</SEARCH>
                        <REPLACE encoding="">text I want to replace it with</REPLACE>
                  </URL>                 
            </SERVER>
      </DATA_CHANGE>
</WHLFILTSECUREREMOTE>


<SERVER_NAME mask="">.*</SERVER_NAME> allows you to specify a single server or a range of servers via regular expression (this one says ‘all servers’).  This refers to the internal real server names that UAG is talking to, i.e. the servers stated in the applications list.
<PORT>8080</PORT> is the port the internal server is listening on.
<NAME></NAME> allows you to tie down which URLs to run this replacement on (I left it blank, meaning any).
The <SEARCH> and <REPLACE> tags set what to find and what to replace it with.


Now you can do things before HAT gets its hands on your code.  I use this to catch internal URLs that are written wrongly: I publish the site under the full server name myserver.contoso.com, but some of the site's links just say myserver/something.  I catch these ‘myserver/’ strings using this script and rewrite them to the full myserver.contoso.com/something.  Because the search includes the trailing slash, it will not match myserver.contoso.com, so links that are already correct are left alone.  I also use this to catch internal links that are https when I want to access them via http (the reason for which is another story altogether...)
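

A minimal sketch of the short-name fix described above, reusing the template from earlier in this post.  Port 8080 is simply carried over from that template, so use whatever port your internal server listens on.  It also assumes the search text is matched as a plain string; how UAG treats special characters in this tag is not confirmed here, so test it on a non-production trunk first.


<WHLFILTSECUREREMOTE ver="2.2">
      <DATA_CHANGE>
            <SERVER>
                  <SERVER_NAME mask="">.*</SERVER_NAME>
                  <!-- assumption: port copied from the template above -->
                  <PORT>8080</PORT>
                  <URL>
                        <NAME></NAME>
                        <!-- the trailing slash keeps this from matching myserver.contoso.com -->
                        <SEARCH encoding="">myserver/</SEARCH>
                        <REPLACE encoding="">myserver.contoso.com/</REPLACE>
                  </URL>
            </SERVER>
      </DATA_CHANGE>
</WHLFILTSECUREREMOTE>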