Tuesday 28 August 2012

SecureRemote to change one line? No, go wild with regex and change huge chunks!

I had a problem recently where I had to publish a web site where the "branding" was non-existent, and looked horrific.  I wanted to bring it in line, and wanted to do it by replacing a chunk of HTML (the login form) with my own.

I did this using the WhlSecureRemote.xml file but hit a problem: the text was not consistent and was liable to change and, more annoyingly, ran over multiple lines.  By default the search tags in SecureRemote only check a single line, as you cannot include line breaks and whitespace in them.

The answer was to change my search tag to use regex: usually you start a search with <SEARCH encoding=""> (or encoding="base64" if you so desire), but the search tag also takes a mode attribute, so you can use <SEARCH encoding="" mode="regex">

Now you can match as much text as you want.  So I decided where I wanted to start the replace from (I used a parameter in the FORM tag in the HTML as a consistent place to anchor the start) and the end point (the end of the text in the form usefully always was "password.")

This looked something like:

<SEARCH encoding="" mode="regex">NAME="FormAuth">(\n|.)*password\.</SEARCH>

The key part of this is the (\n|.)* which matches any number of characters and line breaks. On its own, .* matches any number of characters but does not include line breaks.
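To see the difference between the two patterns outside UAG, here is a minimal sketch using Python's re module. The page fragment is made up, and I am assuming UAG's regex engine treats . and \n the same way Python does by default, which I have not verified.

```python
import re

# Hypothetical page fragment: the login form text spans several lines.
page = (
    '<FORM METHOD="POST" NAME="FormAuth">\n'
    '<p>Enter your username\n'
    'and password.</p>\n'
    '</FORM>'
)

single_line = re.compile(r'NAME="FormAuth">.*password\.')
multi_line = re.compile(r'NAME="FormAuth">(\n|.)*password\.')

# .* stops at the first line break, so it never reaches "password."
print(single_line.search(page))

# (\n|.)* also consumes line breaks, so the whole chunk is matched.
print(multi_line.search(page).group(0))
```

Note that (\n|.)* is greedy, so if "password." appears more than once the match runs to the last occurrence on the page.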

The replace also started NAME="FormAuth"> but then had the html for our custom design for our login username and password, with the nice submit button and the CSS styles we use. The replace does not support regex (think about it, it can't predict what to write!) and I prefer to use base64 for the replace as you can include line breaks in base64, which makes your resulting HTML much tidier.
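If you want to generate the base64 text for the replace tag, something like the following sketch would do it. The HTML here is a made-up placeholder, and I am assuming UAG expects standard base64 of the text encoded as UTF-8, which is a guess on my part.

```python
import base64

# Hypothetical replacement markup; the real form would carry your own CSS and fields.
replacement_html = (
    'NAME="FormAuth">\n'
    '<div class="login-box">\n'
    '  <input type="text" name="user_name">\n'
    '  <input type="password" name="password">\n'
    '  <input type="submit" value="Sign in">\n'
    '</div>'
)

# Encode the multi-line HTML into a single line of base64 text.
encoded = base64.b64encode(replacement_html.encode("utf-8")).decode("ascii")
print(encoded)

# Round-trip check: decoding gives back the original, line breaks included.
assert base64.b64decode(encoded).decode("utf-8") == replacement_html
```

The printed string is what would go between the replace tags with encoding="base64".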

Job done - 10 days dev and testing replaced with 10 minutes of UAG coding.

Regex in UAG to achieve 'not' is possible

I have been stuck with a few problems recently in UAG where I have wanted to use regex but in a negative form, eg 'anything except PDF and DOC'.

You have to do what is called a negative lookahead regex, which looks something like this:

 ^(?!.*(PDF|DOC).*).*

I am not going to try to explain this except to say that it allows any text that does not include PDF or DOC anywhere in the text - I use anywhere intentionally as I do not trust the URL not to include parameters and # bookmarks.  As I am no regex guru and there are plenty of sites better than this that can step you through how negative lookahead works I won't go character by character, but you could improve this by being more specific (at a minimum maybe \.PDF to force .PDF)

You can be more imaginative and use multiple elements and negatives - use a regex tester like regexpal online to verify what you are writing as it is much quicker than testing in UAG!
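If you prefer a script to an online tester, this is a rough sketch of the same check in Python. The URLs are invented, the second pattern is my own take on the "\.PDF" refinement mentioned above, and Python's regex engine may not behave identically to UAG's.

```python
import re

# The pattern from this post: reject anything containing PDF or DOC anywhere.
anywhere = re.compile(r'^(?!.*(PDF|DOC).*).*')

# Tighter variant (an assumption, not tested in UAG): require the dot before the extension.
extension_only = re.compile(r'^(?!.*\.(PDF|DOC)).*')

# Hypothetical URLs for illustration.
urls = [
    '/reports/summary.jsp',
    '/files/manual.PDF',
    '/files/manual.PDF?page=2#top',
    '/PDFviewer/index.jsp',
]

for url in urls:
    # True means the URL passes the 'not PDF or DOC' filter.
    print(url, bool(anywhere.match(url)), bool(extension_only.match(url)))
```

The last URL shows why being more specific helps: the first pattern rejects it because "PDF" appears in the path, while the tighter one lets it through. As written, both patterns are case sensitive in Python, so a lowercase .pdf would slip past unless matching is made case insensitive.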

Where do you need regex? The places I use it most are in Appwrap/SecureRemote
 - in the search tags (add mode="regex" to the search tag)
and more importantly
 - in the pages to parse. 

In another post I talked about the 10Mb limit in Appwrap and I needed this to exclude binary objects.

Binaries/Pages over 10Mb fail when using Appwrap

I have found that a number of sites I have fail when the page returned is over 10Mb.  I have had this when data objects in the page are large (eg an embedded table linked to a query that shows way too much in my opinion), reports loaded in a java applet or a PDF - it does not seem to matter.  What I have found is that when you use Appwrap and search/replace objects in all pages (your regex for pages is .*) the initiation of the appwrap causes an error to be returned to users in place of the large data object.

The reason for the 10Mb is the max page size defined in UAG - this limits the size of a single html page to 10Mb (which is huge generally) and this can be changed - but changing it has significant implications for CPU and memory usage as every page (as far as I understand) claims 10Mb of memory in UAG when parsed.

However, UAG for some reason inspects even binary data, and that means for every object the 10Mb limit is imposed.

The answer is to either be more specific and state only the pages you want to parse (eg .*\.jsp.*) or you can do 'not' regexes to say 'not PDF' or the like.  I am doing a separate page for this, look at my next post, as it is complicated and is useful I think in its own right.
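As a quick illustration of the "be more specific" option, this sketch shows which of a few invented URLs the .*\.jsp.* pattern would select for parsing. I am assuming the pattern has to match the whole URL, which is how I have modelled it here with fullmatch; UAG may behave differently.

```python
import re

# Only URLs matching this pattern would be handed to the search/replace step.
parse_only = re.compile(r'.*\.jsp.*')

# Hypothetical URLs for illustration.
urls = [
    '/app/home.jsp',
    '/app/home.jsp?tab=2',
    '/app/export/big-report.pdf',
    '/app/applet.jar',
]

to_parse = [u for u in urls if parse_only.fullmatch(u)]
print(to_parse)
```

The PDF and the applet never reach the parser, so the page size limit should not come into play for them.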

Wednesday 22 June 2011

Slow loading of applications after activation

I have just got an interesting answer from MS about an issue I have: the first time I access a trunk I get a significant delay, and it happens for each application on the trunk, not just once. After the first load it works flawlessly until the config is activated again.

The answer is with DNS - UAG analyses the site (actually it seems to analyse each page it sees for the first time) looking for URLs. It does a forward lookup for each URL it finds to know its IP (in case HAT needs to rewrite the IP not the name) and a reverse lookup for each IP to ensure the name is correct.

So for every application, and every link in every application (eg the IIS demo page has a link to go.microsoft.com) the UAG box must be able to resolve the URL.

You can do this in your DNS or in the hosts file.

Once it does this lookup it stores the result and does not do it again.
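To find out in advance which names the UAG box cannot resolve, a small script along these lines could be run on the server. This is only a sketch of the forward and reverse lookups described above: the function names are my own, the link extraction is deliberately crude, and it is not how UAG itself does the analysis.

```python
import re
import socket
from urllib.parse import urlparse


def hosts_in_page(html):
    """Collect the distinct host names found in absolute http(s) links."""
    links = re.findall(r'https?://[^\s"\'<>]+', html)
    hosts = {urlparse(link).hostname for link in links}
    hosts.discard(None)
    return sorted(hosts)


def check_host(host, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Try a forward lookup, then a reverse lookup on the resulting IP.

    Returns (host, ip, reverse_name); ip or reverse_name is None if that lookup fails.
    """
    try:
        ip = forward(host)
    except OSError:
        return (host, None, None)
    try:
        name = reverse(ip)[0]
    except OSError:
        name = None
    return (host, ip, name)


# Hypothetical page content, based on the IIS demo page example above.
page = '<a href="http://go.microsoft.com/fwlink/?LinkId=1">IIS</a>'
print(hosts_in_page(page))
# On the UAG box you would then call check_host(host) for each name found.
```

Any host that comes back with None for the IP or the reverse name is a candidate for a DNS entry or a hosts file line.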

Thursday 3 February 2011

Appwrap, HAT and Appwrap

aka how UAG does inline replacement of links on pages and how you can use this to rewrite pages...
 

HAT
When you publish a site, you see that links on that site get rewritten so that they pass through UAG (so that the client browser is provided a link that works externally instead of a link that would only work internally).  This is what Microsoft calls Host Address Translation or HAT, and it is integral to UAG as a product.


It works quite simply on the face of it; if it sees a link that matches an application (NB if you use non standard ports for your sites, see the post 'Add the default port to the host') then it rewrites it so it works for external clients.


Appwrap
I have posted before about Appwrap and its ability to rewrite pages that you display – there is also quite a lot on Google about it.  In a nutshell you need to go to your UAG directory \von\conf\WebSites\[name of your trunk]\conf.  Create a folder called CustomUpdate and copy WhlFiltAppWrap_HTTP.xml (or HTTPS if your site is HTTPS) to this directory.  Edit this file, deleting the contents between the second tag <MANIPULATION> and the second to last tag </MANIPULATION>.


You can then write two types of replacement text.  First is manipulation per application


<MANIPULATION_PER_APPLICATION>
      <APPLICATION_TYPE></APPLICATION_TYPE>
      <DATA_CHANGE>
      <URL case_sensitive="false">.*</URL>
            <SAR>
                  <SEARCH>text I want to find</SEARCH>
                  <REPLACE>what I want to replace it with</REPLACE>
            </SAR>           
      </DATA_CHANGE>
</MANIPULATION_PER_APPLICATION>


where


<APPLICATION_TYPE></APPLICATION_TYPE> surrounds the type of application – this is primarily used by Microsoft in their files to identify Sharepoint, OWA etc.  If you leave it with nothing between the tags it works on all sites


<URL case_sensitive="false">.*</URL> says which pages to inspect (as a regular expression) – this checks all pages, which will work but will hit the CPU of the box.  Once it works try to tie down the expression eg .*/mysite/somepages/.*\.asp – otherwise it does a regex across every page that is loaded


The search and replace is as shown.  If you want to do multiple replacements, have multiple <SAR></SAR> sections


However, this Appwrap works after the HAT occurs.  So you could use this to rewrite a bit of text, or a link to an internal site so that external users go to the external URL of the site.


But what happens if you have a link internally that you want to rewrite to an internal URL, with the hope that HAT will then see that URL and rewrite it again to the external URL (while you could probably predict the external URL UAG will generate, it's open to a lot of problems), or if you want to catch an internal URL that HAT would otherwise rewrite and change it before HAT touches it?


Appwrap (before HAT)
There is a second file called WhlFiltSecureRemote_HTTP.xml (or HTTPS if your site is HTTPS) that is rarely documented by Microsoft – there are a handful of results for this file from Google right now – one of them is actually the Technet forum article I used to work this out.


Here you can do many different types of replacement and this is a really core part of how UAG works.  You can insert your own scripts into this by adding a WhlFiltSecureRemote_HTTP.xml file into CustomUpdate in the same path as above, with the following contents


<WHLFILTSECUREREMOTE ver="2.2">
      <DATA_CHANGE>
            <SERVER>
                  <SERVER_NAME mask="">.*</SERVER_NAME>
                  <PORT>8080</PORT>
                  <URL>
                        <NAME></NAME>
                        <SEARCH encoding="">text I want to find</SEARCH>
                        <REPLACE encoding="">text I want to replace it with</REPLACE>
                  </URL>                 
            </SERVER>
      </DATA_CHANGE>
</WHLFILTSECUREREMOTE>


<SERVER_NAME mask="">.*</SERVER_NAME> allows you to specify a single server or a range of servers via regular expression (this says ‘all servers’). This refers to the internal real server names that UAG is talking to, ie the servers stated in the applications list
<PORT>8080</PORT> is the port the server is listening on
<NAME></NAME> allows you to tie down which URLs to run this replacement on (I left it as blank = any)
The search and replace tags set what to find and replace it with


Now you can do things before HAT gets its hands on your code – I use this to catch a URL internally that is written wrongly – I use the full myserver.contoso.com server name but some of the site links just say myserver/something.  I catch these ‘myserver/’ strings using this script and rewrite them to the full myserver.contoso.com/something (by looking for myserver/ I don’t end up rewriting any links that are correct as it will not match myserver.contoso.com).  I also use this to catch internal links that are https when I want to access them via http (the reason for which is another story altogether...)
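The 'myserver/' trick is easy to check outside UAG. This is a minimal sketch of the same literal find-and-replace in Python; the function name and sample links are made up for illustration.

```python
def expand_short_host(html, short='myserver/', full='myserver.contoso.com/'):
    """Literal find-and-replace, as a plain (non-regex) search/replace pair would do."""
    return html.replace(short, full)


# Hypothetical page with one short link and one link that is already correct.
page = (
    '<a href="http://myserver/something">short link</a>\n'
    '<a href="http://myserver.contoso.com/other">already correct</a>'
)
print(expand_short_host(page))
```

The second link is left alone because 'myserver/' never appears in 'myserver.contoso.com/'. The one thing to watch is another host whose name ends in 'myserver', as that would be caught too.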

Tuesday 14 December 2010

Beware: Using UAG with an SSL decryptor in front

A quick beware: In UAG you set up your trunks as http or https - which is fine unless you have an SSL decryptor in front of the UAG server. Using this, you access the site using https, but the traffic is decrypted and reaches the server as http.
The problem here is that UAG uses HTTP 302 redirects extensively. However, the content of these are defined by whether the trunk is http or https and as such in the scenario above returns redirects with http links.
This means that (assuming your SSL decryptor cannot/does not rewrite those links for you), users are directed to the http version of the site not the https, which completely breaks the site for the user.
The easy fix is to set up something that accepts those http requests and redirects users to the https site (I use my load balancer)
More complicated is to use AppWrap to rewrite the 302s - I will update when I test this
--Chris

(update below following MS answer:)

* Create the file c:\program files\Microsoft Forefront Unified Access Gateway\von\conf\websites\\conf\CustomUpdate\WhlFiltAppWrap_HTTP.xml (you may have to create the CustomUpdate folder)


* Paste the XML snippet below into it



==============START=================
<APP_WRAP ver="1.0" id="RemoteAccess_HTTPS.xml">
<MANIPULATION>
<HEADER_CHANGE>
<RESPONSE>
<APPLICATION>
<SERVER_NAME mask="">localhost</SERVER_NAME>
<PORT>6001</PORT>
<URL>
<URL_NAME>.*</URL_NAME>
<EDIT>
<HEADER>
<NAME>Location</NAME>
<SAR>
<SEARCH encoding="">http://www.example.com</SEARCH>
<REPLACE encoding="">https://www.example.com</REPLACE>
</SAR>
</HEADER>
</EDIT>
</URL>
</APPLICATION>
</RESPONSE>
</HEADER_CHANGE>
</MANIPULATION>
</APP_WRAP>

===============END==================


* Activate UAG.

* You can still use application manipulation tags in here if you have other AppWrap changes
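For anyone wanting to sanity check what the header change above should do to a redirect, here is a small sketch in Python. It only models the search/replace on the Location header using the example.com values from the XML; the function name and sample headers are my own and this is not UAG code.

```python
def fix_location(headers, search='http://www.example.com', replace='https://www.example.com'):
    """Apply the search/replace to the Location header only, leaving other headers alone."""
    fixed = dict(headers)
    if 'Location' in fixed:
        fixed['Location'] = fixed['Location'].replace(search, replace)
    return fixed


# Hypothetical 302 response headers as a trunk behind an SSL decryptor might return them.
response = {'Location': 'http://www.example.com/portal/', 'Content-Length': '0'}
print(fix_location(response))
```

A Location that already starts with https is untouched, since the search text does not occur in it.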

Tuesday 26 October 2010

Customising the portal is complicated

For info, a summary of my thread on the UAG forums re customising the portal: (http://bit.ly/cXr8kf)

Question: I am trying to customise the layout of the portal pages - I know of the changes in Technet etc to hide the left bar, top bar, toolbar etc, but I want to rearrange the look of the whole page. However, I am having an almighty battle with changing the DIVs' positioning in CSS (no help from IE & Firefox layout differences!).

I have spent a couple of hours on this, playing with DIV positioning etc, but not really in huge depth as every change I made caused odd effects. DIVs within tables within DIVs etc...I spent about an hour trying to stop the main content window overflowing off the bottom of the page, achieving only some loss of sanity.

Has anyone made wholesale changes to the layout of the Portal? Does it work well or are there gotchas?

And is this beyond the support boundary for UAG (the technet article is unclear on how far you can go)?

Ben Ari MSFT: As for supportability, customizations beyond what the documentation instructs are not supported.

Ran MSFT: The boundaries of supported web-content customizations in UAG are actually simple to define: if you can do it in a CustomUpdate customization, then it's supported. Another supported customization is for some very specific resources where the UAG management console allows you to specify a new file, instead of the default one (i.e. the Login and the Error page). Other than these, other changes are not supported. True, there is nothing stopping you from modifying one of the default files that UAG comes with, but besides those changes not being backed-up, and not being replicated across UAG array members, they are also not supported.

Ran MSFT: ... the CSS is customizable via a CustomUpdate folder, and so is the .master file. You can find references to these under the http://technet.microsoft.com/en-us/library/ff607389.aspx section, and to be more specific, customizing the Standard.master file is mentioned in these articles: http://technet.microsoft.com/en-us/library/ee861154.aspx and http://technet.microsoft.com/en-us/library/ee861172.aspx

SuperNaraen: We've had to revamp the layout. Here is our experience.

At first we tried to change the css. It turned out to be very involved. So, we dropped that approach.

Then we tried the other extreme, to completely create our own portal web application that would paint the UAG links (Can't locate the article (by Tom Shinder?) right now, which described how to do it for IAG. We found that article very helpful). We found that we were investing a lot of time getting functionality like logoff, timeout behavior etc that comes bundled with the portal.

We have now settled on a hybrid approach. We published our custom portal web application within the portal frame. We then hid the headers and the navbar from the portal frame completely. This is giving us the best of both worlds.