Tuesday, 28 August 2012

Secureremote to change one line. No, go wild with regex and change huge chunks!

I had a problem recently where I had to publish a web site whose "branding" was non-existent and which looked horrific.  I wanted to bring it in line, and wanted to do it by replacing a chunk of HTML (the login form) with my own.

I did this using the WhlSecureRemote.xml file but came across a problem: the text I wanted to replace was not consistent and was liable to change and, more annoyingly, it ran over multiple lines.  By default the search tags in SecureRemote only check a single line, as you cannot include line breaks and whitespace in them.

The answer was to change my search tag to use regex. Usually you start a search with <SEARCH encoding=""> (or encoding="base64" if you so desire), but the search tag also takes a mode option, so you can use <SEARCH encoding="" mode="regex">

Now you can match as much text as you want.  So I decided where I wanted to start the replace from (I used a parameter in the FORM tag in the HTML as a consistent place to anchor the start) and the end point (the end of the text in the form usefully always was "password.")

This looked something like:

<SEARCH encoding="" mode="regex">NAME="FormAuth">(\n|.)*password\.</SEARCH>

The key part of this is the (\n|.)* which matches any number of characters and line breaks. On its own, .* matches any number of characters but does not include line breaks.
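To see the difference between the two forms, here is a minimal sketch using Python's re module as a stand-in. UAG's own regex engine may behave differently, and the page fragment is made up for illustration; only NAME="FormAuth"> and the closing "password." come from the example above.

```python
import re

# Hypothetical page fragment, invented for this sketch.
page = (
    '<FORM METHOD="POST" NAME="FormAuth">\n'
    '  <p>Please log in.</p>\n'
    '  <p>Enter your user name and password.</p>\n'
    '</FORM>'
)

# (\n|.)* spans line breaks; plain .* stops at the end of the line.
multi_line = re.compile(r'NAME="FormAuth">(\n|.)*password\.')
single_line = re.compile(r'NAME="FormAuth">.*password\.')

multi_match = multi_line.search(page)
single_match = single_line.search(page)

print("multi-line pattern matched:", multi_match is not None)
print("single-line pattern matched:", single_match is not None)
if multi_match:
    print(multi_match.group(0))
```

Note that the * is greedy, so if "password." appeared more than once the match would run to the last occurrence. That is worth checking against the real page before relying on it.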

The replace also started NAME="FormAuth"> (the search match includes that text, so it has to be put back) but then had the HTML for our custom design for our login username and password, with the nice submit button and the CSS styles we use. The replace does not support regex (think about it, it can't predict what to write!) and I prefer to use base64 for the replace, as you can include line breaks in base64, which makes your resulting HTML much tidier.
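If you need to produce the base64 text for the replace, something like the following sketch would do it. The HTML here is a made-up placeholder, not our real login form, and the use of UTF-8 is my assumption; check what encoding your UAG setup expects.

```python
import base64

# Hypothetical replacement HTML; it starts with the same anchor text
# that the search matched so that the FORM tag stays intact.
replacement_html = (
    'NAME="FormAuth">\n'
    '<div class="login-box">\n'
    '  <label>User name <input type="text" name="user"></label>\n'
    '  <label>Password <input type="password" name="pass"></label>\n'
    '  <input type="submit" value="Log in">\n'
    '</div>\n'
)

# Encode the text (UTF-8 is an assumption) to a single base64 string.
encoded = base64.b64encode(replacement_html.encode("utf-8")).decode("ascii")
print(encoded)

# Decode again to confirm the line breaks survive the round trip.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == replacement_html)
```

The encoded string contains no line breaks itself, so it fits in the tag on one line while the decoded HTML keeps its layout.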

Job done - 10 days dev and testing replaced with 10 minutes of UAG coding.

Regex in UAG to achieve 'not' is possible

I have been stuck with a few problems recently in UAG where I have wanted to use regex but in a negative form, eg 'anything except PDF and DOC'.

You have to do what is called a negative lookahead regex, which looks something like this:

 ^(?!.*(PDF|DOC).*).*

I am not going to try to explain this, except to say that it allows any text that does not include PDF or DOC anywhere in the text. I use anywhere intentionally, as I do not trust the URL not to include parameters and # bookmarks.  I am no regex guru, and there are plenty of sites better than this one that can step you through how negative lookahead works, so I won't go character by character. You could improve this by being more specific (at a minimum maybe \.PDF to force .PDF).
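As a rough illustration of what the expression lets through, here is a sketch using Python's re module. This is only a stand-in: UAG's engine may differ, in particular on case sensitivity, and the sample URLs are invented. The stricter pattern is just one way of applying the \.PDF suggestion.

```python
import re

# The pattern from above: reject anything with PDF or DOC anywhere in it.
loose = re.compile(r'^(?!.*(PDF|DOC).*).*')
# A stricter variant (an assumption, not tested in UAG): require the dot.
strict = re.compile(r'^(?!.*\.(PDF|DOC)).*')

def allowed(pattern, url):
    """Return True when the URL passes the negative lookahead."""
    return pattern.match(url) is not None

# Hypothetical URLs for illustration.
urls = [
    "/reports/summary.PDF",
    "/docs/letter.DOC?id=7#top",
    "/app/index.jsp",
    "/app/DOCUMENTS/list.jsp",
]

for url in urls:
    print(url, "loose:", allowed(loose, url), "strict:", allowed(strict, url))
```

The last URL shows why being more specific helps: the loose pattern rejects it because DOCUMENTS contains DOC, while the stricter one lets it through.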

You can be more imaginative and use multiple elements and negatives - use a regex tester like regexpal online to verify what you are writing as it is much quicker than testing in UAG!

Where do you need regex? The places I most use it are in Appwrap/SecureRemote
 - in the search tags (add mode="regex" to the search tag)
and more importantly
 - in the pages to parse. 

In another post I talked about the 10Mb limit in Appwrap and I needed this to exclude binary objects.

Binaries/Pages over 10Mb fail when using Appwrap

I have found that a number of my sites fail when the page returned is over 10Mb.  I have had this when data objects in the page are large (eg an embedded table linked to a query that shows way too much in my opinion), with reports loaded in a Java applet and with a PDF; it does not seem to matter which.  What I have found is that when you use Appwrap and search/replace objects in all pages (your regex for pages is .*), the initiation of the Appwrap causes an error to be returned to users in place of the large data object.

The reason for the 10Mb is the max page size defined in UAG - this limits the size of a single html page to 10Mb (which is huge generally) and this can be changed - but changing it has significant implications for CPU and memory usage as every page (as far as I understand) claims 10Mb of memory in UAG when parsed.

However, UAG for some reason inspects even binary data, and that means for every object the 10Mb limit is imposed.

The answer is either to be more specific and state only the pages you want to parse (eg .*\.jsp.*) or to do 'not' regexes to say 'not PDF' or the like.  I am doing a separate page for this, look at my next post, as it is complicated and is useful I think in its own right.
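For the 'be more specific' option, this sketch shows which URLs a pages regex of .*\.jsp.* would pick up. It uses Python's re module as a stand-in for UAG, and the URLs are invented. Matching against the whole URL is my assumption about how the pages setting is applied.

```python
import re

# Pages-to-parse pattern from above: only URLs containing .jsp
pages_to_parse = re.compile(r'.*\.jsp.*')

def is_parsed(url):
    """Return True when the whole URL matches the pages pattern."""
    return pages_to_parse.fullmatch(url) is not None

# Hypothetical URLs for illustration.
urls = [
    "/app/login.jsp",
    "/app/report.jsp?id=3",
    "/files/big-report.pdf",
    "/applet/data.bin",
]

for url in urls:
    print(url, "parsed" if is_parsed(url) else "left alone")
```

Anything left alone by the pattern is not subject to search/replace, which is the point of narrowing the pattern for large binaries.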