System.Web.Routing URL matching and generation

Recently .NET got a new feature - System.Web.Routing - a general-purpose routing engine which is currently used (within the .NET class library) by the System.Web.Mvc and System.Web.DynamicData namespaces. Most of System.Web.Routing is straightforward and easy to understand, but one feature is poorly documented despite being very important, not to say crucial - the route URLs.

During the past week I was fixing several bugs in the Mono implementation of the System.Web.Routing engine, all of which were related to route URL parsing and virtual path generation. At the beginning of the week I thought the task would be quick and easy, but it turned out that documentation of some aspects of the subject is scarce and, in effect, implementing the support correctly took me the whole week. Since the feature is important both for the implementor of the namespace and for the end user, this blog entry attempts to collect and explain all the details of the subject.

URLs are used for two purposes in System.Web.Routing - for checking if a request matches a route (in this instance the route URL is considered to be specified using a simple pattern DSL) and for generating virtual paths (in this instance the route URL is considered to be a replacement pattern for the virtual path).

Note: all of the information below has been acquired through tests and whatever little documentation is available on the subject. I saw no source code for .NET's version (we never do that at Mono, as you probably know), so there might be errors and mistakes in the description. If you find one, please let me know (or better yet - create a simple test case which demonstrates the failure and file a bug for Mono's Sys.Web component). If you're interested in the tests we have, feel free to take a look.

Of the two, route matching is better documented, but let me outline the rules which the matching URLs follow:

  • The URL must not start with either the ~ or the / character.
  • The URL must not contain the ? character.
  • Each URL is built from zero or more segments separated with the / character.
  • Each segment may contain any number of sections enclosed in the { and } characters.
  • Within one segment, no two sections can be adjacent - they must be separated with a literal string of characters.
  • A section name must not be empty. Empty in this case is understood as "not containing any characters" - you can have a section name consisting only of whitespace (including tabs and newlines).
  • Sections are parsed using greedy matching - that is, they will consume all the text from their starting point all the way to the start of the last occurrence of the following literal or to the end of input, whichever comes first.
  • A segment may contain a catch-all section which is marked by putting * as the first character of the section name. A catch-all section must be the only content of its segment and it is valid only in the last segment of the URL.
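
The greedy matching rule from the list above is easiest to see on a segment of the form {first}-{second}. What follows is only a minimal sketch, not Mono's or .NET's code: the TryMatch helper is hypothetical, it handles exactly two sections around one non-empty literal, and treating an empty section as a mismatch is my assumption.

```csharp
using System;

static class GreedySectionDemo
{
    // Hypothetical helper: matches a segment of the form "{first}<literal>{second}".
    // Returns false when the literal is absent or either section would be empty
    // (rejecting empty sections is an assumption made for this sketch).
    public static bool TryMatch(string segment, string literal,
                                out string first, out string second)
    {
        first = second = null;
        // Greedy: the first section runs up to the LAST occurrence of the literal.
        int idx = segment.LastIndexOf(literal, StringComparison.Ordinal);
        if (idx <= 0 || idx + literal.Length >= segment.Length)
            return false;
        first = segment.Substring(0, idx);
        second = segment.Substring(idx + literal.Length);
        return true;
    }

    public static void Main()
    {
        string first, second;
        if (TryMatch("2009-05-report", "-", out first, out second))
            Console.WriteLine("first={0} second={1}", first, second);
        // prints "first=2009-05 second=report"
    }
}
```

Note that the first section swallows the first dash: a non-greedy parser would have produced "2009" and "05-report" instead.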

This is a pretty simple set of rules and is easy to implement (see PatternParser.cs in the Mono repository).
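
To make the rules concrete, here is a rough sketch of a validator which checks only the rules listed above. It is not the code from PatternParser.cs; the RouteUrlCheck class, its Validate method and the error strings are all made up for illustration, and anything the list doesn't mention (stray } characters, empty segments and so on) is deliberately ignored.

```csharp
using System;

static class RouteUrlCheck
{
    // Returns null when the URL passes the rules listed above,
    // otherwise a short description of the first violated rule.
    public static string Validate(string url)
    {
        if (url.StartsWith("~", StringComparison.Ordinal) ||
            url.StartsWith("/", StringComparison.Ordinal))
            return "must not start with ~ or /";
        if (url.IndexOf('?') >= 0)
            return "must not contain ?";

        string[] segments = url.Split('/');
        for (int i = 0; i < segments.Length; i++) {
            string seg = segments[i];
            bool previousWasSection = false;
            int pos = 0;
            while (pos < seg.Length) {
                if (seg[pos] != '{') {
                    // Literal text: skip ahead to the next section or to the end.
                    int next = seg.IndexOf('{', pos);
                    pos = next < 0 ? seg.Length : next;
                    previousWasSection = false;
                    continue;
                }
                int close = seg.IndexOf('}', pos);
                if (close < 0)
                    return "unterminated section";
                string name = seg.Substring(pos + 1, close - pos - 1);
                if (name.Length == 0)
                    return "empty section name";
                if (previousWasSection)
                    return "adjacent sections";
                if (name[0] == '*') {
                    // A catch-all must fill its whole segment, and that segment must be the last.
                    bool alone = pos == 0 && close == seg.Length - 1;
                    if (!alone || i != segments.Length - 1)
                        return "misplaced catch-all section";
                }
                previousWasSection = true;
                pos = close + 1;
            }
        }
        return null;
    }

    public static void Main()
    {
        string[] samples = {
            "{controller}/{action}/{id}", "files/{*path}",
            "{a}{b}", "{*path}/x", "~/home", "search?q={q}"
        };
        foreach (string url in samples)
            Console.WriteLine("{0} => {1}", url, Validate(url) ?? "ok");
    }
}
```

The first two samples pass, the remaining four each trip a different rule.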

 

Let's move on to the other part of the subject, virtual path generation. The theory seems simple here - take the original route URL and replace its sections with the provided values. The simple recipe has embedded traps, however. Once you discover the rules, they're easy and the algorithm is clear, but discovering them is, as it turns out, not so easy today. Thanks to Atsushi Enomoto's work on implementing our System.Web.Routing, there already was basic support for the parsing/matching/generating operations which covered the most common cases. It didn't work correctly for the more obscure and less frequently encountered scenarios. There was very little meaningful information on the web I could find on those and, by far, the most valuable piece of information came from the "ASP.NET MVC 1.0" book (Chapter 4). All gathered together, however, it still didn't cover everything. What you can see below is a combination of the above information with knowledge learned from tests.

There are several terms you need to be familiar with in order to understand what's going on:

Required values
Values which are found in the route URL (i.e. sections) which don't have defaults provided
Default values
Values provided by the programmer in the route's Defaults dictionary.
Ambient values
Values gathered by the routing engine from the current request. They are accessible via RequestContext.RouteData.Values
Constraints
The programmer can provide the routing engine with a set of per-route constraints. Constraints come in two flavors - either strings, which are treated as regular expressions, or instances of the IRouteConstraint interface. Those values are consulted when generating the virtual path.
Overflow values
Those are the values provided explicitly in the call to Route.GetVirtualPath that are not in the route URL and also are not specified in the Defaults and Constraints dictionaries attached to the Route class. Such values are appended as query parameters to the generated virtual path.
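
The definition of overflow values translates almost directly into a filter. The sketch below uses plain dictionaries instead of the real Route and RouteValueDictionary classes, so the OverflowDemo class and its parameter names are illustrative only; the case-insensitive key comparison mimics what RouteValueDictionary does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class OverflowDemo
{
    // Keeps only the user values which are neither sections of the route URL
    // nor keys of the defaults or constraints dictionaries.
    public static IDictionary<string, object> Overflow(
        ISet<string> urlSections,
        IDictionary<string, object> defaults,
        IDictionary<string, object> constraints,
        IDictionary<string, object> userValues)
    {
        return userValues
            .Where(kv => !urlSections.Contains(kv.Key)
                      && !defaults.ContainsKey(kv.Key)
                      && !constraints.ContainsKey(kv.Key))
            .ToDictionary(kv => kv.Key, kv => kv.Value, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var ci = StringComparer.OrdinalIgnoreCase;
        // Stand-ins for the route "{controller}/{action}" with one default.
        var sections = new HashSet<string>(ci) { "controller", "action" };
        var defaults = new Dictionary<string, object>(ci) { { "action", "Index" } };
        var constraints = new Dictionary<string, object>(ci);
        var user = new Dictionary<string, object>(ci) {
            { "controller", "Home" }, { "action", "List" },
            { "page", 2 }, { "sort", "name" }
        };

        var overflow = Overflow(sections, defaults, constraints, user);
        Console.WriteLine(string.Join(", ",
            overflow.Keys.OrderBy(k => k, StringComparer.Ordinal).ToArray()));
        // prints "page, sort"
    }
}
```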

Given the information above, the rules for virtual path generation are outlined below. This is where it becomes a bit entangled.

  • If there are any required parameters in the current route, a check is made whether all of them are present in the dictionary passed by user to the GetVirtualPath method.
    • If any are missing, there are several rules before we decide it's a mismatch:
      • If there are no defaults
      • or the defaults dictionary contains at least one value used in the route URL
      • and the user values do not contain any value used in the route URL,
      • then the code is allowed to check for the required value in the ambient values. If the value is not found there, we have a mismatch.
      • If the above four conditions are not met and the value is not found in the user values, we have a mismatch
    • If all of them are present, checks are performed to see whether
      • there are any default values which are not used in the route URL.
      • If any such entries are found, another check is made to make sure that any corresponding values found in the user-passed dictionary have exactly the same value as specified in the defaults. If the check fails, we have a mismatch.
      • Further on, the code looks whether there are any constraints defined for the route and, if yes, iterates over the dictionary to make sure that all of the constraints are met.
      • If there are no errors, we have a match and we can start generating the virtual path.

This completes the preliminary checks whether the route qualifies for virtual path generation in the context of the current request.
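
A simplified sketch of these preliminary checks might look like the code below. It is my reading of the rules, not the Mono implementation: the ambient-value fallback is left out entirely, only string (regular expression) constraints are handled, and the value comparison, the whole-value regex anchoring and the treatment of a constrained value that is missing are assumptions. The RouteCheckSketch class and all its names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

static class RouteCheckSketch
{
    static string Text(object value)
    {
        return Convert.ToString(value, CultureInfo.InvariantCulture);
    }

    public static bool Qualifies(
        ISet<string> urlSections,
        IDictionary<string, object> defaults,
        IDictionary<string, string> constraints,
        IDictionary<string, object> userValues)
    {
        // Required values: URL sections without a default must come from the user
        // (the ambient-value fallback is omitted in this sketch).
        foreach (string section in urlSections)
            if (!defaults.ContainsKey(section) && !userValues.ContainsKey(section))
                return false;

        // Defaults which are not used in the URL: a user value under the same key
        // must be equal to the default (case-insensitive text comparison is an assumption).
        foreach (KeyValuePair<string, object> d in defaults) {
            object given;
            if (urlSections.Contains(d.Key) || !userValues.TryGetValue(d.Key, out given))
                continue;
            if (!string.Equals(Text(given), Text(d.Value), StringComparison.OrdinalIgnoreCase))
                return false;
        }

        // String constraints: treated as regexes which must match the whole value.
        foreach (KeyValuePair<string, string> c in constraints) {
            object value;
            if (!userValues.TryGetValue(c.Key, out value) && !defaults.TryGetValue(c.Key, out value))
                return false; // assumption: a missing constrained value is a mismatch
            if (!Regex.IsMatch(Text(value), "^(" + c.Value + ")$",
                               RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
                return false;
        }
        return true;
    }

    static IDictionary<string, object> Values(params string[] pairs)
    {
        var d = new Dictionary<string, object>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i + 1 < pairs.Length; i += 2)
            d[pairs[i]] = pairs[i + 1];
        return d;
    }

    public static void Main()
    {
        var ci = StringComparer.OrdinalIgnoreCase;
        // Stand-ins for the route "{controller}/{action}".
        var sections = new HashSet<string>(ci) { "controller", "action" };
        var defaults = Values("action", "Index", "area", "Admin");
        var constraints = new Dictionary<string, string>(ci) { { "controller", "[a-z]+" } };

        Console.WriteLine(Qualifies(sections, defaults, constraints, Values("controller", "Home")));                   // True
        Console.WriteLine(Qualifies(sections, defaults, constraints, Values("controller", "Home", "area", "Public"))); // False
        Console.WriteLine(Qualifies(sections, defaults, constraints, Values("controller", "Home42")));                 // False
        Console.WriteLine(Qualifies(sections, defaults, constraints, Values()));                                       // False
    }
}
```

The second call fails because "area" is a default which does not appear in the route URL and the user passed a different value for it; the third fails the constraint; the last one lacks the required "controller" value.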

 

The actual virtual path generation can begin now. The code generating the virtual path can trim the result by skipping a contiguous block of values at the end of the URL if all of those values are equal to their defaults. After this process is done, the code looks for the overflow values and appends them to the resulting virtual path as query parameters. Both names and values in the query are encoded using Uri.EscapeDataString.
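
The two steps described above (trimming the trailing defaults and appending the overflow values) can be sketched as follows. This is a toy model under loud assumptions: the route URL consists only of segments holding exactly one section each, all values are strings, constraints are ignored when picking the overflow values, and the route has already passed the preliminary checks. The PathSketch class is hypothetical; only Uri.EscapeDataString is the real API mentioned in the text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class PathSketch
{
    public static string Generate(
        IList<string> sections,
        IDictionary<string, string> defaults,
        IDictionary<string, string> userValues)
    {
        // One value per section: the user value if given, the default otherwise.
        var parts = new List<string>();
        foreach (string name in sections) {
            string value;
            if (!userValues.TryGetValue(name, out value))
                value = defaults[name]; // present, since the preliminary checks passed
            parts.Add(value);
        }

        // Trim the contiguous run of trailing values which equal their defaults.
        for (int i = sections.Count - 1; i >= 0; i--) {
            string def;
            if (defaults.TryGetValue(sections[i], out def) &&
                string.Equals(parts[i], def, StringComparison.OrdinalIgnoreCase))
                parts.RemoveAt(i);
            else
                break;
        }

        var path = new StringBuilder(string.Join("/", parts.ToArray()));

        // Overflow values become query parameters, names and values both escaped.
        char separator = '?';
        foreach (KeyValuePair<string, string> kv in userValues) {
            if (sections.Contains(kv.Key) || defaults.ContainsKey(kv.Key))
                continue;
            path.Append(separator)
                .Append(Uri.EscapeDataString(kv.Key))
                .Append('=')
                .Append(Uri.EscapeDataString(kv.Value));
            separator = '&';
        }
        return path.ToString();
    }

    public static void Main()
    {
        // Stand-ins for the route "{controller}/{action}/{id}".
        var sections = new List<string> { "controller", "action", "id" };
        var defaults = new Dictionary<string, string> { { "action", "Index" }, { "id", "" } };
        var user = new Dictionary<string, string> {
            { "controller", "Home" }, { "action", "Index" }, { "q", "a b&c" }
        };
        Console.WriteLine(Generate(sections, defaults, user));
        // prints "Home?q=a%20b%26c"
    }
}
```

Both "action" and "id" are dropped from the path because they sit at the end of the URL and carry their default values, while "q" ends up in the query string since it is neither a section nor a default.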

This concludes the virtual path generation process. Hope somebody finds this description useful :)

Book meme, per instructions :)

Following a fellow hackerette's instructions, here's my meme:

"Computer science has a subdiscipline called Information Retrieval 
(IR for short) that focuses almost entirely on this problem."

And Andreia's original instructions copied here:

  • Grab the nearest book.
  • Open it to page 56.
  • Find the fifth sentence.
  • Post the text of the sentence in your journal along with these instructions.
  • Don't dig for your favorite book, the cool book, or the intellectual one: pick the CLOSEST.

So, Andreia, what's the page number for next week? :)

Tip: Mono ASP.NET application burning CPU in idle state - FileSystemWatcher

Update: added sample code to detect the watcher in use, courtesy of Robert Jordan - thanks!

mod_mono is an Apache module for hosting ASP.NET applications. The module itself doesn't run any .NET code. Instead, it spawns a backend server (mod-mono-server.exe for ASP.NET 1.1 and mod-mono-server2.exe for ASP.NET 2.0) which is handed all the requests coming in from the client browser and sends back the response generated by the application.

If you run Mono on a VPS (e.g. Xen, OpenVZ) then you usually don't have any control over which Linux kernel version you run or what capabilities it has. It may happen that the kernel lacks capabilities used by parts of the Mono runtime and Mono will have to fall back to other methods of doing the same task. One such part is the FileSystemWatcher class which is used to monitor changes to files/directories on disk so that the application can take any steps it deems necessary in reaction to file creation/deletion/modification events.

Mono's FileSystemWatcher does its best to perform its assigned task in various environments, under various operating systems. Part of the effort is selecting the actual filesystem monitoring backend best for the runtime environment. Under Unix the supported backends are as follows:

  • FAM
  • kevent (BSD*/MacOSX only)
  • gamin
  • inotify (Linux only)
  • Managed watcher

Out of those, assuming you run Linux, inotify is the preferred backend mechanism as it requires no polling effort on the part of the userland application. Instead, the Linux kernel will notify the application (in our case the Mono runtime) whenever interesting events happen. However, it requires the Linux kernel to support the mechanism and, more importantly, your VPS operator to actually include the support in the kernel your VPS runs on.

If your kernel doesn't support inotify, Mono will attempt to use FAM and Gamin which are userland daemons doing active filesystem polling but outside of the consumer application. The consumer application will use provided FAM/Gamin libraries to receive events and react to them. Performance of this setup is worse than inotify but not tragic.

Should Mono fail to detect inotify, FAM or Gamin support, it will fall back to the last resort option - the managed watcher. This watcher is implemented in managed code and uses a separate thread for filesystem monitoring, polling for changes on selected files/directories. As the application may (and in the case of ASP.NET sometimes does) watch directories recursively, this can be very expensive, as it requires checking a big set of files for changes. Each change detection run requires checking whether a file/directory exists (in the case of the managed watcher those are two stat(2) calls) and then checking the file metadata for changes (size, modification times etc.) and, possibly, generating an event. This happens approximately every 750ms and can generate substantial load on the server's CPU.
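
To get a feeling for why such polling is costly, here is a rough sketch of one polling loop. It is not Mono's managed watcher, merely an illustration of the technique: the PollingSketch class and its methods are made up, it looks only at modification times, and it makes just four passes before exiting. The 750ms interval is taken from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

static class PollingSketch
{
    // Records the last write time of every file below root. Each pass touches
    // the metadata of every single file, whether or not anything changed.
    public static Dictionary<string, DateTime> Snapshot(string root)
    {
        var result = new Dictionary<string, DateTime>();
        foreach (string file in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
            result[file] = File.GetLastWriteTimeUtc(file);
        return result;
    }

    // Counts files which were created, modified or deleted between two snapshots.
    public static int CountChanges(Dictionary<string, DateTime> before,
                                   Dictionary<string, DateTime> after)
    {
        int changes = 0;
        foreach (KeyValuePair<string, DateTime> entry in after) {
            DateTime old;
            if (!before.TryGetValue(entry.Key, out old) || old != entry.Value)
                changes++; // created or modified
        }
        foreach (string path in before.Keys)
            if (!after.ContainsKey(path))
                changes++; // deleted
        return changes;
    }

    public static void Main(string[] args)
    {
        string root = args.Length > 0 ? args[0] : ".";
        Dictionary<string, DateTime> previous = Snapshot(root);
        for (int pass = 0; pass < 4; pass++) {
            Thread.Sleep(750);
            Dictionary<string, DateTime> current = Snapshot(root);
            Console.WriteLine("{0} files checked, {1} changes",
                              current.Count, CountChanges(previous, current));
            previous = current;
        }
    }
}
```

Point it at a large application directory and the number of files checked per pass shows how much work is repeated even when nothing changes.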

If you notice (using the top or htop applications) that your copy of mod-mono-server burns several percent of CPU but is otherwise in the S (Sleeping) process state, chances are your application is using the managed watcher. You can confirm that by using htop, which allows you to watch individual process threads - you will see two threads consuming nearly the same amount of CPU time and one of them waking up every ~750ms.

The cure for the itch is easy, if you can live without filesystem monitoring (that means your application will not auto-restart when you modify Web.config, and files won't be recompiled if you modify a code-behind .cs file or an .aspx, .ascx etc. file). Mono supports the MONO_MANAGED_WATCHER environment variable which can be set to the value disable. This disables filesystem monitoring completely (Mono will use a "dumb" implementation of the watcher backend which does nothing) and relieves your application of the filesystem polling chores described above.

You can set the environment variable for your Apache VirtualHost by using the following mod_mono directive:

MonoSetEnv [server_alias] MONO_MANAGED_WATCHER=disable

Sample program to detect which watcher backend is used:

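using System;
using System.Reflection;
using System.IO;

class Program {

	public static void Main()
	{
		// Creating an instance makes Mono pick the watcher backend.
		// The private static "watcher" field is a Mono implementation
		// detail, so this works on Mono only and may break in the future.
		object watcher = new FileSystemWatcher()
			.GetType ()
			.GetField ("watcher", BindingFlags.NonPublic | BindingFlags.Static)
			.GetValue (null);
		
		// The type name of the backend object tells which watcher is in use.
		Console.WriteLine (watcher != null
				   ? watcher.GetType ().FullName
				   : "unknown");
	}
}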

Mono + Linux + BlogEngine.NET

As you can see at the bottom of the page, this site is powered by the BlogEngine.NET open-source blogging software but, yes, it is running on Linux with Mono and Apache.

There have been just two issues with case-insensitivity in the BlogEngine.NET source code, but otherwise the deployment went without any issues whatsoever! This is living proof both of the maturity of Mono and Mono's ASP.NET and of .NET's promise, realized thanks to Mono again, of enabling one to write and deliver cross-platform software.

Back to blogging

So, it has been over a year since I last blogged. A lot has happened in Mono in those 15 months - we have released version 2.0 and continued with new releases all the way to the latest 2.4, which came with lots of performance and stability improvements. ASP.NET in Mono now has support for almost all .NET 3.5 controls (except for the LinqDataSource which is not fully implemented, but it's on its way), we have System.Web.Routing, the beginnings of System.Web.DynamicData (more on that in some later post) and we integrated System.Web.Mvc after Microsoft released it under an open-source license. But you are all well aware of those events, so there's no point in continuing the list here.

As I feared, I wasn't the most active blogger, for various reasons. This time, though, I do hope to keep you updated on what's going on in ASP.NET in Mono and, perhaps, about other things I find interesting. So, check back from time to time and maybe I'll manage to put together a few words which make sense :)