Code Generation Using T4

I dislike code generation and usually see it as a "smell". If you are using code generation of any kind, there is a good chance something is wrong with your design or solution! So perhaps instead of writing a script to generate thousands of lines of code, you should take a step back, think about your problem again and come up with a better solution. With that said, there are situations where code generation might be a good solution.

In this post, I will talk about pros and cons of code generation and then show you how to use T4 templates, the built-in code generation tool in Visual Studio, using an example.

Code Generation Is a Bad Idea

I am writing a post about a concept that I think is, more often than not, a bad idea, and it would be unprofessional of me to hand you a tool without warning you of its dangers.

The truth is, code generation is quite exciting: you write a few lines of code and in return you get a lot more of it that you would otherwise have to write manually. So it's easy to fall into a one-size-fits-all trap with it:

"If the only tool you have is a hammer, you tend to see every problem as a nail." A. Maslow

But code generation is almost always a bad idea. I refer you to this post, which explains most of the issues that I see with code generation. In a nutshell, code generation results in inflexible and hard-to-maintain code.

Here are a few examples of where you should not use code generation:

Code-generated distributed architecture: you run a script that generates the service contracts and their implementations and magically turns your application into a distributed one. That obviously ignores the chattiness of in-process calls, which slow down dramatically over a network, as well as the need for proper exception and transaction handling in distributed systems.
Visual GUI designers are what Microsoft developers have used for ages (in Windows Forms, Web Forms and, to some extent, XAML-based applications): they drag and drop widgets and UI elements and get the (ugly) UI code generated for them behind the scenes.
Naked Objects is an approach to software development where you define your domain model and the rest of your application, including the UI and the database, all gets generated for you. Conceptually, it's very close to Model Driven Architecture.
Model Driven Architecture is an approach to software development where you specify your domain in detail using a Platform Independent Model (PIM). Using code generation, the PIM is later turned into a Platform Specific Model (PSM), which a computer can run. One of the main selling points of MDA is that you specify the PIM once and can generate web or desktop applications in a variety of programming languages just by pushing a button that generates the desired PSM code.

A lot of RAD (Rapid Application Development) tools are created based on this idea: you draw a model and click a button to get a complete application. Some of these tools go as far as trying to completely remove developers from the equation where non-technical users are thought to be able to make safe changes to the software without the need for a developer.

I was also going to put Object Relational Mapping in the list as some ORMs heavily rely on code generation to create the persistence model from a conceptual or physical data model. I have used some of these tools and have undergone a fair bit of pain to customize the generated code. With that said, a lot of developers seem to really like them, so I just left that out (or did I?!) ;)

While some of these "tools" do solve some programming problems and reduce the upfront effort and cost of software development, code generation carries a huge hidden maintainability cost that is going to bite you sooner or later, and the more generated code you have, the more it is going to hurt.

I know that a lot of developers are huge fans of code generation and write a new code generation script every day. If you are in that camp and think it is a great tool for a lot of problems, I am not going to argue with you. After all, this post is not about proving code generation is a bad idea.

Sometimes, Only Sometimes, Code Generation Might Be a Good Idea

Very rarely though, I find myself in a situation where code generation is a good fit for the problem at hand and the alternative solutions would either be harder or uglier.

Here are a few examples of where code generation might be a good fit:

You need to write a lot of boilerplate code that follows a similar static pattern. Before trying code generation in this case, you should think really hard about the problem and try writing this code properly (for example, using object-oriented patterns if you're writing OO code). If you have tried hard and haven't found a good solution, then code generation might be a good choice.
You very frequently use some static metadata from a resource, and retrieving the data requires using magic strings (and is perhaps a costly operation). Here are a few examples:
- Code metadata fetched by reflection: calling code using reflection requires magic strings, but at design time you already know what you need, so you can use code generation to generate the required artifacts. This way you avoid using reflection at run time and/or magic strings in your code. A great example of this concept is T4MVC, which creates strongly typed helpers that eliminate the use of literal strings in many places.
- Static lookup web services: every now and then I come across web services that only provide static data that can be fetched by providing a key, which ends up as a magic string in the codebase. In this case, if you can programmatically retrieve all the keys, you can generate a static class containing all of them and access the string values as strongly typed first-class citizens in your codebase instead of using magic strings. You could obviously create the class manually, but you would also have to maintain it manually every time the data changes. You can then use this class to hit the web service and cache the result so that subsequent calls are resolved from memory.
  
  Alternatively, if allowed, you could just generate the entire service in code so the lookup service is not required at runtime. Both solutions have pros and cons, so pick the one that fits your requirements. The latter is only useful if the keys are only used by the application and are not provided by the user; otherwise, sooner or later there will be a time when the service data has been updated but you haven't regenerated the code, and the user-initiated lookup fails.
- Static lookup tables: this is very similar to static web services, but the data lives in a data store as opposed to a web service.
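To make the static lookup idea a bit more concrete, here is a minimal sketch of what a generated keys class and a caching wrapper around the service might look like. Everything in it is hypothetical: CountryKeys, its constants, LookupClient and the fetch delegate are made-up names for illustration and do not refer to any real service or library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical output of a code generation script: one constant per key
// retrieved from the lookup service at generation time.
public static class CountryKeys
{
    public const string Australia = "AU";
    public const string NewZealand = "NZ";
}

// Hypothetical wrapper: calls the (slow) service once per key and caches the result.
// Not thread-safe; a real implementation would need to consider concurrency.
public class LookupClient
{
    private readonly Func<string, string> _fetchFromService;
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    public LookupClient(Func<string, string> fetchFromService)
    {
        _fetchFromService = fetchFromService;
    }

    public string Get(string key)
    {
        string value;
        if (!_cache.TryGetValue(key, out value))
        {
            // First request for this key: go to the service and remember the answer.
            value = _fetchFromService(key);
            _cache[key] = value;
        }
        return value;
    }
}
```

A caller would then write something like client.Get(CountryKeys.Australia) instead of client.Get("AU"), so a key that disappears from the service shows up as a compile error after the class is regenerated rather than as a failed lookup at run time.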

As mentioned above, code generation makes for inflexible and hard to maintain code; so if the nature of the problem you're solving is static and doesn't require frequent maintenance, then code generation might be a good solution!

Just because your problem fits into one of the above categories doesn't mean code generation is a good fit for it. You should still try to evaluate alternative solutions and weigh your options.

Also, if you go for code generation, make sure to still write unit tests. For some reason, some developers think that generated code doesn't require unit testing. Perhaps they think it's generated by computers and computers don't make mistakes! I think generated code requires just as much (if not more) automated verification. I personally TDD my code generation: I write the tests first, run them to see them fail, then generate the code and see the tests pass.

Text Template Transformation Toolkit

There is an awesome code generation engine in Visual Studio called Text Template Transformation Toolkit (AKA, T4).

From MSDN:

Text templates are composed of the following parts:

Directives: elements that control how the template is processed.
Text blocks: content that is copied directly to the output.
Control blocks: program code that inserts variable values into the text and controls conditional or repeated parts of the text.

Instead of talking about how T4 works, I would like to use a real example. So here is a problem I faced a while back for which I used T4. I have an open source .NET library called Humanizer. One of the things I wanted to provide in Humanizer was a fluent, developer-friendly API for working with DateTime.

I considered quite a few variations of the API and, in the end, settled on this:

In.January          // Returns 1st of January of the current year
In.FebruaryOf(2009) // Returns 1st of February of 2009

On.January.The4th   // Returns 4th of January of the current year
On.February.The(12) // Returns 12th of Feb of the current year

In.One.Second       // DateTime.UtcNow.AddSeconds(1);
In.Two.Minutes      // With corresponding From method
In.Three.Hours      // With corresponding From method
In.Five.Days        // With corresponding From method
In.Six.Weeks        // With corresponding From method
In.Seven.Months     // With corresponding From method
In.Eight.Years      // With corresponding From method
In.Two.SecondsFrom(DateTime dateTime)

Once I knew what my API was going to look like, I thought about a few different ways to tackle this and spiked a few object-oriented solutions, but all of them required a fair bit of boilerplate code, and those that didn't wouldn't give me the clean public API that I wanted. So I decided to go with code generation.

For each variation I created a separate T4 file:

In.Months.tt for In.January, In.FebruaryOf(<some year>) and so on.
On.Days.tt for On.January.The4th, On.February.The(12) and so on.
In.SomeTimeFrom.tt for In.One.Second, In.Two.SecondsFrom(<date time>), In.Three.Hours and so on.

Here I will discuss On.Days. The code is copied here for your reference:

<#@ template debug="true" hostSpecific="true" #>
<#@ output extension=".cs" #>
<#@ Assembly Name="System.Core" #>
<#@ Assembly Name="System.Windows.Forms" #>
<#@ assembly name="$(SolutionDir)Humanizer\bin\Debug\Humanizer.dll" #>
<#@ import namespace="System" #>
<#@ import namespace="Humanizer" #>
<#@ import namespace="System.IO" #>
<#@ import namespace="System.Diagnostics" #>
<#@ import namespace="System.Linq" #>
<#@ import namespace="System.Collections" #>
<#@ import namespace="System.Collections.Generic" #>
using System;

namespace Humanizer
{
    public partial class On
    {
<#
    const int leapYear = 2012;
    for (int month = 1; month <= 12; month++)
    {
        var firstDayOfMonth = new DateTime(leapYear, month, 1);
        var monthName = firstDayOfMonth.ToString("MMMM");#>

        /// <summary>
        /// Provides fluent date accessors for <#= monthName #>
        /// </summary>
        public class <#= monthName #>
        {
            /// <summary>
            /// The nth day of <#= monthName #> of the current year
            /// </summary>
            public static DateTime The(int dayNumber)
            {
                return new DateTime(DateTime.Now.Year, <#= month #>, dayNumber);
            }
<#      for (int day = 1; day <= DateTime.DaysInMonth(leapYear, month); day++)
        {
            var ordinalDay = day.Ordinalize();#>

            /// <summary>
            /// The <#= ordinalDay #> day of <#= monthName #> of the current year
            /// </summary>
            public static DateTime The<#= ordinalDay #>
            {
                get { return new DateTime(DateTime.Now.Year, <#= month #>, <#= day #>); }
            }
<#      }#>
        }
<#  }#>
    }
}

If you're checking this code out in Visual Studio or want to work with T4, make sure you have installed the Tangible T4 Editor for Visual Studio. It provides IntelliSense, T4 Syntax-Highlighting, Advanced T4 Debugger and T4 Transform on Build.

The code might seem a bit scary in the beginning, but it's just a script very similar to the ASP language. Upon saving, this will generate a class called On with 12 nested classes, one per month (for example, January, February, etc.), each with public static properties that return a specific day in that month. Let's break the code apart and see how it works.
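Before that, it may help to see roughly what the output looks like. The excerpt below is a hand-written sketch of the start of the generated On.Days.cs, derived from the template above rather than copied from the real file, so whitespace will differ. It assumes the template runs under an English culture, since the class names come from ToString("MMMM").

```csharp
using System;

namespace Humanizer
{
    public partial class On
    {
        /// <summary>
        /// Provides fluent date accessors for January
        /// </summary>
        public class January
        {
            /// <summary>
            /// The nth day of January of the current year
            /// </summary>
            public static DateTime The(int dayNumber)
            {
                return new DateTime(DateTime.Now.Year, 1, dayNumber);
            }

            /// <summary>
            /// The 1st day of January of the current year
            /// </summary>
            public static DateTime The1st
            {
                get { return new DateTime(DateTime.Now.Year, 1, 1); }
            }

            // ... The2nd through The31st follow the same pattern
        }

        // ... February through December follow the same pattern
    }
}
```

Each month contributes one The(int) method plus one property per day, which is exactly the kind of repetitive, static code the template is meant to spare you from typing.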

Directives

The syntax of directives is as follows: <#@ DirectiveName [AttributeName = "AttributeValue"] ... #>. You can read more about directives here.

I have used the following directives in the code:

Template

<#@ template debug="true" hostSpecific="true" #>

The Template directive has several attributes that allow you to specify different aspects of the transformation.

If the debug attribute is true, the intermediate code file will contain information that enables the debugger to identify more accurately the position in your template where a break or exception occurred. I always leave this as true.

Output

<#@ output extension=".cs" #>

The Output directive is used to define the file name extension and encoding of the transformed file. Here we set the extension to .cs which means the generated file will be in C# and the file name will be On.Days.cs.

Assembly

<#@ assembly Name="System.Core" #>

Here we are loading System.Core so we can use it in the code blocks further down.

The Assembly directive loads an assembly so that your template code can use its types. The effect is similar to adding an assembly reference in a Visual Studio project.

This means that you can take full advantage of the .NET framework in your T4 template. For example, you can use ADO.NET to hit a database, read some data from a table and use that for code generation.

Further down, I have the following line:

<#@ assembly name="$(SolutionDir)Humanizer\bin\Debug\Humanizer.dll" #>

This is a bit interesting. In the On.Days.tt template I am using the Ordinalize method from Humanizer which turns a number into an ordinal string, used to denote the position in an ordered sequence such as 1st, 2nd, 3rd, 4th. This is used to generate The1st, The2nd and so on.
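If you have not come across ordinals in code before, the sketch below shows the general idea. It is a simplified, English-only stand-in written for this explanation, not Humanizer's actual implementation; OrdinalSketch and ToOrdinal are hypothetical names, and it is only meant for the positive day numbers the template deals with.

```csharp
using System;

public static class OrdinalSketch
{
    // Simplified stand-in for an ordinalizer: 1 -> "1st", 2 -> "2nd", 11 -> "11th".
    // Assumes a positive number such as a day of the month.
    public static string ToOrdinal(int number)
    {
        // 11, 12 and 13 are irregular: they always take "th".
        int lastTwoDigits = number % 100;
        if (lastTwoDigits >= 11 && lastTwoDigits <= 13)
            return number + "th";

        // Otherwise the suffix depends on the last digit only.
        switch (number % 10)
        {
            case 1: return number + "st";
            case 2: return number + "nd";
            case 3: return number + "rd";
            default: return number + "th";
        }
    }
}
```

The template prefixes the result with "The", which is how day 4 ends up as the property name The4th.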

From the MSDN article:

The assembly name should be one of the following:

The strong name of an assembly in the GAC, such as System.Xml.dll. You can also use the long form, such as name="System.Xml, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089". For more information, see AssemblyName.
The absolute path of the assembly.

System.Core lives in the GAC, so we can simply use its name; but for Humanizer we have to provide the absolute path. Obviously I don't want to hardcode my local path, so I used $(SolutionDir), which is replaced during code generation by the path the solution lives in. This way the code generation works fine for everyone, regardless of where they keep the code.

Import

1	<#@ import namespace="System" #>

The import directive allows you to refer to elements in another namespace without providing a fully-qualified name. It is the equivalent of the using statement in C# or imports in Visual Basic.

At the top we are importing all the namespaces we need in the code blocks. The import directives you see there are mostly inserted by the Tangible T4 Editor. The only thing I added was:

<#@ import namespace="Humanizer" #>

So I can later write:

var ordinalDay = day.Ordinalize();

Without the import directive and the assembly reference by path, instead of getting a C# file I would have gotten a compile error complaining that the Ordinalize method could not be found on int.

Text Blocks

A text block inserts text directly into the output file. At the top, I have written a few lines of C# code which get directly copied into the generated file:

using System;

namespace Humanizer
{
    public partial class On
    {

Further down, in between control blocks, I have some other text blocks for API documentation, methods and also for closing brackets.

Control Blocks

Control blocks are sections of program code that are used to transform the templates. The default language is C#.

Note: The language in which you write the code in the control blocks is unrelated to the language of the text that is generated.

There are three different types of control blocks: Standard, Expression and Class Feature.

From MSDN:

<# Standard control blocks #> can contain statements.
<#= Expression control blocks #> can contain expressions.
<#+ Class feature control blocks #> can contain methods, fields and properties.
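The sample template in this post only uses standard and expression blocks, so here is a tiny, self-contained sketch that exercises all three kinds. It is not part of Humanizer; the Square helper and the .txt output are made up purely for illustration.

```t4
<#@ template language="C#" #>
<#@ output extension=".txt" #>
<# // Standard control block: statements that drive the output
   for (int i = 1; i <= 3; i++)
   { #>
The square of <#= i #> is <#= Square(i) #>
<# } #>
<#+
    // Class feature control block: a helper method the blocks above can call
    private int Square(int x)
    {
        return x * x;
    }
#>
```

Transforming this should give a text file with one "The square of ..." line for each of 1, 2 and 3 (the exact blank lines depend on how whitespace around the blocks is handled). Note that the class feature block sits at the end of the file, after all the standard control blocks.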

Let's take a look at the control blocks that we have in the sample template:

<#
    const int leapYear = 2012;
    for (int month = 1; month <= 12; month++)
    {
        var firstDayOfMonth = new DateTime(leapYear, month, 1);
        var monthName = firstDayOfMonth.ToString("MMMM");#>

        /// <summary>
        /// Provides fluent date accessors for <#= monthName #>
        /// </summary>
        public class <#= monthName #>
        {
            /// <summary>
            /// The nth day of <#= monthName #> of the current year
            /// </summary>
            public static DateTime The(int dayNumber)
            {
                return new DateTime(DateTime.Now.Year, <#= month #>, dayNumber);
            }
<#      for (int day = 1; day <= DateTime.DaysInMonth(leapYear, month); day++)
        {
            var ordinalDay = day.Ordinalize();#>

            /// <summary>
            /// The <#= ordinalDay #> day of <#= monthName #> of the current year
            /// </summary>
            public static DateTime The<#= ordinalDay #>
            {
                get { return new DateTime(DateTime.Now.Year, <#= month #>, <#= day #>); }
            }
<#      }#>
        }
<#  }#>

For me personally, the most confusing thing about T4 is the opening and closing of control blocks, as they kinda get mixed up with the brackets in the text block (if you're generating code for a curly bracket language like C#). I find the easiest way to deal with this is to close (#>) the control block as soon as I open (<#) it and then write the code inside.

At the top, inside the standard control block, I am defining leapYear as a constant value. This is so I can generate an entry for February 29th. Then I iterate over the 12 months, getting the firstDayOfMonth and the monthName for each one. I then close the control block to write a text block for the month class and its XML documentation. The monthName is used as the class name and in the XML comments (using expression control blocks). The rest is just normal C# code which I am not going to bore you with.
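The effect of picking a leap year can be checked with a couple of lines of plain C#. This is only a sketch to illustrate the point; LeapYearCheck and HasFebruary29th are hypothetical names and not part of Humanizer or the template.

```csharp
using System;

public static class LeapYearCheck
{
    // Number of days the template's inner loop would generate for February,
    // depending on which year is used as the reference.
    public static int FebruaryDays(int year)
    {
        return DateTime.DaysInMonth(year, 2);
    }

    // Returns true when February 29th can actually be constructed for the given year.
    public static bool HasFebruary29th(int year)
    {
        try
        {
            new DateTime(year, 2, 29);
            return true;
        }
        catch (ArgumentOutOfRangeException)
        {
            // The DateTime constructor rejects day 29 in a non-leap year.
            return false;
        }
    }
}
```

With 2012 as the reference year, FebruaryDays returns 29, so a The29th property is generated for February; with 2013 it would return 28 and the property would be missing. The flip side is that the generated property builds the date from the current year, so reading it in a non-leap year would hit the same ArgumentOutOfRangeException that HasFebruary29th catches here.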

Conclusion

In this post I talked about code generation, provided a few examples of when code generation could be either dangerous or useful and also showed how you can use T4 templates to generate code from Visual Studio using a real example.

If you would like to learn more about T4, you can find a lot of great content on Oleg Sych's blog.