Decompiling .NET Assemblies
By Scott Mitchell
If you've created ASP.NET Web applications with Visual Studio .NET, you're well aware that the entire set of ASP.NET code-behind classes specific to an application is compiled into a single DLL file, which resides in the Web application's /bin directory. This DLL file is referred to as an assembly. An assembly is the set of files that comprise an entire .NET application; in the typical ASP.NET application it consists of a single file, but in more complex situations it can consist of a number of files. The two germane parts of an assembly are:
- MSIL Code - MSIL, or Microsoft Intermediate Language, is the intermediate language all .NET applications are compiled down to. That is, when a .NET application is compiled, the high-level code you wrote in C# or Visual Basic .NET is compiled into the intermediate language MSIL. This MSIL is executed by the Common Language Runtime (CLR) when the program is run.
- Metadata - in addition to the application's compiled code (MSIL), the assembly also contains extra bits of information about its types, versioning, security, deployment, and so on.
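To get a feel for the metadata half of an assembly, here is a minimal sketch that reads a little of it through the System.Reflection API. The class and method names (AssemblyMetadataDemo, Describe) are made up for this illustration, and the exact names and version numbers printed will depend on the runtime you have installed.

```csharp
using System;
using System.Reflection;

public static class AssemblyMetadataDemo
{
    // Builds a one-line summary from the name and version stored in the
    // assembly's metadata.
    public static string Describe(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        return name.Name + ", version " + name.Version;
    }

    public static void Main()
    {
        // typeof(string).Assembly is the core library that defines System.String.
        Assembly core = typeof(string).Assembly;
        Console.WriteLine(Describe(core));

        // The metadata also records which other assemblies this one depends on.
        Assembly current = Assembly.GetExecutingAssembly();
        foreach (AssemblyName reference in current.GetReferencedAssemblies())
        {
            Console.WriteLine("references: " + reference.FullName);
        }
    }
}
```

None of this information comes from source code; it is all read from the compiled assembly itself.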
When you compile a .NET application, the source code is translated into MSIL in a fairly straightforward manner. That is, there is a rough symmetry between a line of C# code and the resulting MSIL. Because of this rough mapping between high-level code and MSIL, it is possible to take an assembly's MSIL and convert it back to equivalent C# or VB.NET code. In fact, there are free programs that do this very thing!
In this article we are going to examine one of these programs, Reflector. Using Reflector, you can examine the C# or VB.NET source code for ASP.NET applications, WinForms applications, and .NET class libraries. For example, using Reflector you can view the source code of the .NET Framework base class library, such as the classes that make up the built-in ASP.NET Web controls.
The Process of Decompilation - The Hard Way
To understand how we can decompile a .NET assembly into high-level source code, it is first important to understand how your high-level source code gets compiled into a .NET assembly. When creating an ASP.NET page's code-behind class in Visual Studio .NET, you are writing a series of instructions in a high-level programming language, such as C# or VB.NET. When you compile your project, the C# or VB.NET compiler parses through your source code and translates it into the MSIL intermediate language, packaging this MSIL for the assorted code-behind classes into a single DLL file.
MSIL is called an intermediate language because it is not a language that any computer architecture can process natively. That is, your computer's CPU cannot parse or process the MSIL code that is in an assembly. A CPU must receive its instructions in a very precise and particular language called machine language, which can be viewed as a string of 1s and 0s that carry information as to what instruction to execute and what data to use for the operation. Machine languages are specific to a computer architecture; for example, the machine language used by an Intel CPU differs from that used by the PowerPC CPU in a Macintosh.
In order for a computer to execute MSIL instructions, the MSIL instructions must be translated into machine language. This is done by the .NET Framework's Common Language Runtime (CLR). The CLR acts as a translator between the intermediate-level MSIL and the low-level machine code.
Traditionally, the job of a compiler is to translate high-level syntax to highly optimized machine code, specific for a particular architecture. The C# and VB.NET compilers, however, have a much simpler job - they only need to translate high-level syntax into MSIL. There is a pretty straightforward mapping between a line of C# or VB.NET and the corresponding MSIL. The C# and VB.NET compilers perform few optimizations on the MSIL - for the most part, they just provide a translation from high-level syntax to the .NET intermediate language, leaving the heavier optimization for when the MSIL is translated to machine code. Since the MSIL sitting in a .NET assembly is a fairly simple transformation from the high-level syntax that was used to create it, with a bit of knowledge about MSIL we can examine MSIL and translate it back into syntactically equivalent high-level code.
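To make the "straightforward mapping" concrete, consider the small sketch below. The class MappingDemo and its Add method are invented for this example, and the MSIL in the comment is what an optimized build typically emits for such a method; a debug build adds a few extra instructions, so treat the listing as approximate.

```csharp
using System;

public static class MappingDemo
{
    public static int Add(int a, int b)
    {
        // An optimized build typically emits roughly this MSIL:
        //   ldarg.0   // push the first argument (a) onto the stack
        //   ldarg.1   // push the second argument (b) onto the stack
        //   add       // pop both values, push their sum
        //   ret       // return the value on top of the stack
        return a + b;
    }

    public static void Main()
    {
        Console.WriteLine(Add(2, 3)); // prints 5
    }
}
```

Reading those four instructions back as "return a + b" is exactly the kind of translation a decompiler automates.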
In order to decompile a .NET assembly back into high-level syntax, we need to be able to do two things:
- Extract the MSIL from a .NET assembly, and
- Convert the MSIL back to a high-level syntax
Getting the MSIL out of an assembly is easy enough, as the .NET Framework ships with a utility program called ILDASM, whose sole purpose is extracting MSIL from a given assembly. (A screenshot of ILDASM is shown to the right.) To learn more about using ILDASM, check out: ILDASM is Your New Best Friend.
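ILDASM is the usual tool for this step, but the same raw MSIL can also be reached from code. The following is a rough sketch, assuming a runtime that supports MethodBase.GetMethodBody (added in .NET 2.0); the class IlBytesDemo and its GetIl helper are hypothetical names used only here. It prints the bytes as hex rather than as readable mnemonics, which is the part ILDASM does for you.

```csharp
using System;
using System.Reflection;

public static class IlBytesDemo
{
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw MSIL bytes of the named public method on this class.
    public static byte[] GetIl(string methodName)
    {
        MethodInfo method = typeof(IlBytesDemo).GetMethod(methodName);
        MethodBody body = method.GetMethodBody();
        return body.GetILAsByteArray();
    }

    public static void Main()
    {
        // Each opcode is encoded in one or two bytes;
        // for instance 0x58 is "add" and 0x2A is "ret".
        Console.WriteLine(BitConverter.ToString(GetIl("Add")));
    }
}
```

The exact byte sequence varies between debug and optimized builds, but the body of Add should contain the add opcode and end with ret.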
Once you have the MSIL for an assembly, converting the MSIL back into high-level syntax can be accomplished manually, assuming that you have a strong understanding of the MSIL syntax and semantics. There are entire books dedicated to MSIL, such as Serge Lidin's Inside Microsoft .NET IL Assembler, that you can read to gain a profound understanding of MSIL and its syntax. Armed with an intimate understanding of MSIL and ILDASM, you should be able to revert any compiled .NET assembly back into a corresponding high-level syntax.
The Process of Decompilation - The Easy Way with Reflector
If you stopped reading at the last paragraph above, bought Inside Microsoft .NET IL Assembler, mastered MSIL, and are now decompiling .NET assemblies using the techniques I mentioned above, I owe you one big apology, because decompiling MSIL into a high-level syntax is really as easy as pointing and clicking. Due to the rough mapping between high-level syntax and MSIL, a computer program can handle the task of converting MSIL into high-level syntax. There are a couple of free programs out there that act as decompilers, the most used and best known being Reflector. (Reflector was written by Lutz Roeder, an employee at Microsoft. The screenshots of Reflector shown here were taken using Reflector version 188.8.131.52.)
To start using Reflector, first download the ZIP file from Lutz's site. The ZIP contains two files: Reflector.exe and a README file. To run Reflector, simply unzip the files to a folder on your computer, and then double-click Reflector.exe. This will launch Reflector, which is preloaded with a few assemblies from the .NET Framework base class library.
You can click on the + to expand any of the assemblies, which will show the classes in the assembly. Click on a class to view its methods and properties. At this point, Reflector appears to be nothing but another class viewer. (Visual Studio .NET includes a class viewer called Object Browser, which you can open from the View menu, and the .NET Framework ships with a class viewer called WinCV.exe.) The screenshot below shows Reflector being used as a class viewer, showing the methods of the DataGrid Web control.
Reflector's real value shines through in its decompiling capabilities. In the Toolbar at the top, you can select what language to decompile to, including:
- C#
- Visual Basic
To view the high-level syntax for a particular method, start by choosing what language you want the MSIL decompiled to. Next, click on the method whose source code you want to view and then go to the Tools menu and choose Disassembler (or hit space on the keyboard). The screenshot below shows the decompiled C# source code for the DataGrid's
Decompiling Other Assemblies
As the screenshots above showed, by default Reflector opens with a set of assemblies loaded from the .NET Framework base class library. You can add additional assemblies to Reflector by going to the File menu and choosing Open. This will display a dialog box where you can browse to the assembly to load. For example, if you want to load the assembly for an ASP.NET Web application, simply navigate to that application's /bin directory and choose the appropriate DLL file. Once you've opened an assembly, you can browse through its classes and their methods and properties, as well as view the decompiled source code.
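The browsing half of this (opening an assembly and walking its classes and methods) can be approximated in a few lines with System.Reflection. This is only a sketch of the idea, not how Reflector itself is implemented; ClassViewerDemo and Dump are made-up names, and the path you pass on the command line would be whichever DLL you want to inspect.

```csharp
using System;
using System.Reflection;

public static class ClassViewerDemo
{
    // Prints each public type in the assembly, followed by the public methods
    // that type declares itself, and returns the number of types printed.
    public static int Dump(Assembly assembly)
    {
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance
                           | BindingFlags.Static | BindingFlags.DeclaredOnly;
        int typeCount = 0;
        foreach (Type type in assembly.GetExportedTypes())
        {
            typeCount++;
            Console.WriteLine(type.FullName);
            foreach (MethodInfo method in type.GetMethods(flags))
            {
                Console.WriteLine("    " + method.Name);
            }
        }
        return typeCount;
    }

    public static void Main(string[] args)
    {
        // With a path argument, load that DLL (for example, one from a Web
        // application's /bin directory); otherwise inspect this program itself.
        Assembly assembly = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : typeof(ClassViewerDemo).Assembly;
        Dump(assembly);
    }
}
```

What this sketch cannot do is show you the source code of those methods; turning their MSIL back into C# or VB.NET is the part Reflector adds.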
One of the nice things about Reflector is its extensibility. Reflector allows other developers to create Add-Ins, which can then be plugged into Reflector dynamically. There are a number of useful, real-world Add-Ins for Reflector scattered around the Internet. To learn about some of the better known Add-Ins, check out the Reflector Add-Ins Index.
Protecting Your Source Code
Reflector is a wonderful tool for developers because it allows us to peek inside an assembly and view its inner workings. However, if you create .NET applications that you sell, you might not want others to be able to see inside of your assembly. There's no way to stop someone from extracting the MSIL from your assembly, but what you can do is attempt to make the MSIL unreadable. The purposeful process of taking an intermediate language like MSIL and converting it into a semantically identical version, but one that's hard to make sense of by a human, is known as obfuscation. There exist a number of products that you can use to obfuscate your .NET assemblies, such as Dotfuscator, XenoCode, Demeanor for .NET, and many others. Visual Studio .NET 2003 ships with a "Community Version" of Dotfuscator, and the next version of Visual Studio - Visual Studio 2005 - will also ship with such a version. The "Community Version" provides a lower level of obfuscation than the professional version of Dotfuscator.
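To picture what "semantically identical, but hard to make sense of" means, here is a hand-written illustration of the simplest obfuscation technique, renaming. It is not the output of Dotfuscator or any other product, and all of the names in it are invented for the example; real obfuscators work on the compiled assembly and may go further than renaming.

```csharp
using System;

// The readable original, as a decompiler would show it.
public static class PriceCalculator
{
    public static decimal ApplyDiscount(decimal price, decimal discountPercent)
    {
        decimal discount = price * discountPercent / 100m;
        return price - discount;
    }
}

// The same logic after a hand-applied renaming pass. The behavior is
// unchanged, but the names no longer tell the reader anything.
public static class a
{
    public static decimal b(decimal c, decimal d)
    {
        decimal e = c * d / 100m;
        return c - e;
    }
}

public static class ObfuscationDemo
{
    public static void Main()
    {
        // Both calls compute the same value.
        Console.WriteLine(PriceCalculator.ApplyDiscount(200m, 15m));
        Console.WriteLine(a.b(200m, 15m));
    }
}
```

A decompiler handles the second class just as easily as the first; what is lost is the intent that the original names carried.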
A more detailed look into code obfuscation can be found in Adnan Masood's article: Intellectual Property Protection and Code Obfuscation.