This page contains helpful information about fcd. As fcd is still very young, its interfaces are likely to change in the future.

Currently, using fcd isn’t too hard, but its most advanced features may require some source digging.

A Word of Warning

Fcd has not been tested on malware. While good coding practices are generally being followed with respect to memory management, it has not been ruled out that fcd or one of its dependencies could be vulnerable to attacks encoded in a malicious executable. Use with caution on programs that you do not trust.

Additionally, even without any foul play, it's possible that fcd will crash, hang, or try to eat all of your system's memory while running on some input. A lot of work has gone into stability, but some more or less common patterns still currently cause fcd to react ungracefully:

  • jump tables;
  • (some) unbreakable loops;
  • `if`s with complex conditions joined by short-circuiting ORs;
  • unexpected uses of the address of stack-allocated objects.

Finally, while fcd generally does a good job, it's a good idea to keep an eye open for obvious patterns of misdecompilation. If a function's body is a single `llvm.trap` statement, chances are that something went wrong.

Installing fcd

Fcd currently has no binary distribution and must be installed from source. It is not known to build on Windows, though it should build if Clang is available there. Build instructions are located in the file in the source.

Using fcd

Fcd uses LLVM’s Command Line interface instead of getopt and friends. This means that options are generally agnostic to whether you use -o, --o, -option or --option; -f foo, -f=foo, --f foo or --f=foo, etc. By convention, this document uses a single dash for one-letter options and two dashes for so-called “long” options.

As outlined by fcd --help, the general usage is fcd [options] <input>. The command also provides a good summary of the options presented here.

Currently, fcd is not particularly helpful on programs that don’t have symbols if you can’t specify entry points yourself. This is because ELF executables tend to call __libc_start_main from their entry point with the address of the main function, and fcd isn’t smart enough yet to follow function pointers. If there’s no symbol for the main function, fcd will probably miss it. (It can still be specified separately as an entry point if you happen to know its address; see more below.)

Cheat Sheet

  • Decompile a program:
    $ fcd program
  • Decompile a program with custom header files:
    $ fcd --header stdio.h program
  • Decompile a program using a custom header search path:
    $ fcd -I ./include -I ./include/x86_64-linux-gnu --header stdio.h program
  • Decompile a program using a custom executable parser:
    $ fcd -f scripts/ macho-program

Supported executable types and architectures

Currently, fcd supports ELF executables and the x86_64 architecture. While programs built for the x86 architecture will probably load too, this scenario is currently less of a priority and output is expected to be inferior. For best results, the executable should use the System V x86_64 calling convention.

In addition to ELF executables, fcd has a “flat binary” format. If you have a binary in a format that is not supported (for instance, PE or Mach-O), you can load it as a flat binary to a specified virtual address. This is often sufficient for small and simple programs. The main downsides are that:

  • you need to specify a load offset with --flat-org;
  • imported symbol names cannot be guessed;
  • the calling convention is not guessed;
  • the entry point(s) are not guessed;
  • you’re screwed if the program has multiple, non-contiguous executable segments.

Finally, fcd lets you use a Python script that knows how to parse an executable. This script needs to implement the following top-level members:

  • an init(data) function, where data is a byte string containing the executable’s data. The function is called before any other member of the module is used;
  • an executableType variable that contains an arbitrary string identifying the type of the executable;
  • an entryPoints global variable, typed as a list of (virtualAddress, name) tuples;
  • a getStubTarget(jumpTarget) function that accepts the memory location that an import stub function jumps to, and returns a (library name?, import name) tuple (where the library name can be None if it is unknown, which is the case in executable formats that don’t support two-level namespacing, like ELF);
  • a mapAddress(virtualAddress) function that accepts a virtual address and returns the offset in init’s data parameter that contains the information at this address.
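
The members listed above can be sketched as a small script. This is only an illustration for a made-up, headerless image format: BASE_ADDRESS, the "start" entry name, and the choice to return None for unknown stubs and unmapped addresses are assumptions, not documented fcd behavior.

```python
# Hypothetical parser script for a headerless image loaded at a fixed address.
# BASE_ADDRESS and the "start" name are illustrative assumptions.
BASE_ADDRESS = 0x00400000

# Arbitrary string identifying the executable type.
executableType = "Toy flat image"

# List of (virtualAddress, name) tuples, filled in by init().
entryPoints = []

_data = b""


def init(data):
    """Called first, with the executable's contents as a byte string."""
    global _data
    _data = data
    del entryPoints[:]
    # Assumption: execution starts at the first byte of the image.
    entryPoints.append((BASE_ADDRESS, "start"))


def getStubTarget(jumpTarget):
    """Would return a (library name, import name) tuple for an import stub.

    This toy format has no import table; returning None for "unknown"
    is an assumption about what fcd accepts.
    """
    return None


def mapAddress(virtualAddress):
    """Translate a virtual address to an offset into init's data."""
    offset = virtualAddress - BASE_ADDRESS
    if 0 <= offset < len(_data):
        return offset
    # Assumption: None signals an address outside the image.
    return None
```

A real parser would read the entry points, import stubs, and segment layout from the file's own headers instead of hard-coding them.
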

The following options control how the executable is loaded and analyzed:

  • --format/-f: specifies the executable format. Currently supported values are:
    • auto (default): picks ELF if file starts with ELF magic, flat binary otherwise;
    • elf: forces ELF, does its best when the ELF format isn’t respected;
    • flat: flat binary, does not attempt to parse executable at all;
    • path/to/ use a Python script to parse executable.
  • --flat-org: specifies the origin (virtual address) of the program when it is loaded as a flat binary. For instance, on Linux, this will often be 0x00400000.
  • --cc: specifies the default calling convention for functions. This is meant to form some kind of responder chain, eventually. Currently supported values are:
    • auto: autodetect. Asks each calling convention whether it recognizes the program and takes the first one that matches.
    • any/any: do best effort at figuring out parameters and return values using interprocedural analysis. This problem is fundamentally uncomputable, so results may vary.
    • any/interactive: ask for every function. Requires an underlying system calling convention.
    • x86_64/sysv: System V x86_64 calling convention, used on Linux and Mac OS X (for the x86_64 architecture). This is a so-called system calling convention (and the only one currently implemented).
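
As a rough sketch of how these options combine for a flat binary, the helper below assembles an fcd command line in Python. Only flags named in this document are used; the helper name, the file name program.bin, the example addresses, and the assumption that fcd accepts hexadecimal values on the command line are illustrative. Since neither the calling convention nor the entry points are guessed for flat binaries, both are passed explicitly (entry points through -e, described under entry points further down).

```python
def flat_binary_command(program, origin, entry_points,
                        calling_convention="x86_64/sysv"):
    """Build an argv list for decompiling `program` as a flat binary.

    Hypothetical helper: it only assembles arguments, it does not run fcd.
    """
    argv = [
        "fcd",
        "--format", "flat",
        "--flat-org", hex(origin),   # assumes hex literals are accepted
        "--cc", calling_convention,
    ]
    for address in entry_points:
        # One -e per function to decompile.
        argv += ["-e", hex(address)]
    argv.append(program)
    return argv


# Illustrative values only.
argv = flat_binary_command("program.bin", 0x00400000, [0x0040045e])
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run as-is, which avoids shell quoting problems with unusual file names.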

Loading header files

Header files give fcd hints about the parameters, return types and special attributes of functions. If a program uses external libraries and you have the headers for them, using those headers will systematically increase the quality of fcd’s output.

Headers are loaded using the Clang parser.

Headers can be specified with the following options:

  • -I: adds a directory to the header search path. Directories specified with the -I option are searched before system headers.
  • --header: specifies the name of a header to include. For instance, --header file.h is the same as #include "file.h". (As a reminder, quoted includes will search user paths first and system paths second.)

Under the Clang API, front-ends are responsible for setting up the include path. To provide “reasonable defaults”, fcd has a build script that extracts Clang’s header search path and bakes it into the executable. Therefore, if you don’t specify any -I parameter, fcd will still look for headers in your Clang installation’s default search path.

If you know about a function in the executable, you can include its signature in the header file as well. Fcd will assume that the function in the executable has the same signature as the one you use in the header file. The prototype must be followed with the FCD_ADDRESS(address) pseudo-attribute, where address is a numeric literal that represents the function’s virtual address in the executable. For instance, assuming a program where main lives at 0x040045e, you could pass a header that contains the following to fcd:

int main(int argc, const char** argv) FCD_ADDRESS(0x040045e);

Functions annotated with FCD_ADDRESS are assumed to be entry points and will be decompiled by fcd (unless you are running in partial mode).
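
If you already have a list of known function addresses (from another tool, for example), a header of this shape can be generated rather than written by hand. The sketch below is one possible way to do it; the helper name and the input dictionary are hypothetical, and only the FCD_ADDRESS syntax comes from this document.

```python
def fcd_header(prototypes):
    """Render prototypes as lines annotated with FCD_ADDRESS.

    `prototypes` maps a virtual address to a C prototype written
    without its trailing semicolon.
    """
    lines = []
    for address, prototype in sorted(prototypes.items()):
        lines.append("%s FCD_ADDRESS(0x%x);" % (prototype, address))
    return "\n".join(lines) + "\n"


# Illustrative input: the main function from the example above.
header = fcd_header({0x040045e: "int main(int argc, const char** argv)"})
print(header)
```

The generated text can be saved to a file and passed to fcd with --header like any other header.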

Entry points and level of decompilation

Since fcd is still somewhat slow, it might not always be worth decompiling the whole program you’re interested in. For this reason, it is possible to ask for partial (or exclusive) decompilation to limit how much work fcd tries to do. When doing so, it is necessary to specify the virtual addresses of the functions that need to be decompiled.

  • --other-entry/-e: specify the virtual address of a function to decompile. Can be used multiple times.
  • --partial/-p: partial decompilation. Produce output only for the functions specified by --other-entry values and their call graph. Use --partial twice to only decompile the functions specified by --other-entry and not their call graph.
  • --module-out/-n: stop after transforming the executable into an LLVM module, and dump that module to stdout. Mostly useful to experiment with passes when you don’t want to spend most of your time waiting on the translation process.
  • --module-in/-m: the <input> parameter is the path to an LLVM module previously saved with --module-out. Users of this option need to specify a calling convention, since it is normally guessed from the executable file.
  • --opt/-O: insert a specific optimization pass in the middle of the pass pipeline. The optimization pass must either be the name of a pass included in the linked LLVM installation or a path to a .py file implementing a pass.
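
The --module-out and --module-in options suggest a two-step workflow: translate the executable once, then iterate on passes against the saved module. The sketch below assembles both command lines. The helper names and file names are hypothetical, save_module assumes fcd is on your PATH, and only flags described in this document are used.

```python
import subprocess


def module_out_command(program):
    """Command that translates `program` and dumps the LLVM module to stdout."""
    return ["fcd", "--module-out", program]


def module_in_command(module_path, passes=(), calling_convention="x86_64/sysv"):
    """Command that resumes from a saved module.

    A calling convention is passed explicitly because it cannot be
    guessed from a module file.
    """
    argv = ["fcd", "--module-in", "--cc", calling_convention]
    for pass_name in passes:
        # Either the name of an LLVM pass or a path to a .py pass script.
        argv += ["--opt", pass_name]
    argv.append(module_path)
    return argv


def save_module(program, module_path):
    """Run the first step and capture the dumped module in a file."""
    with open(module_path, "wb") as out:
        subprocess.run(module_out_command(program), stdout=out, check=True)


# Illustrative file names; nothing is executed here.
first = module_out_command("program")
second = module_in_command("program.ll", passes=["my_pass.py"])
```

Only the second command needs to be repeated while a custom pass is being developed.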

Using custom passes

Fcd can load Python scripts as optimization passes for custom jobs. The script must supply either a runOnModule global function or a runOnFunction global function (but not both). It may also specify a passName global variable for debugging convenience.

The Python bindings that fcd uses are derived from the LLVM C API in a very mechanical way. These bindings are subject to change: firstly because LLVM’s API tends to change between releases and the plan is to stay up-to-date with stable LLVM releases, and secondly because no design work has gone into these bindings yet beyond the automatic translation of header files. The bindings merely attach methods to types based on the name of the function and the type of its first parameter: for instance, LLVMGetFirstBasicBlock(LLVMValueRef) is translated as a GetFirstBasicBlock method on the Value Python type. This isn’t so bad, but it can get a little confusing with the IsA* methods: LLVMIsAConstantExpr(LLVMValueRef) creates an IsAConstantExpr method on Value, which returns a handle to a constant expression if the Value object is a ConstantExpr. Cleaning this up is just one item on the long list of things to do for fcd in the future.
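
A minimal pass script might look like the sketch below, which counts the basic blocks of each function without modifying anything. Several details are assumptions rather than documented behavior: that runOnFunction receives the function as a Value, that LLVMGetNextBasicBlock(LLVMBasicBlockRef) is exposed as a GetNextBasicBlock method on a basic block type under the same naming rule, that a missing next block is reported as None, and that the return value tells fcd whether the pass changed the function.

```python
import sys

# Optional name, shown for debugging convenience.
passName = "count-basic-blocks"


def count_basic_blocks(function):
    """Walk the function's basic block list and return its length."""
    count = 0
    block = function.GetFirstBasicBlock()
    # Assumption: the bindings return None where the C API returns NULL.
    while block is not None:
        count += 1
        block = block.GetNextBasicBlock()
    return count


def runOnFunction(function):
    """Report the block count on stderr and leave the function untouched."""
    sys.stderr.write("%d basic block(s)\n" % count_basic_blocks(function))
    # Assumption: False means "nothing was modified".
    return False
```

Such a script would be loaded with --opt followed by the path to the .py file.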

To explore the API, you are encouraged to familiarize yourself with the LLVM C API. Another simple thing to do could be to drop into a Python REPL from runOnFunction to call dir and help on everything to see where the pieces fall.
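
One way to get such a REPL is Python's standard code module, as sketched below. This assumes that fcd's embedded interpreter has a usable standard input and output, and that runOnFunction returns whether the function was modified; public_names is a hypothetical convenience helper.

```python
import code


def public_names(obj):
    """List the attributes of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


def runOnFunction(function):
    """Open an interactive console with the current function in scope."""
    banner = "Exploring a function: try public_names(function) or help(function)"
    code.interact(
        banner=banner,
        local={"function": function, "public_names": public_names},
    )
    # Assumption: False means "nothing was modified".
    return False
```

Leaving the console with an end-of-file (Ctrl-D on most terminals) returns control to the pass. Combining this with --partial and a single --other-entry keeps the number of prompts manageable.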