Creating a Plugin for a Closed Source x86 Application
Introduction
I am not a big gamer, but I enjoy creating things and then playing with what I create, so I quickly got attached to Neverwinter Nights 1. This game lets us create persistent worlds, script them and then play on them with other real people. It was released in 2002 and had its last update in 2008. Its source code has since been locked up somewhere and it will probably stay untouched for the next 10 years: with all the parties involved in its development, it probably wouldn’t be profitable to make the source code public.
As a player and server host of this game, I was eager to receive new fixes and updates, but knew I'd be waiting in vain. It was at that moment that I started looking at how to do what I wanted, despite the fact that I had no source code or even technical knowledge of the engine internals.
For the past five years I have been extending Neverwinter Nights 1 with a plugin, and in this article I will demonstrate how I managed to create my first fixes and extra features for this game. As a running example, I will show how to create a message box that displays whatever is typed in the chat bar. Neverwinter Nights is designed for 32-bit x86 processors and is available on Windows and Linux. My code here targets this processor family and those two operating systems, but the concepts should be applicable to other operating systems and, at least, to x64 processors as well.
Creating the Plugin
The first step is to create a shared library (.dll on Windows or .so on Linux) that will get loaded by the application, in order to have access to the process memory. To be cross-platform, my library defines a global object and I use its constructor as the "init" function. It's there that I will start hooking:
struct MyPlugin
{
    MyPlugin()
    {
        //TODO: Hook functions
    }
};
MyPlugin my_plugin;
Injecting the Plugin in the Application
The next step is to get the application to load the plugin. On Linux, we just need to add it to the LD_PRELOAD variable before running the application:
LD_PRELOAD=myplugin.so:$LD_PRELOAD
On Windows, we need a dedicated launcher that starts the application programmatically and attaches the library to it. The attachment can be implemented by calling the CreateRemoteThread function with LoadLibraryA as the thread entry point:
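void LoadLibraryInProcessSync(HANDLE process, const char* library_path)
{
    //copy the library path into the target process, because LoadLibraryA will run there
    size_t library_path_size = strlen(library_path)+1;
    char* injected_library_path = (char*)VirtualAllocEx(process, NULL, library_path_size, MEM_COMMIT, PAGE_READWRITE);
    if (injected_library_path == NULL) return;
    WriteProcessMemory(process, injected_library_path, library_path, library_path_size, NULL);
    //kernel32.dll is already loaded in our own process, so GetModuleHandleA is enough
    LPTHREAD_START_ROUTINE load_library = (LPTHREAD_START_ROUTINE)GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
    HANDLE lib_thread = CreateRemoteThread(process, NULL, 0, load_library, injected_library_path, 0, NULL);
    if (lib_thread != NULL)
    {
        WaitForSingleObject(lib_thread, INFINITE); //wait until the library is loaded
        CloseHandle(lib_thread);
    }
    VirtualFreeEx(process, injected_library_path, 0, MEM_RELEASE);
}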
Finding the Functions to Hook
This is the fun part: analysing the application assembler code to find out how a specific feature could be implemented or what is causing a crash.
I started doing this by opening the application in OllyDbg. I scanned the assembler code, looked at the string references, started the debugger and inserted random breakpoints. When one of my breakpoints was hit, I closely examined the surrounding assembler code. After a fruitless day with this approach I gave up and started to think of a more productive tactic.
A few months later, I decided to take another run at this and this time things went better. My first tip is: don't rush into it! Try to collect all the information you can about the internals of the application using the following approaches:
- Try to find a version (e.g. a beta) of it that includes the debugging symbols. If you can find one, it is then relatively easy to match addresses by searching for the same assembly code.
- Similarly, try to find the source code of all the statically linked libraries used by the application, so that you can assign function names to addresses in your application. Those names are very useful for understanding what the callers are doing.
- Look at the imported symbols.
In the end, what you need is a function that gets called near where you need to hook.
Even if you can't find any symbols, there must be at least one native function that is used near where you need to hook. For example, if you know that the application needs to read a file, you can hook the fopen/CreateFileA function, and you can even hook the malloc/HeapAlloc and free/HeapFree functions to trace where a specific string is getting allocated and freed. In my case, Neverwinter Nights is a client/server application, so I know that it will send the chat message via a socket. I will therefore hook the sendto function.
With this hook, you can programmatically check whether the conditions are met (in my case, whether the packet sent contains "!msgbox", which is what I will type in the chat bar) and give yourself a chance to break at that moment, by using a MessageBox for example:
int __stdcall sendto_hook(SOCKET s, const char* buf, int len, int flags, const struct sockaddr* to, int tolen)
{
    for (int i=0; i<=len-7; i++)
    {
        if (memcmp(buf+i, "!msgbox", 7) == 0)
        {
            MessageBoxA(0, "sendto(!msgbox) found!", "attach the debugger", 0);
            break; //one message box per packet is enough
        }
    }
    return sendto(s, buf, len, flags, to, tolen);
}
//called from the plugin constructor:
hook_api(GetProcAddress(GetModuleHandleA("Ws2_32.dll"), "sendto"), sendto_hook);
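The scan inside sendto_hook can be pulled out into a small helper, which makes it easy to check on its own before it goes anywhere near a live socket. The following is only a sketch: contains_marker is a hypothetical name introduced here for illustration, it is not part of the game or of the demo application.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if the bytes of `marker` occur
// anywhere inside the first `len` bytes of `buf`. Packets are binary
// data without a terminating NUL, so strstr cannot be used here.
bool contains_marker(const char* buf, int len, const char* marker)
{
    assert(marker != nullptr);
    const int marker_len = static_cast<int>(std::strlen(marker));
    if (buf == nullptr || marker_len == 0 || len < marker_len)
        return false;
    for (int i = 0; i <= len - marker_len; i++)
    {
        if (std::memcmp(buf + i, marker, marker_len) == 0)
            return true; // stop at the first match
    }
    return false;
}
```

Inside the hook, the loop would then shrink to a single test such as contains_marker(buf, len, "!msgbox").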
The tools: IDA
Finally, it's time to open a disassembler and a debugger to start investigating what is around that message box. The best tool for this job is IDA Pro: it will help you understand the assembler code, for example by displaying the functions and their parameters. It has a lot of features, but the most important one is the database, because it lets you fill in the "blanks" in the assembler code, such as function names, parameter names and local variable names, and enter comments.
Keeping all your discoveries structured (as in IDA) is really the key to extending an application without having the source code. At the outset, it can be slow going because there's a lot of reverse engineering to do but as those blanks get filled, it gets much easier. After a while it almost becomes easier to implement a feature in this fashion than if you had access to the source code. I will explain why later.
Debugging
When you get the message box that you wanted, attach the debugger, break the execution and check the stack trace. Either your hook is too late, in which case you need to go up the call stack, or it is too early, in which case you need to follow the execution to see which functions will be executed next.
(I prefer Visual Studio when it comes to debugging, especially when my plugin gets big and I want to see my code at the same time.)
Getting a stack trace when the application is compiled with the "omit frame pointer" option
If your application is compiled with the "omit frame pointer" option (-fomit-frame-pointer with GCC, /Oy with Visual C++), like the Neverwinter Nights client, it's normal that you can't get a stack trace printed. Here is why: an application built for debugging uses the "ebp" register to store the position of the stack at the beginning of each function:
Dump of assembler code for function GetID__4CRes:
0x082b376c <+0>: push %ebp ;save the ebp register
0x082b376d <+1>: mov %esp,%ebp ;save esp (the stack pointer) in ebp
0x082b376f <+3>: mov 0x8(%ebp),%eax ;we can use ebp to access the function parameters
{ here the function could be initialising local variables or calling functions, so the esp register will change; if we didn't have ebp, the only way to know how to access the parameters would be to keep track of every change made to esp }
0x082b3772 <+6>: mov 0x4(%eax),%eax ;set the return value
0x082b3775 <+9>: pop %ebp ;restore the ebp register
0x082b3776 <+10>: ret
End of assembler dump.
So it is possible to get the stack trace by using a function like this one:
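void print_stack_trace()
{
    long* stack_frame;
    __asm{mov stack_frame, ebp}; //stack_frame = %ebp
    while (*stack_frame) //stop when the saved ebp is 0
    {
        printf("%lx\n", *(stack_frame+1)); //return address of the current frame
        stack_frame = (long*)*stack_frame; //move up to the caller's frame
    }
}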
But this is not possible when the frame pointers are omitted: ebp is then used like any other register, so nothing records the position of the stack at the moment each function was called. My favoured workaround (although it is a little more work) is to use the debugger to step out of the function and take note of where it ends up.
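To picture what print_stack_trace follows, the chain of saved ebp values can be modelled as a linked list. The sketch below is a portable simulation, not code that reads a real stack: the Frame struct and collect_return_addresses are names made up for this illustration, and the walk stops on a null link rather than on a zero saved ebp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified picture of an x86 stack frame as seen from ebp:
//   [ebp + 0] saved ebp of the caller (link to the previous frame)
//   [ebp + 4] return address into the caller
struct Frame
{
    const Frame* caller_frame;      // what "push %ebp" saved
    std::uintptr_t return_address;  // what the call instruction pushed
};

// Follows the links from the innermost frame outwards and collects
// the return addresses, innermost first.
std::vector<std::uintptr_t> collect_return_addresses(const Frame* frame)
{
    std::vector<std::uintptr_t> trace;
    while (frame != nullptr)
    {
        trace.push_back(frame->return_address);
        frame = frame->caller_frame;
    }
    return trace;
}
```

When the frame pointer is omitted, the caller_frame link simply does not exist on the stack, which is why the walk has nothing to follow.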
Hooking
Once you've found the function in the application that needs to be altered, you need to hook it. There are many libraries that provide functions to do that, but anti-virus programs tend not to like them, they make your plugin bigger and they are really not needed. The process is relatively straightforward and it is worth understanding. It is possible to hook anywhere in the assembly code, but the safest and cleanest way is to hook only functions.
Hooking a non-virtual function call
A non-virtual function call is any "call <func>" instruction in assembler. This instruction is 5 bytes long: the first byte is 0xE8 and the 4 following bytes are the address of the function to call, relative to the end of the call instruction. All we have to do is to change those 4 bytes:
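void hook_call(long src_addr, long to_addr)
{
    enable_write(src_addr); //the code section is normally read-only (same helper as in hook_function below)
    *(long*)(src_addr+1) = to_addr-(src_addr+5); //relative to the end of the 5-byte call
}
hook_call(0x00402445, (long)my_function_hook);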
In this example, 0x00402445 is the address of the call instruction:
.text:00402445 call sub_5FFCA0
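The arithmetic behind hook_call can be checked without touching a running process by applying it to a local copy of the instruction bytes. This is a sketch under that assumption: patch_call_rel32 and call_target are hypothetical helper names, and the addresses are the ones from the listing above. For the original call, the displacement works out to 0x005FFCA0 - (0x00402445 + 5) = 0x001FD856.

```cpp
#include <cassert>
#include <cstdint>

// Rewrites the 32-bit displacement of an x86 "call rel32" (opcode 0xE8).
// `insn` is a local copy of the 5 instruction bytes, `insn_addr` is the
// address those bytes have in the target process and `target` is the
// function the call should reach.
void patch_call_rel32(std::uint8_t* insn, std::uint32_t insn_addr, std::uint32_t target)
{
    assert(insn[0] == 0xE8); // refuse to patch anything but a direct call
    // The displacement is relative to the end of the instruction.
    const std::uint32_t rel = target - (insn_addr + 5);
    for (int i = 0; i < 4; i++)
        insn[1 + i] = static_cast<std::uint8_t>(rel >> (8 * i)); // little-endian
}

// Decodes the address that a "call rel32" located at `insn_addr` reaches.
std::uint32_t call_target(const std::uint8_t* insn, std::uint32_t insn_addr)
{
    std::uint32_t rel = 0;
    for (int i = 0; i < 4; i++)
        rel |= static_cast<std::uint32_t>(insn[1 + i]) << (8 * i);
    return insn_addr + 5 + rel;
}
```

Unsigned wrap-around takes care of backward calls: a target below the call site produces a large displacement that wraps to the right address when it is added back.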
my_function_hook is a function that uses the same calling convention as sub_5FFCA0. Most of the time, the calling convention will be __cdecl or __stdcall, which just needs to be specified in the function prototype:
void __stdcall my_function_hook();
//or
void __cdecl my_function_hook();
But if the compiler was Visual Studio and you want to hook a member function, then your function needs to use the __thiscall calling convention. The problem is that Visual Studio only lets you use this convention for member functions, so how can you hook a member function with a static function? My solution is to use the __fastcall convention: it is about the same, except that it also uses the edx register, for the second parameter. So what I do is simply add a second, dummy, parameter:
Gob* last_created_gob;
Gob* (__fastcall *gob_constructor_org)(Gob*, int, char*) = (Gob* (__fastcall *)(Gob*, int, char*))0x007A8E80;
Gob* __fastcall gob_constructor_hook(Gob* gob, int edx, char* name)
{
    last_created_gob = gob; //"gob" is the this pointer, passed in ecx
    return gob_constructor_org(gob, edx, name); //"edx" is the dummy parameter
}
This hook won't work for a virtual function call, because such calls don't use a relative address but a register, like this:
call dword ptr [edi+1Ch]
So they are a bit more complex to hook. I will keep those for another article.
Hooking a function
This hook is for intercepting all calls to a specific function. In this case, we need to change the beginning of the function so that it jumps to the new function, which then calls back into the original function. In other words, we need to change this:
func_to_hook <+0>: push %ebp
func_to_hook <+1>: mov %esp,%ebp
func_to_hook <+3>: mov 0x8(%ebp),%eax
func_to_hook <+6>: mov 0x40(%eax),%eax
func_to_hook <+9>: pop %ebp
func_to_hook <+10>: ret
...
new_function <+0>: push %ebp
new_function <+1>: mov %esp,%ebp
new_function <+3>: call func_to_hook
new_function <+8>: pop %ebp
new_function <+9>: ret
To this:
func_to_hook <+0>: jmp new_function
func_to_hook <+5>: nop ;the value of this byte doesn't matter because the processor will never reach it
func_to_hook <+6>: mov 0x40(%eax),%eax
func_to_hook <+9>: pop %ebp
func_to_hook <+10>: ret
...
new_function <+0>: push %ebp
new_function <+1>: mov %esp,%ebp
new_function <+3>: call allocated_mem ; call the original func_to_hook
new_function <+8>: pop %ebp
new_function <+9>: ret
...
allocated_mem <+0>: push %ebp
allocated_mem <+1>: mov %esp,%ebp
allocated_mem <+3>: mov 0x8(%ebp),%eax
allocated_mem <+6>: jmp func_to_hook+6 ;continue right after the instructions that were moved
It can be achieved with this function:
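void* hook_function(long from, long to, int len)
{
    //allocated space: the moved instructions followed by a jmp back to from+len
    char* ret_code = (char*)malloc(len+5);
    enable_write((long)ret_code);
    memcpy(ret_code, (char*)from, len);
    *(char*)(ret_code+len) = (char)0xE9; //jmp
    *(long*)(ret_code+len+1) = (from + len) - (long)(ret_code+len+1+sizeof(long));
    //overwrite the beginning of the original function with a jmp to the new function
    enable_write(from);
    *(char*)from = (char)0xE9; //jmp
    *(long*)(from+1) = to - (long)(from+1+sizeof(long));
    return ret_code; //call this address to run the original function
}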
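hook_function relies on an enable_write helper that is not listed in this article. As a rough idea of what such a helper has to do, here is a sketch built on VirtualProtect (Windows) and mprotect (Linux). It is an assumption, not the helper from the demo application: the name enable_write_sketch, the extra length parameter and the choice of read/write/execute permissions are all mine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

// Assumed behaviour: make the page(s) covering [addr, addr + len)
// readable, writable and executable. Returns true on success.
bool enable_write_sketch(std::uintptr_t addr, std::size_t len = 16)
{
    assert(len > 0);
#ifdef _WIN32
    DWORD old_protection;
    return VirtualProtect(reinterpret_cast<void*>(addr), len,
                          PAGE_EXECUTE_READWRITE, &old_protection) != 0;
#else
    // mprotect only accepts a page-aligned start address.
    const std::uintptr_t page_size = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    const std::uintptr_t start = addr & ~(page_size - 1);
    const std::size_t span = static_cast<std::size_t>(addr + len - start);
    return mprotect(reinterpret_cast<void*>(start), span,
                    PROT_READ | PROT_WRITE | PROT_EXEC) == 0;
#endif
}
```

Some hardened systems refuse pages that are writable and executable at the same time, so the return value is worth checking before patching anything.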
The "len" parameter is the number of bytes that we move from the original function to an allocated space, so that they can be replaced by a jmp to the hook function. It must be at least 5, because the jmp instruction is 5 bytes long (it works the same way as a call, except that the first byte is 0xE9), and it must end on an instruction boundary so that no instruction is cut in half (in the listing above it is 6). This function can be used to hook anywhere, as long as the instructions within the range of the len parameter (the instructions that will get moved) are not a call or a jump to a relative address, because if you move a "jump +5", it won't jump to the same address any more. Finally, if you do not hook at the beginning of a function, you must make sure that the registers don't get changed by your code.
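The layout of the allocated space can be reproduced on plain byte buffers, which is a convenient way to check the two displacement formulas before patching a live process. This is a simulation under assumed addresses: emit_jmp and build_trampoline are hypothetical names, and nothing here is executed as machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 5-byte "jmp rel32" (opcode 0xE9) that sits at address `at`
// and jumps to `target`.
void emit_jmp(std::vector<std::uint8_t>& out, std::uint32_t at, std::uint32_t target)
{
    const std::uint32_t rel = target - (at + 5); // relative to the end of the jmp
    out.push_back(0xE9);
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(rel >> shift)); // little-endian
}

// Builds the bytes of the allocated space: the first `len` bytes of the
// hooked function followed by a jump back to from + len.
std::vector<std::uint8_t> build_trampoline(const std::uint8_t* original, std::size_t len,
                                           std::uint32_t from, std::uint32_t trampoline_addr)
{
    assert(len >= 5); // the jmp written over the original needs 5 bytes
    std::vector<std::uint8_t> code(original, original + len);
    emit_jmp(code, trampoline_addr + static_cast<std::uint32_t>(len),
             from + static_cast<std::uint32_t>(len));
    return code;
}
```

With the 6-byte prologue of func_to_hook (push %ebp, mov %esp,%ebp, mov 0x8(%ebp),%eax), the result is 11 bytes long: the three moved instructions, then a jmp that lands on func_to_hook+6.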
Hooking an imported function
This hook has the same result as hooking a function, except that it is for a function imported from a shared library. Such functions don't have a fixed address: their addresses are resolved at run time. This is what I used in the example above: I hooked sendto, which is a function imported from Ws2_32.dll. You could get the address dynamically and use the hook above, but you don't need to bother, because you can simply modify the import table to make an imported function point at the address of your own function.
Because the hook_api function is a bit more complex, I will only show how to call it:
hook_api(GetProcAddress(GetModuleHandleA("KERNEL32.dll"), "HeapFree"), my_heap_free);
You can find the source code of the hook_api function in the demo application.
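The idea behind an import table hook can be shown on a toy table of function pointers. This is only a simulation of the principle, under the assumption that an import table behaves like an array of addresses the application calls through; it is not the hook_api implementation from the demo application, it does not parse a real executable, and every name in it is made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for one slot of an import table: the address the
// application calls through when it uses an imported function.
using ImportSlot = int (*)(int);

// Replaces every slot that currently points at `original` with
// `replacement` and returns how many slots were patched.
std::size_t patch_import_slots(ImportSlot* table, std::size_t count,
                               ImportSlot original, ImportSlot replacement)
{
    assert(table != nullptr);
    std::size_t patched = 0;
    for (std::size_t i = 0; i < count; i++)
    {
        if (table[i] == original)
        {
            table[i] = replacement;
            patched++;
        }
    }
    return patched;
}

// Toy "imported" functions and a hook, only here to exercise the sketch.
int real_send(int len) { return len; }
int other_import(int x) { return -x; }
int send_hook(int len) { return real_send(len) + 1000; } // still calls the original
```

Note that the hook keeps calling the real function directly, in the same way that sendto_hook calls sendto: only the application's table is redirected.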
Implementing the features or fixes
In my case I just wanted to display a message box, so:
void (__cdecl *handle_chat_prompt_org)(void*, int);
void __cdecl handle_chat_prompt_hook(char** message, int p2)
{
    MessageBoxA(0, (std::string("You said: ") + *message).c_str(), "", 0);
    //I could prevent the chat message from being displayed in the chat bar by simply removing this line
    return handle_chat_prompt_org(message, p2);
}
But seriously, I will keep this part for another article, because there are too many other approaches I'd like to present. There's almost no limit as to what it is possible to accomplish by hooking into an application to extend it. The one kind of limit I ran into is illustrated by the following example, in which the application uses an object like this one:
class Players
{
    int last_player_id;
    int players_id[0x60];
    int player_count;
};
Because the members of the Players class (last_player_id, players_id and player_count) are accessed all over the application, it is practically impossible to increase the maximum player count (the size of players_id) without rebuilding the application completely, so unfortunately a Neverwinter Nights server instance is stuck at 96 players. But otherwise, seriously, I believe that there's no limit to what you can do, perhaps even more than what is possible with the source code. With the source code you can't access private class members (at least not by design), but by hooking you can access anything, anywhere (yes, yes, I know it's bad practice... but if I can revive an old game, why not?). For example, I was able to share information between two completely different game objects, a visual effect and a creature, in order to enable PLT textures on visual effects.
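Coming back to the Players example, the reason the array cannot simply be enlarged shows up as soon as the member offsets are written down. The sketch below mirrors the class with structs so that offsetof can be used from outside. The struct names and the 0x80 size are made up for the illustration, and it assumes the 4-byte int of the x86 build.

```cpp
#include <cassert>
#include <cstddef>

// Same layout as the Players class, as a struct so that the members
// are reachable for the offset checks.
struct PlayersLayout
{
    int last_player_id;
    int players_id[0x60];
    int player_count;
};

// A hypothetical version with room for more players: every member
// located after the array ends up at a different offset.
struct BiggerPlayersLayout
{
    int last_player_id;
    int players_id[0x80];
    int player_count;
};

static_assert(sizeof(int) == 4, "sketch assumes the 4-byte int of x86");
static_assert(offsetof(PlayersLayout, player_count) == 0x184, "4 + 0x60 * 4");
static_assert(offsetof(BiggerPlayersLayout, player_count) == 0x204, "4 + 0x80 * 4");
```

The compiled application reads player_count at offset 0x184 in every place that touches it, and the object size is baked into every allocation, so each of those places would have to be found and patched.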