Malware Engineering Part 0x3 — Crafting a peaceful parasite

Damn daemons sent to sleep yet the darkness danced in freedom ×_×

While you were reading this line, the actual virus body was chilling out in some corner of the host binary in the form of a parasite, waiting to take over the code execution of the innocent program. So far we’ve implemented the algorithm that finds a home for the virus body inside an innocent host binary. What evil (or good) happens once we hijack the code execution of the host process is decided by the parasite code, i.e. the injected code that makes itself at home inside the host binary. This paper focuses on designing and crafting a parasite body for ET_EXEC (executable) type binaries and shows why the traditional parasite design is not suitable for disk-based binary infection on modern Linux systems, which consist mainly of PIEs (Position Independent Executables). It then overcomes the problem by introducing Doctor Proc’s algorithm, which allows the parasite to determine the OEP (Original Entry Point) address at runtime and transfer control back to the host program successfully.

NOTE : This article is the continuation of Part 0x2 — Finding shelter for parasite of the malware engineering series where we implemented the first infection algorithm without considering the parasite development. Also, this article doesn’t intend to teach the art of writing shellcode (please refer to other sources for it).

Prerequisites

  • Familiarity with Intel x86-64 assembly language will be of great help in understanding the article to the fullest.

I already know C, am I done with the homework?

That’s quite impressive but, dismally, we can’t inject C language source code into memory. The source code kept on disk is equivalent to any other text file (containing ASCII encoded characters) and makes no sense to the CPU until it is translated (by some compiler or interpreter) into instructions the CPU understands. These CPU instructions, produced by, let’s say, a compiler toolchain, are present in the .text section of an ELF binary (on disk) and are mapped into the TEXT segment of the process address space when the binary is loaded into memory. Let’s see how the compiler translated our sample source code:

Still hate him for his nose

Above is a sample program compiled by GNU GCC. Here, we disassemble (i.e. perform the inverse operation of the assembler on) the main() function using the objdump utility, which presents us with the assembly instructions (highlighted in blue) corresponding to the raw opcode bytes/binary encoding (highlighted in green). Neither the source code nor the assembly instructions make any sense to the processor, but the raw bytes can be interpreted by the processor directly once the host binary is loaded into memory for execution.

In order to craft injectable code, knowledge of the processor’s ISA (Instruction Set Architecture) is of great help, as it provides a great deal of flexibility and room for optimization while writing injectable code. Alternatively, raw bytes can also be extracted from a compiled binary, but they may need further modification before we can achieve our goals. Our parasite body will consist of the binary encoding (these raw bytes) of assembly instructions and data (nothing more).

Constraints before we cast spell

Our parasite should have the following characteristics -

  • Should be self-contained, i.e. it should not be dependent on anything other than its own body (should not require any linking prior to execution).
  • Should be Position Independent Code (PIC), i.e. it should execute correctly regardless of where the parasite is injected. For this, relative addressing (relative offsets) should be used rather than absolute addressing (fixed addresses).
  • Should be able to restore the processor back to its original state (i.e. the state just before hijacking the code flow of host process) after it has finished its execution.

Poison for executable binary (ET_EXEC)

Enough of the theory, let’s dive right into the implementation. Previously, we gave loadparasite() (defined in evil_elf.c) the path to the parasite on disk; it loads the parasite into heap memory, from where it is later injected into the host binary. Let’s start with a simple parasite for ET_EXEC type binaries which displays the message
“-x-x-x-x- COMPILEPEACE : Cute little virus ^_^ -x-x-x-x-”

to STDOUT (standard output) and safely transfers control to a specified address (a placeholder that the infection algorithm written in the previous article patches with the original entry point address of the host executable).

exec_parasite.asm

I’ll be writing the parasite code in Intel syntax for greater legibility. BITS 64 tells nasm that the code is intended to be assembled in 64-bit mode.

  • (Line 1–6) We start by defining the _start symbol, which marks the entry point of the parasite (i.e. code execution begins from the first instruction at this symbol), after which we define the .text section.
  • (Line 8–13) Before starting with the parasite’s logic, it is better to save some register values on the stack, as they are going to get clobbered by the parasite’s own instructions and by the system call. This lets us restore the processor state after the parasite code has executed.
  • (Line 15–16) There won’t be any data section for the string (message label) to be printed. The message string is present as a sequence of raw bytes in the .text section itself. It is our responsibility to jump over these data bytes and continue code execution (that’s what the jmp parasite instruction does).
  • (Line 21–28) This is the actual logic of the parasite code, which performs a write() syscall (its prototype is given at line #21). A system call acts as an interface between the process (running in user space) and the operating system kernel. It is used to request a service from the operating system and is performed by executing the syscall instruction, which switches the processor from user mode to kernel mode. Here we request the kernel to print the characters of the message string to the standard output (STDOUT) console. The parameters to the write() syscall are passed in registers (RDI, RSI and RDX) as per the system call convention (System V ABI) and the syscall number is passed in the RAX register. The rest is explained by the comments (the red text following a ;).
  • (Line 31–36) The register values saved earlier are popped back into their corresponding registers to restore the processor to its original state (i.e. the state just before the parasite code’s execution).
  • (Line 39) It moves the value 0xAAAAAAAAAAAAAAAA into RBX. This is a placeholder that the infection algorithm patches once the parasite code has been loaded into the heap of the infecting process.
  • (Line 40) Jumps to the patched address (which is the original entry point of the host binary being infected) stored in RBX.
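
For readers more at home in C, the write() request described in the list above can be mirrored with the syscall(2) wrapper from glibc. This is only a rough sketch to show which value ends up in which register; the helper name raw_write is made up for illustration and is not part of the parasite or of evil_elf.c.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not from the article's code): issue
 * write(fd, msg, len) through the raw system call interface.
 * On x86-64 Linux, syscall(2) places the syscall number in RAX and
 * the three arguments in RDI, RSI and RDX, then executes `syscall`.
 * Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const char *msg)
{
    return syscall(SYS_write, fd, msg, strlen(msg));
}
```

Calling raw_write(STDOUT_FILENO, "hello\n") from a normal program prints the string in the same way the parasite prints its signature, only without the hand-written register setup.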

Now, making this code injectable is just a matter of assembling it into a file that contains the binary encoding of the assembly instructions. Below is the command to get the binary encoding of exec_parasite.asm

$ nasm -f bin exec_parasite.asm -o exec_parasite

Placid parasite is nervous

Till now, our strategy was to hand over code flow to the parasite, which greets us with a message (confirming its presence) and transfers code flow back to the original entry point address of the host binary. The original entry point (in ET_EXEC) was extracted from the host binary in the form of an absolute address and patched into the parasite code (i.e. at the placeholder 0xAAAAAAAAAAAAAAAA) while the parasite sat in the heap of the infecting process. Finally, the parasite jumps to the patched absolute address (the original entry point) to resume the intended code execution of the original host binary.
Let’s try using the same parasite to infect shared object binaries.

We created a copy of the /bin/ls program (used to list the contents of a directory) and tried infecting it with the parasite for ET_EXEC. Running the shared object binary after infection results in proper execution of the parasite code (confirmed by the greeting message it prints) followed by an unfortunate crash (segmentation fault).

Doctor, my beloved binary is just not feeling good !

The crash happened because we tried to jump to an invalid address (i.e. an offset which doesn’t lie inside the address space of the host process executing our parasite). Shared objects (including the host binary) have their entry point specified as an offset rather than an absolute address. This makes sense, as shared object binaries are compiled as PIC (position independent code), i.e. they can be loaded at any base address (linked dynamically into the process address space). Therefore, the parasite for ET_EXEC (executable type binaries) mentioned above creates noise in the case of ET_DYN (shared object binaries). Now the question arises:

  • How to calculate the address that the parasite should jump to, after it finishes executing its own body ?

All we have right now is an entry point offset. We can get the original entry point address from offset by adding the entry point offset to the base address of the host binary process.

Entry Point Address = base address(of host process) + entry point offset;
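
To make the ET_EXEC versus ET_DYN distinction concrete, here is a small C sketch that reads an ELF64 header and applies the formula above only when the entry point is an offset. The helper names read_ehdr and entry_address are hypothetical (they are not functions from evil_elf.c), and the sketch assumes a 64-bit ELF file on a Linux system that ships <elf.h>.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read the ELF64 header of `path` into *hdr.
 * Returns 0 on success, -1 on error or if the file is not ELF64. */
int read_ehdr(const char *path, Elf64_Ehdr *hdr)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, hdr, sizeof *hdr);
    close(fd);
    if (n != (ssize_t)sizeof *hdr)
        return -1;

    /* Check the magic bytes and the 64-bit class marker. */
    if (memcmp(hdr->e_ident, ELFMAG, SELFMAG) != 0 ||
        hdr->e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    return 0;
}

/* Hypothetical helper: for ET_DYN the e_entry field is an offset, so
 * Entry Point Address = base address + entry point offset.
 * For ET_EXEC the e_entry field is already an absolute address. */
Elf64_Addr entry_address(const Elf64_Ehdr *hdr, Elf64_Addr base)
{
    if (hdr->e_type == ET_DYN)
        return base + hdr->e_entry;
    return hdr->e_entry;
}
```

The base argument is exactly the value we do not have yet at infection time, which is the problem the following paragraphs deal with.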

  • So what’s the issue then, can’t we now get the base address (i.e. the address at which a host binary is loaded at runtime) and enjoy another day on earth?

The issue is that we cannot know the base address beforehand, since the binary is loaded at a different base address on every run (read about ASLR to know more). The base address is decided by the kernel during process initialization (when the host binary is loaded into memory), so as determined attackers we clearly have to implement the calculation of the OEP address in the parasite code itself.

/proc filesystem to the rescue

Fortunately, there exists an interface to the kernel’s data structures that we can leverage to accomplish our goal. The /proc/[pid] directory contains information about each running process, and the file /proc/[pid]/maps lists the memory mapped regions of the process along with their access permissions. If a process wants to read its own memory mappings, it can read the file /proc/self/maps. Let’s see the memory mappings of the cat command:

Here, we can see that the base address of the cat process is given by the first 12 characters (5638ecd28000) of this file. If we extract the leading characters (until the character ‘-’ is encountered), convert them into a number and add the original entry point (OEP) offset of the host binary to it, our parasite can happily jump to the resulting address and resume the clean, intended behavior of the host binary.
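
Before doing this in assembly, the idea can be tried out from an ordinary C program. The sketch below uses stdio instead of raw syscalls, and the helper name first_mapping_base is invented for illustration. It relies on the same assumption as the text above, namely that the first line of the maps file belongs to the host binary itself, which is the usual layout for a PIE but is not something the kernel promises.

```c
#include <stdio.h>

/* Hypothetical helper: return the start address of the first mapping
 * listed in `maps_path` (e.g. "/proc/self/maps"), or 0 on failure.
 * Each line of the file begins with "start-end", both in hexadecimal. */
unsigned long first_mapping_base(const char *maps_path)
{
    unsigned long base = 0;
    FILE *fp = fopen(maps_path, "r");
    if (fp == NULL)
        return 0;

    /* %lx consumes the hexadecimal digits up to the '-' delimiter. */
    if (fscanf(fp, "%lx-", &base) != 1)
        base = 0;

    fclose(fp);
    return base;
}
```

Adding the OEP offset of the host binary to the returned value gives the address the parasite has to jump to.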

Doctor Proc’s Love

so_parasite.asm

The parasite body up to line #36 is the same as exec_parasite.asm (explained above), except that lines 6–8 define some macros for the sake of readability and ease of modification. Also, line 23 defines the symbol filepath, which stores the data bytes “/proc/self/maps” (i.e. the file to read the base address from).

  • (Line 41–46) This sets up the registers and performs the open() syscall. After the syscall returns, RAX (the accumulator register) holds the file descriptor of the opened /proc/self/maps file.
  • (Line 56–59) Any value XOR’d with itself produces 0 (i.e. 1^1=0 and 0^0=0); this is used to zero out temporary registers.
  • (Line 60) Saves the file descriptor to the lower 8 bits of the RDI register.
  • (Line 61–63) It allocates space on the stack of the host process by subtracting ALLOC_SPACE bytes from the stack pointer register (RSP) and assigns that stack location to the RSI register; this is where the base address characters read from /proc/self/maps will be stored. RSI holds the address of buf in terms of the function prototype described on line #54.
  • (Line 64–66) DX (the lower 16 bits of the RDX register) stores 1 (i.e. the number of bytes to be read by the read() syscall).
  • (Line 73–79) It performs the read() syscall; here 1 character is read from the file and stored at the location pointed to by RSP (the stack pointer). Then we compare the byte read with 0x2d (the ‘-’ character). This is the delimiter character (i.e. we should stop reading as soon as it is encountered). Line #79 stores the byte read into R8.
  • (Line 82–83) Since the address is written in hexadecimal, we can encounter both digits and letters. Therefore, the code flow jumps to the corresponding label (alphabet_found or digit_found) depending on the character read. If the character read is less than or equal to 0x39 (character ‘9’), it is a digit and the code flow jumps to the label digit_found; otherwise a letter has been encountered.
  • (Line 85–94) The characters read from the file are ASCII encoded, whereas we need their corresponding numeric values to form a valid address. The letters ‘a’ to ‘f’ are encoded as 0x61 to 0x66, whereas the digit characters ‘0’ to ‘9’ are encoded as 0x30 to 0x39. If the character read is a letter, we subtract 0x57 from it (i.e. ‘a’(0x61) - 0x57 = 0xa and ‘b’(0x62) - 0x57 = 0xb), which gives the hexadecimal value of the character we just read. Similarly, we subtract 0x30 from digit characters (e.g. ‘3’(0x33) - 0x30 = 0x3).
  • (Line 96–98) Each character encountered corresponds to a 4-bit value (a nibble) of the base address. For example, 0x5638ecd28000 is 6 bytes wide and is written with 12 characters. Here we use the RBX register (8 bytes wide) to store the base address being calculated. The instruction at line #97 shifts the value in RBX (initially 0) left by 4 places (i.e. every bit in RBX is shifted left by 4 places, leaving the lower 4 bits of RBX zero). Line #98 performs a bitwise OR between RBX (storing the base address being calculated) and R8 (storing the value of the character read), storing the result in RBX.
  • (Line 100–103) Then we increment the stack pointer (RSP) by one byte and load the address held in RSP into RSI. Line #103 loops back to line #73 to perform another read() syscall and read the next character. This loop continues until a 0x2d (‘-’) is encountered.
  • (Line 106–112) By the time we reach the label ‘done’, we have the extracted base address in register RBX. I’ve placed comments explaining these instructions individually.
  • (Line 119–131) At last we restore the processor state and jump to the calculated jmp-on-exit address stored in R8.
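
The read loop walked through above can also be written out in C, which may help when checking the arithmetic. This is a sketch of the same idea and not a translation of so_parasite.asm; the function names hex_nibble and base_from_fd are made up, and only lowercase hex letters are handled, since that is what /proc/self/maps contains.

```c
#include <unistd.h>

/* Hypothetical helper: convert one ASCII hex character to its value.
 * '0'..'9' are 0x30..0x39, so subtract 0x30.
 * 'a'..'f' are 0x61..0x66, so subtract 0x57. */
unsigned long hex_nibble(unsigned char c)
{
    if (c <= 0x39)
        return (unsigned long)(c - 0x30);
    return (unsigned long)(c - 0x57);
}

/* Hypothetical helper: read one byte at a time from `fd` until the
 * '-' delimiter (0x2d) or end of input, building the address by
 * shifting the accumulator left by one nibble and OR-ing in the
 * value of the new character. */
unsigned long base_from_fd(int fd)
{
    unsigned long base = 0;
    unsigned char c;

    while (read(fd, &c, 1) == 1 && c != 0x2d)
        base = (base << 4) | hex_nibble(c);

    return base;
}
```

Feeding it a descriptor opened on /proc/self/maps yields the start address of the first mapping, the same value the parasite keeps in RBX when it reaches the done label.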

Smuggling that peaceful parasite

Below is a demonstration of running the infection algorithm on a shared object binary such that so_parasite.bin is injected, resulting in a peaceful signature display upon execution.

EPILOGUE

The technique described in this article series is an entry point modification technique (based on Silvio Cesare’s segment padding infection). The parasite prints its signature to STDOUT, which can be used as an Indicator Of Compromise (IOC) for detection, while creating a disinfector can be as simple as patching the entry point back to the offset embedded inside the raw opcode bytes of the parasite body. I haven’t created a disinfector since the virus body itself is harmless. There are other interesting techniques: reverse text infection, data segment infection, PT_NOTE infection, GOT poisoning and .init_array/.fini_array infection, to name a few. Finally, memory infections are far stealthier than disk-based infections and have a fascinating world of their own (one should definitely check them out).

DISCLAIMER: This paper presents a parasite template for modern Linux systems running on the x86-64 Intel architecture, which can very much be used as a weapon of destruction with enough modifications. I would like to clearly state that the content published here is intended only for educational and research purposes. Therefore, I take no responsibility for anyone attracting hell by carrying malicious intentions. Happy hacking ×_×

Cheers,
Abhinav Thakur
(a.k.a compilepeace)

Github : https://github.com/compilepeace

software security researcher