Malware Engineering Part 0x3 — Crafting a peaceful parasite

Jan 8, 2020

Damn daemon sent to sleep yet the darkness danced in freedom ×_×

While you read this line, the actual virus body is chilling out in some corner of the host binary as a parasite, waiting to take over the execution of a benign program. Previously, we saw an implementation of Silvio’s infection algorithm that finds a home for the virus body inside an innocent host binary. However, what evil (or good) gets done once we hijack the execution flow of the host process is decided by the parasite code, i.e. the injected code that makes itself at home inside the victim binary.

This paper shows why the traditional parasite design (used until now) for infecting system programs is no longer suitable for disk-based binary infection on modern Linux systems, where ASLR-compatible PIE (Position Independent Executable) software is everywhere. It then presents a technique to overcome this limitation by discussing the design and implementation of a Linux parasite capable of infecting modern executables (PIE ELFs), built around an algorithm that lets the parasite determine the host OEP (Original Entry Point) address at runtime and resume the intended execution flow of the host software.

Even though the article series is written in chronological order, feel free to skip to whatever interests you most.

NOTE: This article is the continuation of Part 0x2: Finding shelter for parasite, which covered Silvio’s infection algorithm without taking parasite development into consideration. Also, this paper does not cover shellcode development, which deals with code injection at runtime and is usually done in a more restricted environment than parasite development.

Prerequisites

Some context on what we are up to is required, although I’ll try my best to cover it on the fly as much as possible.
Familiarity with the Intel x86-64 ISA, i.e. being able to read x86-64 assembly language, will be of great help in understanding the article fully.

I already know C, am I done with the homework?

That’s quite impressive but, unfortunately, we can’t inject C source code into memory. Source code kept on disk is like any other text file (containing ASCII-encoded characters): it makes no sense to the CPU until it is translated (by a compiler or an interpreter) into instructions the CPU understands. The CPU instructions produced by, let’s say, a compiler toolchain live in the .text section of an ELF binary on disk and are mapped into the TEXT segment of the process address space when the binary is loaded into memory. Let’s see how the compiler translates our sample source code:

Above is a sample program compiled with GNU GCC. Here, we disassemble (i.e. perform the inverse operation of the assembler) the main() function using the objdump utility, which presents the assembly instructions (highlighted in blue) corresponding to the raw opcode bytes, i.e. the binary encoding (highlighted in green). The source code and the assembly mnemonics (highlighted in blue) make no sense to the processor, but the raw bytes can be interpreted directly by the processor once the host binary is loaded into memory for execution.

To craft injectable code, knowledge of the processor’s ISA (Instruction Set Architecture) helps a lot, as it gives a great deal of flexibility and room for optimization while writing injectable code. Alternatively, raw bytes can be extracted from a compiled binary, but they may need further modification before they serve our goals. Our parasite body will consist of the binary encoding (these raw bytes) of assembly instructions and data, nothing more.
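The “raw bytes” are exactly the hex pairs objdump prints in its opcode column. The helper below is a minimal sketch (format_opcodes is a hypothetical name, not part of the article’s code) that renders a byte buffer the same way. For instance, 0f 05 is the x86-64 encoding of the syscall instruction and c3 is ret.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

/*
 * Formats `len` raw bytes as lowercase, space-separated hex pairs,
 * like objdump's opcode column (e.g. "0f 05 c3").
 * `out` must hold at least 3 * len bytes (and at least 1 when len == 0):
 * two hex digits plus one separator per byte, with the last separator
 * slot holding the terminating NUL.
 */
void format_opcodes(const unsigned char *bytes, size_t len, char *out)
{
    out[0] = '\0';                          /* empty string when len == 0 */
    for (size_t i = 0; i < len; i++) {
        /* writes "xx" plus a NUL, which we then overwrite below */
        snprintf(out + 3 * i, 3, "%02x", bytes[i]);
        out[3 * i + 2] = (i + 1 < len) ? ' ' : '\0';
    }
}
```

Calling format_opcodes on the bytes {0x0f, 0x05, 0xc3} leaves the string "0f 05 c3" in the output buffer, which is how objdump would display a syscall followed by a ret.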

Constraints before we cast the spell

Our parasite should have the following characteristics:

Should be self-contained, i.e. it should not depend on anything other than its own body (it should not require any linking prior to execution).
Should be position independent (PIC), i.e. it should execute successfully regardless of where it is injected. For this, relative addressing (relative offsets) should be used rather than absolute addressing (fixed addresses).
Should restore the context, i.e. bring the processor back to its original state (the state just before the code flow of the host process was hijacked) once it has finished executing.

Poison for executable binary (ET_EXEC)

Enough theory, let’s dive right into the implementation. Previously, we passed loadparasite() (defined in evil_elf.c) the path to the parasite on disk; it loads the parasite into heap memory, from where it later gets injected into the host binary. Let’s start with a simple parasite for ET_EXEC type binaries which displays the message
“-x-x-x-x- COMPILEPEACE : Cute little virus ^_^ -x-x-x-x-”

to STDOUT (the standard output console) and then safely transfers control to a specified address, which the infection algorithm written in the previous article patches with the OEP (original entry point) address of the host executable binary.

I’ll write the parasite code in Intel syntax for better legibility. The BITS 64 directive tells NASM that the code is intended to be assembled in 64-bit mode.

(Line 1–6) We start by defining the _start symbol, which marks the entry point of the parasite (i.e. code execution begins at the first instruction at this symbol), after which we define the .text section.
(Line 8–13) Before starting on the parasite’s logic, it is better to save some register values on the stack, since they are going to be clobbered: we overwrite RAX, RDI, RSI and RDX while setting up the syscall, and the syscall instruction itself overwrites RCX and R11. Saving them lets us restore the processor state after the parasite code has executed.
(Line 15–16) There is no data section for the string (the message label) to be printed; the message string sits as a sequence of raw bytes in the .text section itself. It is our responsibility to jump over these data bytes and continue code execution, which is what the jmp parasite instruction does.
(Line 21–28) This is the actual logic of the parasite: it performs a write() syscall, whose prototype is given at line #21. A system call acts as an interface between a process running in user space and the operating system kernel; it is how a process requests a service from the kernel. On x86-64 Linux it is issued with the syscall instruction, which switches the CPU from user mode to kernel mode (older 32-bit code used the int 0x80 software interrupt instead). Here we ask the kernel to print the characters of the message string to the standard output (STDOUT) console. As per the x86-64 Linux system call convention (documented in the System V ABI), the arguments to write() are passed in RDI, RSI and RDX, and the syscall number (1 for write) is passed in RAX. The rest is explained by the comments (the red text following a ;).
(Line 31–36) The register values saved earlier are popped back into their registers to restore the processor to its original state (i.e. the state just before the parasite code executed).
(Line 39) Moves the value 0xAAAAAAAAAAAAAAAA into RBX. This is a placeholder that the infection algorithm patches as soon as the parasite code is loaded into the heap of the infecting process.
(Line 40) Jumps to the patched address (the original entry point of the host binary being infected) stored in RBX.
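The register convention from lines 21–28 can be reproduced from C with GCC/Clang inline assembly. This is a minimal sketch, assuming an x86-64 Linux target; raw_write is a hypothetical helper name. It issues the same syscall instruction as the parasite and declares RCX and R11 as clobbered, which is exactly why the parasite saves registers first.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>

/*
 * Issues write(2) directly with the x86-64 `syscall` instruction, using
 * the same register setup as the parasite (assumes x86-64 Linux):
 *   RAX = 1 (write), RDI = fd, RSI = buf, RDX = count.
 * The kernel returns the result in RAX (negative errno on failure) and
 * overwrites RCX and R11, so both are listed as clobbers.
 */
static long raw_write(int fd, const void *buf, size_t count)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(count)
                      : "rcx", "r11", "memory");
    return ret;
}
```

raw_write(1, msg, len) prints msg much like write(1, msg, len) does, except that errors come back as a negative errno value (for example -9 for EBADF) instead of -1 with errno set, because no libc wrapper sits in between. The parasite sees the same raw return value in RAX.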

Making this code injectable is now just a matter of assembling it into a flat file containing only the binary encoding of the assembly instructions (nasm’s bin output format emits no ELF headers). Below is the command to get the binary encoding of exec_parasite.asm:

$ nasm -f bin exec_parasite.asm -o exec_parasite
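Since -f bin emits nothing but the encoded instructions and data, the resulting file can be read into memory as-is. Below is a minimal sketch of such a loader, assuming it plays the role the article gives loadparasite(); load_raw_file is a hypothetical name, not the actual function from evil_elf.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/*
 * Reads a whole file (e.g. the flat binary produced by `nasm -f bin`)
 * into a heap buffer. Returns NULL on failure; on success stores the
 * size in *len, and the caller must free() the buffer.
 */
unsigned char *load_raw_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    unsigned char *buf = NULL;
    long size;
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) <= 0 ||
        fseek(f, 0, SEEK_SET) != 0)
        goto out;

    buf = malloc((size_t)size);
    if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);                      /* short read: discard the buffer */
        buf = NULL;
    }
    if (buf)
        *len = (size_t)size;
out:
    fclose(f);
    return buf;
}
```

For example, load_raw_file("exec_parasite", &n) would return a heap copy of the assembled parasite, whose first byte is the first instruction at _start.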

Placid parasite is nervous

Until now, our strategy was to hand code flow over to a parasite that greets us with a message (confirming its presence) and then transfers code flow back to the original entry point of the host binary. The original entry point of the ET_EXEC binary was extracted from the host in the form of an absolute address and patched into the parasite code (at the placeholder 0xAAAAAAAAAAAAAAAA) while the parasite sat in the heap of the infecting process. Finally, the parasite jumps to that patched absolute address to resume the intended execution of the original host binary.
Let’s try using the same parasite to infect shared object (ET_DYN) binaries, which is the ELF type modern PIE executables carry.
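Patching the placeholder is a plain byte search and replace on the loaded parasite. With NASM, mov rbx, 0xAAAAAAAAAAAAAAAA assembles to 48 bb followed by eight aa bytes, and jmp rbx to ff e3. The sketch below (patch_placeholder is a hypothetical helper, not the article’s code) overwrites the first occurrence of the marker with the desired address.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <assert.h>

/*
 * Replaces the first occurrence of the 8-byte placeholder
 * 0xAAAAAAAAAAAAAAAA inside `code` with `addr`.
 * memcpy copies `addr` in host byte order, which on x86-64 is
 * little-endian, the layout the imm64 operand of `mov rbx, imm64` uses.
 * Returns 0 on success, -1 if the placeholder was not found.
 */
int patch_placeholder(unsigned char *code, size_t len, uint64_t addr)
{
    const uint64_t marker = 0xAAAAAAAAAAAAAAAAULL;  /* all bytes 0xaa */

    for (size_t i = 0; i + sizeof marker <= len; i++) {
        if (memcmp(code + i, &marker, sizeof marker) == 0) {
            memcpy(code + i, &addr, sizeof addr);
            return 0;
        }
    }
    return -1;
}
```

Applied to the bytes of mov rbx, 0xAAAAAAAAAAAAAAAA; jmp rbx with an address of 0x401040 (an illustrative value), the immediate becomes 40 10 40 00 00 00 00 00 while the trailing ff e3 stays untouched.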

We created a copy of the /bin/ls program (used to list the contents of a directory) and tried infecting it with the ET_EXEC parasite. Running the binary after infection results in proper execution of the parasite code (confirmed by the greeting message it prints), followed by an unfortunate crash (segmentation fault).

Doctor, my beloved binary is just not feeling good !

The parasite makes noise by triggering a segmentation fault, which usually happens when a process tries to access an invalid region of memory. In our case, the parasite successfully executed the write(2) syscall but then apparently jumped to an invalid address (this can be confirmed by analyzing the host binary inside GDB). In other words, the current parasite code does not work on modern ELFs because the OEP address it jumps to is not valid in the host process.

Traditionally, this parasite design worked fine because host binaries had absolute addresses as entry points. These days, however, PIE binaries specify their entry point as an offset from the base load address rather than as an absolute address, so jumping to that offset as if it were an absolute address lands in unmapped memory, hence the segmentation fault. This makes sense because ASLR (Address Space Layout Randomisation) ensures that the binary gets mapped at a different base address every time it is loaded; at runtime, the dynamic linker then fixes up the process image by adding the base load address to these offsets (i.e. relocations on references to function and data symbols). This is why such programs must be compiled as PIC (position independent code), i.e. code that can be loaded at any base address.
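The difference is visible in two ELF header fields, e_type and e_entry, the same ones readelf -h prints. The sketch below assumes a 64-bit ELF and glibc’s &lt;elf.h&gt;; read_entry is a hypothetical helper name.

```c
#include <elf.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <assert.h>

/*
 * Reads the ELF header of `path` and reports its type and e_entry.
 * For ET_EXEC, e_entry is an absolute virtual address; for ET_DYN
 * (shared objects and PIE executables) it is relative to the address
 * the binary is loaded at. Assumes a 64-bit ELF.
 * Returns 0 on success, -1 on error.
 */
int read_entry(const char *path, uint16_t *type, uint64_t *entry)
{
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t got = fread(&eh, 1, sizeof eh, f);
    fclose(f);

    if (got != sizeof eh || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    *type = eh.e_type;
    *entry = eh.e_entry;
    return 0;
}
```

On a PIE build of /bin/ls this should report ET_DYN together with a small e_entry value, which is precisely the kind of address the ET_EXEC parasite ends up jumping to.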

What now, shall we give up ?

Not yet, we just need to come up with a better question —

During host execution, how does the parasite get the host OEP address it should jump to after it finishes executing its own body?

All we have right now is an entry point offset. We can get the OEP address by adding the entry point offset to the base address of the host process.

Entry Point Address = base address (of host process) + entry point offset

We have the entry point offset. So what’s the issue then, can’t we now get the base address (i.e. the address at which a host binary is loaded at runtime) and enjoy another day on earth?

Indeed, the issue is that we cannot know the base address beforehand, since the binary is loaded at a different base address every time (remember ASLR?). Because the base address is chosen by the kernel during process initialization, we clearly cannot decide which base address our host process will be loaded at. However, “we cannot decide” is not the same as “we cannot determine”. I could think of 2 possible solutions here (feel free to reach out, I’m open to discussing more):

EDITED: maybe calculate the offset of the parasite from the host OEP and perform a RIP-relative jump, or
why not ask the kernel itself for the base load address?

/proc filesystem to the rescue

Fortunately, there exists an interface to kernel data structures that we can leverage to accomplish our goal: the /proc filesystem. The /proc/[pid] directory contains information about each running process on the system, and the file /proc/[pid]/maps contains the memory map of the process along with the access permissions of each mapped region. A process that wants to read its own memory map can read the file /proc/self/maps. Let’s check the memory map of the cat program:

Here, we can see that the base address of the cat process is given by the first 12 characters (5638ecd28000) of this file, i.e. the start address of the first mapping. The number of hex digits is not fixed, so rather than reading a fixed count we extract characters until the ‘-’ character is encountered. If we add that address to the original entry point (OEP) offset of the host binary, our parasite can happily jump to the computed address and resume the clean, intended behavior of the host binary.
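Outside the parasite, where libc is available, the whole computation fits in a few lines, because strtoull() stops at the first non-hex character, the ‘-’. The parasite cannot call libc (it must stay self-contained), which is why it does this parsing by hand in assembly. This is a minimal sketch; entry_from_maps_line is a hypothetical helper, and the maps line and the 0x1000 offset used with it are illustrative, with only the base address taken from the cat example above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/*
 * Given one line of /proc/<pid>/maps, returns
 *   Entry Point Address = base address + entry point offset,
 * where the base address is the start of that mapping.
 * strtoull() stops at the '-' separating start and end addresses.
 * Returns 0 if the line does not start with "<hex>-".
 */
uint64_t entry_from_maps_line(const char *line, uint64_t entry_off)
{
    char *end;
    unsigned long long base = strtoull(line, &end, 16);
    if (end == line || *end != '-')
        return 0;
    return (uint64_t)base + entry_off;
}
```

With the base 5638ecd28000 and an entry point offset of 0x1000, the function returns 0x5638ecd29000.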

Doctor Proc’s Love

The parasite body up to line #36 is the same as exec_parasite.asm (explained above), except that lines 6–8 define some macros for readability and ease of modification. Also, line 23 defines the symbol filepath, which stores the data bytes “/proc/self/maps” (i.e. the file to read the base address from).

(Line 41–46) These lines set up the registers and perform the open() syscall. On return, RAX (the accumulator register) holds the file descriptor of the opened file /proc/self/maps.

(Line 56–59) Any value XOR’d with itself produces 0 (i.e. 1^1=0 and 0^0=0); this is used to zero out temporary registers.
(Line 60) Saves the file descriptor into the lower 8 bits of the RDI register.
(Line 61–63) Allocates space on the stack of the host process by subtracting ALLOC_SPACE bytes from the stack pointer register (RSP) and assigns that stack location to RSI, which will receive the base address characters read from /proc/self/maps. RSI holds the address of buf in the read() prototype described on line #54.
(Line 64–66) DX (the lower 16 bits of RDX) is set to 1, i.e. the number of bytes to be read by the read() syscall.

(Line 73–79) These lines perform the read syscall: 1 character is read from the file and stored at the location pointed to by RSP (the stack pointer). We then compare the byte read with 0x2d (the ‘-’ character), the delimiter at which we should stop reading. Line #79 stores the byte read into R8.
(Line 82–83) Since the address is hex-encoded, we can encounter both digits and letters. Therefore, the code flow jumps to the corresponding label (alphabet_found or digit_found) depending on the character read. If the character is less than or equal to 0x39 (the character ‘9’), it is a digit and the code flow jumps to digit_found; otherwise a letter has been encountered.
(Line 85–94) The characters read from the file are ASCII-encoded, whereas we need their numeric values to form a valid address. The letters ‘a’ to ‘f’ are encoded as 0x61 to 0x66, while the digits ‘0’ to ‘9’ are encoded as 0x30 to 0x39. If the character read is a letter, subtracting 0x57 from it yields its value (e.g. ‘a’ (0x61) - 0x57 = 0xa and ‘b’ (0x62) - 0x57 = 0xb). Similarly, we subtract 0x30 from digits (e.g. ‘3’ (0x33) - 0x30 = 0x3).
(Line 96–98) Each character corresponds to a 4-bit value (a nibble) of the base address. For example, 0x5638ecd28000 is 6 bytes wide and contains 12 characters. We use the RBX register (8 bytes wide) to accumulate the base address. The instruction at line #97 shifts the value in RBX (initially 0) left by 4 places, leaving the lower 4 bits of RBX zero. Line #98 performs a bitwise OR between RBX (the base address so far) and R8 (the value of the character just read), storing the result in RBX.
(Line 100–103) We then increment the stack pointer (RSP) by one byte and copy the address held in RSP into RSI. Line #103 loops back to line #73 to perform another read() syscall and read the next character. This loop continues until a 0x2d (‘-’) is encountered.
(Line 106–112) By the time we reach the label ‘done’, the extracted base address is in register RBX. I’ve placed comments explaining these instructions individually.
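For readers more comfortable with C, the loop above can be mirrored roughly as follows. This is a sketch rather than a translation of the actual listing: hex_nibble and base_from_self_maps are hypothetical names, and it assumes lowercase hex digits, which is how the kernel prints addresses in /proc/self/maps.

```c
#include <fcntl.h>
#include <unistd.h>
#include <stdint.h>
#include <assert.h>

/* Converts one lowercase hex digit to its 4-bit value, like the parasite:
 * digits '0'..'9' (0x30..0x39) minus 0x30, letters 'a'..'f' minus 0x57. */
static unsigned hex_nibble(unsigned char c)
{
    return (c <= '9') ? (unsigned)(c - 0x30) : (unsigned)(c - 0x57);
}

/*
 * Reads /proc/self/maps one byte at a time, stops at '-', and builds the
 * base address with shift-left-by-4 / OR, as lines 73-103 do with RBX.
 * Returns 0 if the file cannot be opened.
 */
uint64_t base_from_self_maps(void)
{
    int fd = open("/proc/self/maps", O_RDONLY);
    if (fd < 0)
        return 0;

    uint64_t base = 0;
    unsigned char c;
    while (read(fd, &c, 1) == 1 && c != '-')
        base = (base << 4) | hex_nibble(c);   /* shl rbx, 4 then or rbx, r8 */

    close(fd);
    return base;
}
```

Unlike the assembly, this sketch keeps each byte in a local variable instead of advancing RSP through an on-stack buffer; the accumulated value in base ends up the same as RBX at the ‘done’ label.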

(Line 119–131) Finally, we restore the processor state and jump to the computed jmp-on-exit address (base address + OEP offset) stored in R8.

Smuggling that peaceful parasite

Below is a demonstration of running the infection algorithm on a shared object binary: so_parasite.bin is injected, resulting in a peaceful signature being displayed upon execution.

EPILOGUE

The technique described in this article series is based on entry point modification (built on Silvio Cesare’s segment padding infection). The parasite prints its signature to STDOUT, which can serve as its Indicator Of Compromise (IOC) for detection, while disinfection can be as simple as patching the entry point back to the offset embedded in the raw opcode bytes of the parasite body. I haven’t written a disinfector, since the virus body itself is harmless. There are other interesting techniques: reverse text infection, data segment infection, PT_NOTE infection, GOT poisoning and .init_array/.fini_array infection, to name a few. Finally, memory infections are a lot stealthier than disk-based infections and have a fascinating world of their own (one should definitely check them out).

DISCLAIMER: This paper presents the design and implementation of a harmless parasite for modern Linux systems running on the x86-64 Intel architecture. Hopefully, this article series adds some value to the defensive security community by exposing the process of malware engineering from the perspective of a black hoodie. I personally believe a quality defense can only be built by understanding true offense. This article series is intended for exploit developers, malware analysts, folks involved in red/blue team operations and independent researchers struggling to find relevant resources in this area. The content is intended solely for educational purposes, and the author takes no responsibility for anyone attracting hell by acting on malicious intentions. Happy hacking ×_×

Cheers,
Abhinav Thakur
(a.k.a compilepeace)

Github : https://github.com/compilepeace
Connect on Linkedin