Malware Engineering Part 0x1 — That magical ELF

Magic is nothing more than logic running beyond the imagination of its audience ×_×

Ever since I first heard the term computer virus, I was curious to know what exactly it is and how it is created, but the internet was filled with so much random and superficial material about malware that it was hard to extract any useful information. Some months ago, I started with a simple file system crawler on Linux which ended up as an infector program, Kaal Bhairav (perhaps not a virus itself but capable of generating a few of them). This article series focuses on crafting a simple ELF infector for disk-based infection on the Linux platform (x86-64 Intel architecture). The infector parses the Linux filesystem and trojanizes every Linux binary with a parasite code (generating segment-padded trojans), such that when the user executes any infected binary, the attacker’s malicious code (the parasite residing in the host binary) is silently executed alongside the binary’s intended execution on the host system. This article is by no means a complete reference, but it does contain the core concepts that will help people get started in the binary analysis and low-level hacking domains. The series is divided into 4 parts, starting with the prerequisites and later discussing different techniques for infection.

First Things First

Before poking the devil, it’s better to know the devil and his capabilities. Since we’ll be dealing with disk-based infections on Linux, an understanding of the ELF file format is required. For this, I’ve prepared a hands-on course on GitHub for understanding ELF binaries (which makes use of tools like xxd, readelf and objdump to dissect and understand Linux binaries); alternatively, one could read the more complete official ELF specification describing the ELF file format (link below), which would greatly help in understanding this article series.

What is an ELF binary?

Nothing to do with the world of supernatural creatures: ELF (Executable and Linkable Format) is the standard file format for binaries compiled on most *NIX systems (i.e. UNIX-like systems such as Linux). An ELF binary on any such system is one of 4 types:

  • Executable file (ET_EXEC): a linker-processed binary that is ready to execute. On modern *NIX platforms, where position-independent executables are usually the default, it can still be produced by passing the appropriate flag to the compiler.
  • Relocatable file (ET_REL): it consists of pieces of code (often position-independent code) and data that are not yet linked into an executable. Such files are conventionally called object files and have not yet been processed by the compile-time/static linker.
  • Shared object file (ET_DYN): binaries marked as ET_DYN are ready to be linked in 2 ways. A shared object can be combined with other relocatable or shared object binaries to create another object file (after being processed by the static linker /usr/bin/ld), or it can be combined with an executable to create a process image during dynamic linking (by the runtime linker), i.e. it gets treated as a shared library. This differs from relocatable binaries, which are consumed by the compile-time linker rather than the runtime linker. On modern Linux distributions, almost all system binaries are compiled as ET_DYN objects (position-independent executables, whose entry point is an offset), unlike the executable type (ET_EXEC), in which the entry point is an actual address.
  • Core file (ET_CORE): a dump of a program’s process address space, generated by the kernel when the program crashes, e.g. on receiving a SIGSEGV (segmentation fault) signal. It is primarily dumped for post-mortem analysis of the crashed process (using eu-readelf instead of the standard readelf program). Readers interested in this area might also look at ECFS, an extremely useful extension to Linux core files written by elfmaster.
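To make the list above concrete, here is a minimal C sketch that reads just the ELF header of a file and reports which of the 4 types it is. It assumes a 64-bit ELF (as targeted by this series) and relies on the Elf64_Ehdr structure and ET_* constants from <elf.h>; elf_type_name and read_elf_type are hypothetical helper names of my own, not library functions.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Map an e_type value to the names used in the list above. */
const char *elf_type_name(int e_type)
{
    switch (e_type) {
    case ET_REL:  return "ET_REL (relocatable file)";
    case ET_EXEC: return "ET_EXEC (executable file)";
    case ET_DYN:  return "ET_DYN (shared object file)";
    case ET_CORE: return "ET_CORE (core file)";
    default:      return "unknown";
    }
}

/* Read the ELF header of `path` and return its e_type, or -1 if the file
 * cannot be read or does not start with the ELF magic bytes.
 * Assumption: the file uses the 64-bit header layout (Elf64_Ehdr). */
int read_elf_type(const char *path)
{
    Elf64_Ehdr ehdr;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(&ehdr, 1, sizeof ehdr, f);
    fclose(f);
    if (n != sizeof ehdr || memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    return ehdr.e_type;
}
```

For example, printf("%s\n", elf_type_name(read_elf_type("/bin/ls"))); would typically report ET_DYN on a modern distribution, for the reason given in the ET_DYN bullet.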

ELF skeleton, scary eh?

Looking at the bare bones is the best way to learn about something. The skeleton of an ELF binary consists of 4 main ingredients:

  • ELF Header
  • PHT (short for Program Header Table)
  • Sections
  • SHT (short for Section Header Table)

I’ll shortly explain each of the 4 ingredients, in the order best suited to the explanation. If something is still unclear, have a look at the Linux manual page for ELF ($ man 5 elf).

ELF (Executable and Linkable Format), as the name suggests, specifies 2 views: a Linkable view and an Execution view. In the linkable view, the binary present on disk is divided into logical blocks termed sections (useful to the compiler toolchain). The execution view represents a memory view of the binary on disk, i.e. it exposes how the binary will be laid out in a process address space. A binary loaded into memory is divided into logical blocks known as segments, where each segment maps to one or more sections (as seen in the linking view).

ELF interfaces — Executable and Linkable View (taken from ELF specification v1.2)

Before we discuss sections, let’s cover some more ground. Have a look at the source code below:

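unsigned int shit;                       /* data symbol: uninitialized global (.bss) */
void serve () { shit = 0xd3adbeef; }     /* function symbol */
int main () { serve (); return 0; }      /* function symbol; calls serve() */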

A processor doesn’t know how to deal with symbols (which are just sequences of characters for human understanding); it only wishes to deal with addresses. The names used by the programmer in the above source (shit, serve, main) are just symbolic representations of areas in memory. Here, serve and main both represent areas storing code (and are therefore function symbols), whereas shit represents data at some location (and is therefore a data symbol). The whole task of a linker is to resolve these function and data symbols into appropriate addresses in memory, a concept termed relocation, which applies to both compile-time (for static symbols) and run-time (for dynamic symbols) linking.
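The name-to-address relationship described above can be observed directly by asking a binary’s static symbol table (the .symtab section discussed in the Sections list) what value a name stands for. The following C sketch maps a 64-bit ELF file and returns the st_value recorded for a symbol. It is a sketch under stated assumptions: the file is a well-formed, unstripped 64-bit ELF whose offsets are trusted (no bounds checking), and lookup_symbol is a hypothetical helper, not a library function. For ET_DYN binaries the value is an offset from the load address rather than an absolute address.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Find `name` in the .symtab of the ELF64 file at `path` and store its
 * st_value in *value. Returns 0 on success, -1 on failure (unreadable file,
 * not an ELF, stripped binary without .symtab, or symbol not present).
 * Assumption: well-formed 64-bit ELF; offsets are not bounds-checked. */
int lookup_symbol(const char *path, const char *name, Elf64_Addr *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    int rc = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        for (int i = 0; i < eh->e_shnum && rc != 0; i++) {
            if (sh[i].sh_type != SHT_SYMTAB)
                continue;
            /* sh_link of a symbol table holds the index of its string table */
            const char *strtab = (const char *)(base + sh[sh[i].sh_link].sh_offset);
            const Elf64_Sym *syms = (const Elf64_Sym *)(base + sh[i].sh_offset);
            size_t count = sh[i].sh_size / sizeof(Elf64_Sym);
            for (size_t j = 0; j < count; j++) {
                if (strcmp(strtab + syms[j].st_name, name) == 0) {
                    *value = syms[j].st_value;
                    rc = 0;
                    break;
                }
            }
        }
    }
    munmap(base, len);
    return rc;
}
```

Compiling the example program above and calling lookup_symbol on the result with "serve" or "shit" should return the values the static linker assigned to those names; running strip on the binary would make the lookup fail, since .symtab is removed.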

Sections

The actual code and data used by the binary are present in the form of sections, which exist only in the linking view (on-disk representation) of a binary. I’ll briefly discuss some important sections here. Knowing about other sections is of course a bonus!

  • .symtab: This section stores information about all the static symbols (described above) present in the binary. It is absent from stripped binaries, since static symbols are used only by the compile-time linker (ld) and not by the dynamic linker (ld.so), which does fixups during program execution.
  • .dynsym: It is present only in dynamically linked binaries and holds information about the dynamic symbols, i.e. symbols imported from shared libraries or other sources external to the binary. These are required by the dynamic linker during program execution and cannot be stripped from these binaries.
  • .strtab: Remember, symbols are strings, which the processor doesn’t like to deal with when executing a program. This section is the string table for static symbols (i.e. symbols present in the .symtab section).
  • .dynstr: It is the string table for dynamic symbols (i.e. symbols present in the .dynsym section).
  • .interp: This section stores the path name of the program interpreter. Isn’t it strange that a compiled program (which by definition is readily loadable and executable by the processor) has a section naming another interpreter program on which its execution depends?
    This gets clearer as we study the process of dynamic linking. For now, note that this section stores the path of ld-2.xx.so, which is the dynamic linker for almost all ELF binaries on Linux.
  • .init: It contains the executable instructions that get executed before the main function, i.e. the initialization code for the process.
  • .fini: It contains the executable instructions that get executed after the main function returns, i.e. the termination code for the process.
  • .got: Short for Global Offset Table. It is a table of pointers used to locate data symbols present in the binary.
  • .got.plt: It is analogous to the .got section but is used to locate library function symbols rather than data symbols. When a binary is loaded into memory, the dynamic linker does some last-minute relocations. A process called lazy binding ensures that the address of a shared library function in .got.plt (the global offset table for function symbols external to the binary) isn’t resolved by the dynamic linker until the first invocation of that library function.
  • .plt: Short for Procedure Linkage Table. This section stores the stub code used by the dynamic linker to resolve symbols in the binary at run time. It is used in conjunction with the .got.plt section to perform lazy binding as explained above.
  • .text: All the actual executable code written by the programmer is placed in this section of the ELF binary. It has access permissions of R-X (Readable & Executable but not Writable, in keeping with W^X).
  • .rodata: Read-only initialized data is stored in this section. All constants, including any format strings provided to functions like printf/scanf, will be present in this section after compilation. For example, in the statement printf(“Hello Hell\n”); the string “Hello Hell\n” will be placed in the .rodata section.
  • .init_array: It stores an array of pointers (addresses) to all the code blocks (or functions) that will get executed as constructor/initialization routines for the process.
  • .fini_array: It stores an array of pointers (addresses) to all the code blocks (or functions) that will get executed as destructor/finalization routines for the process (i.e. after main() returns).
  • .data: It consists of all the initialized global and static variables defined by your code. During program execution, this section is usually present in the DATA segment (with Read-Write permissions) of the process address space.
  • .bss: It consists of uninitialized global or static variables defined by your code. This section is special as it does not occupy any space on disk (its section header merely records its size), and that many bytes are zero-initialized when the program is loaded into memory. This is one of the reasons why the size of a file on disk differs from its in-memory image size.
  • .shstrtab: This section stores all the section names (including its own name) as ASCII-encoded strings. It will be used later to look up sections by name. Use readelf -p <section_name> <binary_path> to dump strings from a section.
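As a rough illustration of how .shstrtab is used to find a section by name, the C sketch below resolves each section header’s sh_name through .shstrtab (whose SHT index is the ELF header field e_shstrndx) and copies the matching section’s on-disk bytes, in the spirit of readelf -p. It assumes a well-formed 64-bit ELF, trusts its offsets (no bounds checking) and ignores the extended-index case where e_shstrndx is SHN_XINDEX; read_section is a hypothetical helper name.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy the on-disk contents of the section called `name` into buf
 * (NUL-terminated, truncated to bufsz - 1 bytes). Returns the number of
 * bytes copied, or -1 if the file is unreadable, the section is missing,
 * or it has no file contents (SHT_NOBITS, e.g. .bss).
 * Assumption: well-formed 64-bit ELF; offsets are not bounds-checked. */
long read_section(const char *path, const char *name, char *buf, size_t bufsz)
{
    if (bufsz == 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    long copied = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 && eh->e_shstrndx != SHN_UNDEF) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        /* sh_name of every section is an offset into this string table */
        const char *shstrtab = (const char *)(base + sh[eh->e_shstrndx].sh_offset);
        for (int i = 0; i < eh->e_shnum; i++) {
            if (strcmp(shstrtab + sh[i].sh_name, name) != 0)
                continue;
            if (sh[i].sh_type != SHT_NOBITS) {
                size_t n = sh[i].sh_size < bufsz - 1 ? sh[i].sh_size : bufsz - 1;
                memcpy(buf, base + sh[i].sh_offset, n);
                buf[n] = '\0';
                copied = (long)n;
            }
            break;
        }
    }
    munmap(base, len);
    return copied;
}
```

Calling it with ".interp" on a dynamically linked binary such as /bin/ls should yield the path of the dynamic linker discussed in the .interp bullet.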

Section Header Table (SHT)

It describes the sections of the binary. This table has an entry for every section in the binary, where each entry describes the respective section’s attributes (i.e. the section offset, type, size, flags etc.). The first entry (index 0) is always a NULL entry, i.e. zeroed out for historical reasons (read about the NMAGIC, ZMAGIC and QMAGIC schemes for more details). Use readelf -S <binary_path> to view the SHT entries of an ELF binary.

SHT (Section Header Table)
  • sh_name: It stores an index. The index acts as an offset into the .shstrtab (Section Header String Table) section, which stores the section names (ASCII encoded).
  • sh_type: This identifies the section’s contents and semantics. Some of the most commonly encountered section types are:
    → a PROGBITS (SHT_PROGBITS) section holds binary data defined by the program.
    → the .bss section is of type NOBITS (SHT_NOBITS), which means that the section occupies no space in the file (on disk).
    → static and dynamic symbols are stored in sections of type SYMTAB and DYNSYM (SHT_SYMTAB and SHT_DYNSYM).
    → relocation information is stored in sections of type REL and RELA (SHT_REL and SHT_RELA).
  • sh_addr: The section’s first byte will appear at this address if the section gets loaded into memory.
  • sh_offset: This field stores the offset in the file (on disk) at which the first byte of the section starts.
  • sh_size: Size of the section in bytes. For SHT_NOBITS sections such as .bss, this is the size the section will occupy in memory, since it occupies no space in the file.
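The SHT fields above can be printed with a short C sketch that walks every Elf64_Shdr entry, much like a stripped-down readelf -S. It assumes a well-formed 64-bit ELF with a .shstrtab and trusts its offsets; list_sections and sh_type_name are hypothetical helper names, and only the section types mentioned above are named (anything else prints as OTHER).

```c
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Name the sh_type values discussed above; everything else is "OTHER". */
static const char *sh_type_name(Elf64_Word type)
{
    switch (type) {
    case SHT_NULL:     return "NULL";
    case SHT_PROGBITS: return "PROGBITS";
    case SHT_SYMTAB:   return "SYMTAB";
    case SHT_STRTAB:   return "STRTAB";
    case SHT_RELA:     return "RELA";
    case SHT_NOBITS:   return "NOBITS";
    case SHT_DYNSYM:   return "DYNSYM";
    case SHT_REL:      return "REL";
    default:           return "OTHER";
    }
}

/* Print sh_name, sh_type, sh_addr, sh_offset and sh_size for every SHT
 * entry and return the number of entries, or -1 on error.
 * Assumption: well-formed 64-bit ELF; offsets are not bounds-checked. */
int list_sections(const char *path, FILE *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    int count = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 && eh->e_shstrndx != SHN_UNDEF) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        const char *shstrtab = (const char *)(base + sh[eh->e_shstrndx].sh_offset);
        /* entry 0 is the NULL entry: its name resolves to the empty string */
        for (int i = 0; i < eh->e_shnum; i++)
            fprintf(out, "[%2d] %-20s %-8s addr=0x%llx off=0x%llx size=0x%llx\n",
                    i, shstrtab + sh[i].sh_name, sh_type_name(sh[i].sh_type),
                    (unsigned long long)sh[i].sh_addr,
                    (unsigned long long)sh[i].sh_offset,
                    (unsigned long long)sh[i].sh_size);
        count = eh->e_shnum;
    }
    munmap(base, len);
    return count;
}
```

Running list_sections("/bin/ls", stdout) should show .bss as NOBITS, illustrating that its sh_size describes memory rather than file contents.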

Program Header Table (PHT)

It describes the segments of the binary. Each entry describes a segment with various attributes. It guides the operating system on how to load the on-disk representation (sections) of the binary into memory (segments) for execution, i.e. it tells the loader which sections will constitute which segment. It is usually placed just after the ELF Header (i.e. at an offset of 64 bytes from the beginning of the file).

Tools like readelf can interpret section-to-segment mappings (which are created as described by the linker script). Use readelf -l <binary_path> to view the program headers along with the section-to-segment mappings of an ELF binary. Grep for Elf64_Phdr in /usr/include/elf.h for the inner details of the PHT structure.

damn these sections to segment mappings
  • p_type: This specifies the segment type. Traditionally there are 2 loadable segments (marked with the p_type value PT_LOAD), i.e. the code (or .text) segment and the data segment, although recent linkers may emit additional PT_LOAD segments.
  • p_offset: Since a segment comprises one or more sections, this field specifies the offset (from the first byte of the file) at which the first byte of the segment resides.
  • p_vaddr: Describes the virtual address of the segment, i.e. the virtual address at which the data inside the segment will live.
  • p_paddr: Describes the physical address of the segment. This field is used mainly on embedded systems which do not implement virtual memory and can safely be ignored here.
  • p_filesz: Size of the segment in the file (on disk).
  • p_memsz: Size of the segment in memory. If this field exceeds p_filesz, the remaining space is filled with zeroes at runtime (usually the case with the DATA segment, where space is allocated for the .bss section).
  • p_flags: Specifies the permission flags for a segment, i.e. PF_R (read), PF_W (write) or PF_X (execute).
  • p_align: Segment alignment in memory. Segments are usually page aligned.
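The PHT fields above can be inspected with a C sketch that walks the Elf64_Phdr array (located via e_phoff) and prints only the PT_LOAD entries, with their permissions decoded from p_flags. It assumes a well-formed 64-bit ELF and trusts its offsets; list_load_segments is a hypothetical helper name.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print every PT_LOAD entry (p_offset, p_vaddr, p_filesz, p_memsz, p_flags)
 * and return how many loadable segments were found, or -1 on error.
 * Assumption: well-formed 64-bit ELF; offsets are not bounds-checked. */
int list_load_segments(const char *path, FILE *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    int loads = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0) {
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + eh->e_phoff);
        loads = 0;
        for (int i = 0; i < eh->e_phnum; i++) {
            if (ph[i].p_type != PT_LOAD)
                continue;
            /* p_memsz > p_filesz means the tail is zero-filled at load time */
            fprintf(out, "[%2d] LOAD off=0x%llx vaddr=0x%llx filesz=0x%llx memsz=0x%llx %c%c%c\n",
                    i,
                    (unsigned long long)ph[i].p_offset,
                    (unsigned long long)ph[i].p_vaddr,
                    (unsigned long long)ph[i].p_filesz,
                    (unsigned long long)ph[i].p_memsz,
                    (ph[i].p_flags & PF_R) ? 'R' : '-',
                    (ph[i].p_flags & PF_W) ? 'W' : '-',
                    (ph[i].p_flags & PF_X) ? 'X' : '-');
            loads++;
        }
    }
    munmap(base, len);
    return loads;
}
```

On binaries produced by a recent toolchain this may well print more than 2 LOAD lines; the RW one is where p_memsz typically exceeds p_filesz.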

There are 2 segments, 02 (CODE) and 03 (DATA), that are marked as LOAD (i.e. space will be allocated in process memory for the sections present in these 2 segments). As we can see, the initialized/uninitialized global or static variables (data in the .bss and .data sections) may be altered by the program code and are therefore present in the DATA segment (03) with Read+Write permissions. Similarly, the .rodata section should only have Read permissions and is therefore present in the CODE segment (02) with Read+Execute permissions rather than in the DATA segment. Also, the p_filesz and p_memsz of the 03 (DATA) segment differ due to the presence of the .bss section in it (which occupies 8 bytes in memory but none in the file).
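The section-to-segment mapping discussed above can be approximated programmatically: an allocated section belongs to a PT_LOAD segment when its address range lies inside [p_vaddr, p_vaddr + p_memsz). Matching on virtual addresses rather than file offsets lets NOBITS sections such as .bss land in the segment whose p_memsz covers them. This C sketch is a simplification (readelf’s own mapping logic handles more corner cases) and assumes a well-formed 64-bit ELF with trusted offsets; section_to_segment is a hypothetical helper name.

```c
#include <elf.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Find the PT_LOAD segment containing the allocated section `name`.
 * On success store that segment's p_flags in *flags and return its PHT
 * index; return -1 if there is no such section or containing segment.
 * Assumption: well-formed 64-bit ELF; offsets are not bounds-checked. */
int section_to_segment(const char *path, const char *name, Elf64_Word *flags)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    int idx = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 && eh->e_shstrndx != SHN_UNDEF) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        const Elf64_Phdr *ph = (const Elf64_Phdr *)(base + eh->e_phoff);
        const char *shstrtab = (const char *)(base + sh[eh->e_shstrndx].sh_offset);
        for (int i = 0; i < eh->e_shnum && idx < 0; i++) {
            /* only SHF_ALLOC sections occupy memory in the process image */
            if (strcmp(shstrtab + sh[i].sh_name, name) != 0 || !(sh[i].sh_flags & SHF_ALLOC))
                continue;
            for (int j = 0; j < eh->e_phnum; j++) {
                if (ph[j].p_type == PT_LOAD &&
                    sh[i].sh_addr >= ph[j].p_vaddr &&
                    sh[i].sh_addr + sh[i].sh_size <= ph[j].p_vaddr + ph[j].p_memsz) {
                    *flags = ph[j].p_flags;
                    idx = j;
                    break;
                }
            }
        }
    }
    munmap(base, len);
    return idx;
}
```

Querying .text should report a segment with PF_X set, while .data and .bss should report a writable, non-executable segment, matching the permissions reasoning in the paragraph above.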

ELF Header

The header consists of metadata about the binary; simply put, it acts as a map to the ELF binary. Its fields describe what type of ELF binary it is (shared object, executable etc.), which architecture the binary is built for (x86/x86-64 compatibility), where the SHT/PHT are located, etc. It occupies the first 64 bytes (in a 64-bit binary) and is always present at the beginning of an ELF binary. Use readelf -h <binary_path> to view the ELF header of an ELF binary. The most important fields include:

ELF Header
  • e_ident: The first 4 bytes of a file are enough to identify it as an ELF binary, i.e. the magic number should be 0x7f (EI_MAG0), 0x45 (EI_MAG1 or 'E'), 0x4c (EI_MAG2 or 'L') and 0x46 (EI_MAG3 or 'F'). Some more useful bytes are:
    → EI_CLASS (5th byte) defines the machine and virtual address space width the binary is compiled for. Macros defined are ELFCLASSNONE, ELFCLASS32 and ELFCLASS64 (used in the above example).
    → EI_DATA (6th byte) defines the data encoding (i.e. the byte order in which data is organized in the binary). Macros defined are ELFDATANONE, ELFDATA2LSB (used in the above example) and ELFDATA2MSB.
  • e_entry: It is the location at which code execution will start after the binary (along with all its dependencies) is loaded and all its references (relocation entries) are fixed up. Nowadays this is mostly an offset but can also be an absolute virtual address (as historically with executable ELFs). This is one of the infection points we’ll use while hijacking the control flow of the binary.
  • e_type: Tells whether the binary is of Shared object, Relocatable, Executable or Core type.
  • e_machine: Describes the processor architecture the binary is compatible with. Our cute little parasite code (the code we’ll inject into the host binary) will be written for whichever processor architecture is going to run the host binary.
  • e_shoff: This field stores the offset in the file (on disk) at which the first byte of the Section Header Table is placed, i.e. how far the SHT is from the beginning (1st byte) of the binary.
  • e_phoff: This field stores the offset in the file (on disk) at which the first byte of the Program Header Table is placed (usually 64, since the PHT is stored just after the ELF Header).
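A minimal C sketch of the checks implied by the fields above: read the Elf64_Ehdr, verify the 4 magic bytes, EI_CLASS and EI_DATA, then print e_type, e_machine, e_entry, e_phoff and e_shoff. Accepting only ELFCLASS64/ELFDATA2LSB is an assumption matching this series’ x86-64 focus; read_elf_header and print_elf_header are hypothetical helper names.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Read the ELF header of `path` into *out and validate e_ident.
 * Returns 0 for a 64-bit little-endian ELF, -1 otherwise.
 * Assumption: only ELFCLASS64/ELFDATA2LSB (x86-64) binaries are wanted. */
int read_elf_header(const char *path, Elf64_Ehdr *out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(out, 1, sizeof *out, f);
    fclose(f);
    if (n != sizeof *out)
        return -1;
    if (memcmp(out->e_ident, ELFMAG, SELFMAG) != 0)  /* 0x7f 'E' 'L' 'F' */
        return -1;
    if (out->e_ident[EI_CLASS] != ELFCLASS64)        /* 5th byte */
        return -1;
    if (out->e_ident[EI_DATA] != ELFDATA2LSB)        /* 6th byte */
        return -1;
    return 0;
}

/* Print the header fields described in the list above. */
void print_elf_header(const Elf64_Ehdr *h, FILE *out)
{
    fprintf(out, "e_type    : %u\n", (unsigned)h->e_type);
    fprintf(out, "e_machine : %u%s\n", (unsigned)h->e_machine,
            h->e_machine == EM_X86_64 ? " (x86-64)" : "");
    fprintf(out, "e_entry   : 0x%llx\n", (unsigned long long)h->e_entry);
    fprintf(out, "e_phoff   : %llu\n", (unsigned long long)h->e_phoff);
    fprintf(out, "e_shoff   : %llu\n", (unsigned long long)h->e_shoff);
}
```

Comparing the printed e_entry of an ET_DYN binary with that of an ET_EXEC one should show the offset-versus-address difference noted in the e_entry bullet.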

Other fields in the ELF header, including those needed to compute the size of the SHT/PHT, will be required while implementing SHT/PHT parsing and will therefore be explained later in the article series.

EPILOGUE

Consider this a basic introduction to the world of Linux viruses. Later in the series, we’ll see how all this information can be programmatically accessed, extracted and tampered with to craft magical spells on system programs.
The entire source code for the elf infector can be found here.

Cheers,
Abhinav Thakur

Github : https://github.com/compilepeace
Linkedin : https://www.linkedin.com/in/abhinav-thakur-795a96157/

software security researcher