Shellcoding 0x4: Packing Presents with Evil Santa

Jul 27, 2022

One’s reality might be another’s illusion — Itachi Uchiha

Malware research has grown on both the offensive and defensive sides. From the early days of antivirus software to the advent of EDR solutions, detection and evasion have been a continuous cat-and-mouse chase. When delivering a payload to a target endpoint, the last thing an attacker wants is to be flagged by statically applied filters or filesystem monitoring events (see fanotify(2)). An adversary or malware operating solely in memory (volatile storage) has an implicit advantage: it leaves a minimal digital footprint for forensic or post-mortem analysis (assuming the execution doesn’t crash and core dumps are disabled on the target platform). This article discusses how to craft a packer (an in-memory runtime decoder stub) that can be used to bypass heuristics as well as signature-based detection mechanisms while executing solely in memory. The approach described here can also be applied to conceal a certain sequence of bytes or eliminate bad characters from the payload.

Although the articles in this series are listed below in chronological order, feel free to skip to whatever interests you most.

Context First

Almost all exploitation attempts share common ground: the boundary between what is interpreted as CODE and what is interpreted as DATA is ambiguous (see the von Neumann architecture, which is still the foundation of many modern computers). Wherever such ambiguity exists, there is a possibility of code execution (compare the Harvard architecture, which clearly separates code and data). Since shellcode is frequently delivered in the form of DATA (which can be analyzed as a stream of raw bytes), filters can be deployed to detect code (i.e. CPU instructions) being transmitted as DATA, and hence to detect infection or exploitation attempts. The diagram below depicts a filter placed in the delivery path to detect shellcode.

Because of the above scenario, an attacker may frequently want to craft shellcode that is free from certain byte sequences (depending on the target software being exploited or the policies/filters being bypassed). A filter policy could detect a particular strain of shellcode or ensure that user-supplied data does not contain any code pattern. A code pattern can be detected by scanning for the byte sequences of the syscall (0f 05), int 0x80 (cd 80) or sysenter (0f 34) instructions, which are used to invoke system calls (an interface that almost every meaningful piece of code will use).
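A minimal sketch in C of such a filter, assuming a naive scan over raw bytes for the three opcode pairs listed above. The name find_flagged_pair is hypothetical, and a real filter would likely be more elaborate; the sketch only illustrates why these byte pairs are a problem for unobfuscated shellcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode pairs named in the text. */
static const uint8_t FLAGGED[][2] = {
    {0x0f, 0x05}, /* syscall  */
    {0xcd, 0x80}, /* int 0x80 */
    {0x0f, 0x34}, /* sysenter */
};

/* Hypothetical helper: returns the offset of the first flagged pair found in
 * data, or -1 if none of the pairs occurs. */
long find_flagged_pair(const uint8_t *data, size_t len)
{
    assert(data != NULL);
    for (size_t i = 0; i + 1 < len; ++i) {
        for (size_t p = 0; p < sizeof FLAGGED / sizeof FLAGGED[0]; ++p) {
            if (data[i] == FLAGGED[p][0] && data[i + 1] == FLAGGED[p][1])
                return (long)i;
        }
    }
    return -1;
}
```

A buffer ending in 0f 05 is reported at the offset of the 0f byte, while a buffer without any of the three pairs yields -1. Such a scan can also flag innocent data that happens to contain the same bytes.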

It is hard to imagine doing anything useful without system calls. Therefore, we are left with hiding our intentions. To evade filters placed on user-supplied data, the shellcode can carry arbitrary bytes in place of a flagged byte sequence and modify itself at those specific labels at runtime to restore its original form. There are various ways to approach such an implementation (try combining arithmetic and bitwise operations creatively).
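The idea can be modelled on a plain byte buffer. The sketch below is an illustration in C rather than real self-modifying shellcode: it assumes the flagged pair is 0f 05, stores its second byte decremented by one, and restores it with a single increment. The helper names and the recorded offsets (standing in for the labels a real payload would patch) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrites every `0f 05` (syscall) pair as `0f 04` and
 * records the offset of each changed byte in patched. Returns the number of
 * offsets recorded; pairs beyond max_patched are left untouched. */
size_t mask_syscalls(uint8_t *code, size_t len, size_t *patched, size_t max_patched)
{
    assert(code != NULL && patched != NULL);
    size_t count = 0;
    for (size_t i = 0; i + 1 < len && count < max_patched; ++i) {
        if (code[i] == 0x0f && code[i + 1] == 0x05) {
            code[i + 1] -= 1;          /* 05 -> 04 */
            patched[count++] = i + 1;
        }
    }
    return count;
}

/* Runtime side of the model: restores each recorded byte with one increment. */
void unmask_syscalls(uint8_t *code, const size_t *patched, size_t count)
{
    assert(code != NULL && patched != NULL);
    for (size_t k = 0; k < count; ++k)
        code[patched[k]] += 1;         /* 04 -> 05 */
}
```

In actual shellcode the restoring step would be a handful of instructions that patch known offsets before control reaches them, which again requires the code to live in writable memory.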

Packing the payload body

For the purpose of this article, rather than obfuscating a particular sequence of bytes, we pack the whole payload body and have a decoder stub carry the packed payload. Upon execution, the decoder stub unpacks the payload body in memory and transfers control to it. The diagram below depicts the scenario.

To obfuscate the payload body, we simply use a property of the XOR operation, i.e.

(a ^ k = b) implies (b ^ k = a)

a → original payload body
k → key
b → packed payload

Below is how we can quickly write an encoder to obfuscate our payload with the key byte 0xbc.


#define KEY_BYTE 0xbc
...
    for (unsigned int i = 0; i < shellcode_length; ++i) {
        shellcode[i] ^= KEY_BYTE;
    }
...

The highlighted sequence of bytes above represents the packed payload body, which won’t make sense to detection programs until it is restored to its original form.
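Since the same approach is meant to eliminate bad characters, the key has to be chosen so that no packed byte lands on one. The following C sketch searches for such a single-byte key; it is an assumption-laden illustration, not the encoder used in this article. The name find_xor_key is hypothetical, and the key itself is also checked because it is assumed to appear as a literal byte in the decoder stub.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if byte b occurs in the bad-character list, 0 otherwise. */
static int is_bad(uint8_t b, const uint8_t *bad, size_t bad_len)
{
    return bad_len > 0 && memchr(bad, b, bad_len) != NULL;
}

/* Hypothetical helper: returns the smallest non-zero key for which neither
 * the key nor any XOR-encoded payload byte is a bad character, or 0 if no
 * single-byte key qualifies. */
uint8_t find_xor_key(const uint8_t *payload, size_t len,
                     const uint8_t *bad, size_t bad_len)
{
    assert(payload != NULL);
    for (unsigned key = 1; key <= 0xff; ++key) {
        if (is_bad((uint8_t)key, bad, bad_len))
            continue;
        size_t i = 0;
        while (i < len && !is_bad((uint8_t)(payload[i] ^ key), bad, bad_len))
            ++i;
        if (i == len)               /* every encoded byte passed the check */
            return (uint8_t)key;
    }
    return 0;
}
```

A return value of 0 means every candidate key produced at least one bad byte, in which case a different encoding scheme would be needed.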

Santa bought you some presents

Since we have an XOR-encoded payload body, it’s time to craft a corresponding decoder stub. The code below does it gracefully:

The above code uses a jmp-call-pop sequence to retrieve the address of the encoded shellcode (Line 13–37–20) into register r8.

main (Line 20–23): initialises the loop counter to 0 and stores the length of the encoded payload in eax.
again (Line 30–34): a loop that iterates over each byte of the encoded payload and XORs it with the key value 0xbc. The loop iterates ecx number of times (equal to the length of the encoded payload).
decodeMe (Line 40): the byte sequence here corresponds to the previously generated XOR-encoded shellcode.

NOTE: For the above shellcode to execute without a crash, the memory segment into which the code is injected must have RWX permissions, because the decoder stub rewrites the payload bytes in place before executing them. If this is not the case, one would first need to allocate such a memory region, perhaps via mmap(2) as described here.
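As a rough illustration of the mmap(2) route, the C sketch below copies a byte sequence into a fresh anonymous mapping that is readable, writable and executable. It assumes a Linux-like system on which such mappings are permitted (some hardened configurations refuse them), and the name copy_to_rwx is hypothetical. It does not execute anything.

```c
/* MAP_ANONYMOUS needs a non-strict feature set on glibc. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copies len bytes into a new anonymous RWX mapping.
 * Returns the mapping, or NULL if mmap fails. Release with munmap(p, len). */
void *copy_to_rwx(const void *bytes, size_t len)
{
    assert(bytes != NULL && len > 0);
    void *region = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return NULL;
    memcpy(region, bytes, len);
    return region;
}
```

Because the region is writable as well as executable, a decoder stub placed in it can XOR its payload in place and then jump into the decoded bytes.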

A Closer Look Inside GDB

Let’s observe the shellcode under GDB to confirm it doesn’t carry any bad character or byte sequence which might get flagged by the target’s defenses.

Above, we see what the packed shellcode body looks like in memory.

Above, we set a conditional breakpoint (triggered once the counter reaches 64) so that execution breaks as soon as the loop finishes decoding every byte at the decodeMe label.

Finally, we see the disassembled payload body in its original form. This is a dummy payload that does nothing more than print a message to STDOUT and exit with a return value of 7.

Land of Detection

Signature-based detection fails on the payload body but still applies to the decoder stub (at least in the above case). A signature written for the decoder stub (check out YARA rules) can be used to detect samples using such logic in the wild. However, a mutation engine, aka MTE (search for polymorphic, oligomorphic and metamorphic engines), might render signature-based detection entirely useless, because the body structure differs on each propagation. Therefore, it’s best to rely on behavioural analysis, either manual or automated under a sandbox (dynamic analysis), keeping in mind the simple fact that actions never lie. Since every packed sample needs to unpack itself in memory before it can be executed by the CPU, scanning the target process’s address space for malicious artefacts could be one way to deal with evil memory encoders.
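One artefact such a scan could look for is memory that is both writable and executable, since an in-place decoder needs exactly that. The C sketch below is a heuristic illustration only: it assumes the Linux /proc/<pid>/maps line format, the name is_writable_executable is hypothetical, and legitimate software such as some JIT compilers may map such regions too.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if a line in /proc/<pid>/maps format
 * describes a region that is both writable and executable, 0 otherwise
 * (including lines that fail to parse). */
int is_writable_executable(const char *maps_line)
{
    assert(maps_line != NULL);
    unsigned long start, end;
    char perms[5] = {0};

    /* Expected shape: "<start>-<end> rwxp <offset> <dev> <inode> [path]" */
    if (sscanf(maps_line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return 0;
    return perms[1] == 'w' && perms[2] == 'x';
}
```

A scanner could feed each line of a process’s maps file through this check and then inspect the flagged regions more closely, for instance by matching their contents against known payload signatures.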

EPILOGUE

This article discussed how detection software might identify a code injection and execution scenario, and how an attacker maintains a minimal digital footprint by operating solely in memory. It then described an approach to crafting a packer that lets an attacker conceal or eliminate a certain sequence of bytes while preserving the payload’s actual behaviour, effectively bypassing signature-based detection mechanisms. Lastly, it briefly discussed ways of detecting such techniques being used in the wild. That’s it for the day; I hope this article contributes to someone’s journey into this area.

DISCLAIMER: Since attackers are already making use of this knowledge, it’s the defenders who might find value in the approach described in this article. This article series is intended for exploit developers, malware researchers, folks involved in red/blue team operations and independent researchers struggling to find relevant resources in this area. The content is intended to be used solely for educational purposes. The author takes no responsibility for anyone attracting hell by acting on malicious intentions. Happy hacking ×_×

Cheers,
Abhinav Thakur
(a.k.a compilepeace)