Statically encrypt strings in a binary with Keystone, LIEF and radare2/rizin

Author: Vladimir Meier / @plowsec

In our journey to try and make our payload fly under the radar of antivirus software, we wondered if there was a simple way to encrypt all the strings in a binary, without breaking anything. We did not find any satisfying solution in the literature, and the project looked like a fun coding exercise so we decided it was worth a shot.

In the end, we only partly succeeded, and realised that the approach is not directly suited to antivirus evasion: the tool’s limitations prevent it from bypassing antivirus software on its own. That’s why we then made avcleaner, which operates on source code directly.

Still, the tool presented in this blog post involves some binary hacking that we believe might be of some value to the community, and who knows, someone might end up doing something useful with it.

Currently, we plan to use it alongside another antivirus bypass tool in order to better target the strings to be encrypted.

General idea

Our idea was to encrypt in place all the strings in a PE file. To avoid breaking the software, each string must be decrypted as soon as it is needed. For that to work, one should inject a decryption routine into the binary, and somehow call it when the string is used.

The best approach would be to decompile the binary, locate string usages and wrap them in a decryption routine. However, frameworks such as ret-dec, rev.ng, mcsema and so on were not mature enough at the time.

In view of that, our solution relies on lief for the binary manipulation, radare2 / rizin for the program analysis, and keystone for code injection.

The process is as follows:

  1. Enumerate and encrypt strings with radare2
  2. Locate cross-references to each of these strings, also with radare2
  3. With gcc, build a decryption routine as Position Independent Code (PIC)
  4. With lief, carve out this decryption routine and inject it in the target binary as a new section
  5. For each xref, patch the instruction that loads the strings in registers, the stack or whatever
  6. Insert a call instruction to hijack the execution flow and divert it to the decryption routine
  7. Return to the original instruction

These last steps require storing the string’s size and the return address, so we use lief as well to build a kind of jump table.

Here is an artistic diagram for clarity:

Implementation

This section shares the implementation details and demonstrates the use of keystone, lief and radare2 to accomplish our goal.

Enumerate strings

Strings can be enumerated with the iz command of radare2.
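As a minimal sketch of this step, the snippet below runs the JSON variant of the command (izj) through r2pipe and keeps only the entries worth encrypting. The helper names, the minimum length and the list of sections are assumptions made for illustration, and the exact JSON fields can vary between radare2 / rizin versions.

```python
def filter_strings(entries, min_length=4, sections=(".rdata", ".rodata", ".data")):
    """Keep strings that are long enough and live in a data section.

    `entries` is the list of dicts returned by `izj`. The thresholds and the
    section names are illustrative assumptions, not values from the article.
    """
    kept = []
    for entry in entries:
        if entry.get("length", 0) < min_length:
            continue
        if entry.get("section") not in sections:
            continue
        kept.append({
            "vaddr": entry["vaddr"],   # virtual address of the string
            "paddr": entry["paddr"],   # offset of the string in the file
            "size": entry["size"],     # size in bytes
            "string": entry["string"],
        })
    return kept


def enumerate_strings(path):
    """Run `izj` on the binary at `path` and return the filtered strings."""
    import r2pipe  # third-party dependency, imported lazily

    r2 = r2pipe.open(path)
    try:
        return filter_strings(r2.cmdj("izj") or [])
    finally:
        r2.quit()
```

Calling enumerate_strings("target.exe") (a hypothetical file name) would return one dictionary per retained string, ready to be encrypted and looked up for cross-references.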

Encryption

For each recovered string, we should encrypt it in place and build the corresponding jump table (described in the subsequent sections).

The “encryption algorithm” for this Proof-of-Concept is actually a simple Vigenère cipher :D, but you can obviously roll your own crypto. Luckily for us, antivirus software can be fooled with Vigenère, so let’s not waste time on this.
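To make the idea concrete, here is a minimal sketch of an in-place, byte-wise Vigenère variant operating on a mutable copy of the file. The function names, the key and the choice of working modulo 256 are assumptions for illustration; the routine injected into the binary would have to implement the matching decryption.

```python
def vigenere_encrypt(buf, offset, size, key):
    """Encrypt `size` bytes of `buf` in place, starting at `offset`.

    Each byte is shifted by the corresponding key byte, modulo 256.
    """
    for i in range(size):
        buf[offset + i] = (buf[offset + i] + key[i % len(key)]) % 256


def vigenere_decrypt(buf, offset, size, key):
    """Inverse of vigenere_encrypt: subtract the key bytes, modulo 256."""
    for i in range(size):
        buf[offset + i] = (buf[offset + i] - key[i % len(key)]) % 256


# Usage: only the bytes of the string itself are modified.
data = bytearray(b"\x00\x00mimikatz\x00")
vigenere_encrypt(data, 2, 8, b"key")   # hypothetical key
vigenere_decrypt(data, 2, 8, b"key")   # restores the original bytes
```

Working on the file offset (paddr) of each string keeps the string length unchanged, so nothing else in the binary has to move.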

Patch the cross-reference

Get cross-references

Cross-references to strings can be obtained with radare2’s axt command, issued through r2pipe. Appending a j to the command and calling cmdj returns the result as JSON, already parsed into Python objects.

To simplify things, we do not handle strings with many xrefs although that’s definitely doable.
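A possible sketch of this lookup is shown below. It keeps only the strings with exactly one cross-reference, which is one reading of the simplification above; the helper names are hypothetical, and the binary is assumed to have been analysed first (for example with aaa), otherwise axt returns nothing.

```python
def r2_xref_lookup(r2):
    """Return a function mapping a string address to its `axtj` results.

    `r2` is an open r2pipe session on an already analysed binary.
    """
    def lookup(vaddr):
        return r2.cmdj(f"axtj @ {vaddr}") or []
    return lookup


def single_xref_strings(strings, xrefs_for):
    """Pair each string with the address of its unique cross-reference.

    `strings` is a list of dicts with a "vaddr" key, `xrefs_for` a callable
    returning the list of xref dicts for an address. Strings with zero or
    several xrefs are skipped.
    """
    pairs = []
    for entry in strings:
        refs = xrefs_for(entry["vaddr"])
        if len(refs) == 1:
            # "from" is the address of the instruction referencing the string
            pairs.append((entry, refs[0]["from"]))
    return pairs
```

With a live session, single_xref_strings(strings, r2_xref_lookup(r2)) would yield the (string, instruction address) pairs to patch.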

Disassemble the original instruction

Insert the hook
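The hook itself boils down to overwriting the instruction that references the string with a call to the trampoline. The sketch below computes the bytes of such a patch by hand: a 5-byte call rel32, padded with NOPs up to the length of the original instruction. The function name and the NOP padding are assumptions made for illustration.

```python
import struct

CALL_LEN = 5  # opcode E8 followed by a 32-bit relative displacement


def encode_call_hook(site, target, insn_len):
    """Return the bytes replacing the `insn_len`-byte instruction at `site`.

    The result is `call target` followed by NOPs, so that the patched region
    has exactly the size of the original instruction.
    """
    if insn_len < CALL_LEN:
        raise ValueError("original instruction is too short to hold a call")
    # The displacement is relative to the address of the next instruction.
    rel = target - (site + CALL_LEN)
    if not -2**31 <= rel < 2**31:
        raise ValueError("target is out of reach of a rel32 call")
    return b"\xe8" + struct.pack("<i", rel) + b"\x90" * (insn_len - CALL_LEN)


# Usage: hook a 7-byte instruction at 0x1000 towards a trampoline at 0x2000.
patch = encode_call_hook(0x1000, 0x2000, 7)
```

Instructions shorter than five bytes cannot be hooked this way without also overwriting their neighbours, which is why the sketch refuses them.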

Build the jump table

First, we need to create a new section in the target binary. The section should be big enough to hold information about each identified string.
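As an illustration of what “big enough” means, the sketch below serialises one fixed-size entry per string and rounds the total up to a file alignment. The entry layout (string address, return address, string size), the field widths and the default alignment are assumptions; the resulting bytes would then become the content of the new section created with lief.

```python
import struct

# Hypothetical entry layout: string VA (8 bytes), return VA (8 bytes),
# string size (4 bytes), little-endian, without padding.
ENTRY = struct.Struct("<QQI")


def build_jump_table(entries):
    """Serialise the entries into the raw bytes of the jump table."""
    return b"".join(
        ENTRY.pack(e["string_va"], e["return_va"], e["size"]) for e in entries
    )


def section_size(count, alignment=0x200):
    """Size needed for `count` entries, rounded up to `alignment` bytes."""
    raw = count * ENTRY.size
    return -(-raw // alignment) * alignment  # ceiling division


# Usage with two made-up strings.
table = build_jump_table([
    {"string_va": 0x140003000, "return_va": 0x140001017, "size": 12},
    {"string_va": 0x140003010, "return_va": 0x140001042, "size": 8},
])
```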

Insert a new section

Then, we use keystone to assemble the hook instructions, but let’s go over the process step-by-step.

Trampoline

Assembly

Our trampoline should look as follows:

However, this does not account for the calling convention of the target binary, and sadly there are too many variations to cover. We thus decided to only support 64-bit ELF and PE files as a first step.

The trampoline sets up the parameters required by the decryption routine, performs the call and then returns to the original instruction. With that out of the way, let us define the blueprint for this trampoline. For a PE file, the actual trampoline would be:

Collect virtual addresses

Load the string in rdi

Load the string

Call the decryption routine

Load the original instruction and restore the original control flow

Then, it is important to recover the original register used to reference the string, and update its value with the string’s new address:
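One way to recover that register is to parse the disassembly of the patched instruction, for instance the opcode field returned by radare2’s pdj 1 @ address. The sketch below only understands lea and mov with a register destination; the function name and the regular expression are assumptions made for illustration.

```python
import re

# Matches "lea <reg>, ..." or "mov <reg>, ..." and captures the register.
_DEST_REG = re.compile(r"^\s*(?:lea|mov)\s+([a-z0-9]+)\s*,")


def destination_register(opcode):
    """Return the destination register of a lea/mov, or None otherwise."""
    match = _DEST_REG.match(opcode.lower())
    return match.group(1) if match else None


# Usage on opcode strings as radare2 could print them.
reg = destination_register("lea rcx, [rip + 0x2f40]")
```

Instructions that push the address or store it directly on the stack return None here and would need dedicated handling.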

Now, it is simply a matter of returning to the original instruction. The final code can be assembled with keystone.
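Putting the previous steps together, a trampoline generator could be sketched as follows. This is not the original listing: the register choices follow the usual 64-bit conventions (rcx/rdx for PE, rdi/rsi for ELF), absolute addresses are used for simplicity, the string is assumed to be decrypted in place, and the stack is assumed to be 16-byte aligned at the hooked instruction. All names are hypothetical.

```python
ARG_REGS = {"pe": ("rcx", "rdx"), "elf": ("rdi", "rsi")}
# General-purpose registers a callee may clobber in either convention.
SAVED = ["rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"]


def build_trampoline(fmt, string_va, size, decrypt_va, dest_reg):
    """Return the trampoline as a list of Intel-syntax instructions."""
    arg0, arg1 = ARG_REGS[fmt]
    lines = ["pushfq"] + [f"push {reg}" for reg in SAVED]
    lines += [
        # Return address + 10 pushes = 88 bytes; 0x28 more realigns the
        # stack to 16 bytes and covers the 32-byte shadow space on Windows.
        "sub rsp, 0x28",
        f"mov {arg0}, {string_va:#x}",   # first argument: the string
        f"mov {arg1}, {size:#x}",        # second argument: its size
        f"mov rax, {decrypt_va:#x}",
        "call rax",
        "add rsp, 0x28",
    ]
    lines += [f"pop {reg}" for reg in reversed(SAVED)] + ["popfq"]
    # Redo the effect of the overwritten instruction, then resume after it.
    lines += [f"mov {dest_reg}, {string_va:#x}", "ret"]
    return lines


def assemble(lines, address):
    """Assemble the trampoline with keystone for the given load address."""
    from keystone import Ks, KS_ARCH_X86, KS_MODE_64  # third-party, lazy import

    encoding, _count = Ks(KS_ARCH_X86, KS_MODE_64).asm("; ".join(lines), address)
    return bytes(encoding)
```

The sketch does not save the vector registers and would decrypt the string again on every pass through the hook, two points a complete implementation has to address.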

Update the binary with these patches

Generate the decryption routine

Binary carving and code injection

The goal here is to locate the decryption routine previously generated, carve it out, and then inject it into the target binary.

To carve it out, we will use symbols to locate the function by its name. For ELF files, the lief API get_static_symbol did the job, whereas it did not work for PE files. No worries though, using radare2 it is almost as easy. Then, lief offers the API get_content_from_virtual_address, which copies the bytes making up the decryption routine.
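For the ELF case, the carving step could be sketched as below. The first helper only illustrates what reading “by virtual address” amounts to; the second one uses the two lief calls named above, which may be exposed under different names depending on the lief release. Both function names are hypothetical.

```python
def slice_by_vaddr(section_va, section_bytes, func_va, func_size):
    """Extract `func_size` bytes at `func_va` from a section loaded at `section_va`."""
    start = func_va - section_va
    if start < 0 or start + func_size > len(section_bytes):
        raise ValueError("function lies outside of the section")
    return bytes(section_bytes[start:start + func_size])


def carve_elf_function(path, name):
    """Return the machine code of the function `name` found in the ELF at `path`."""
    import lief  # third-party dependency, imported lazily

    binary = lief.parse(path)
    symbol = binary.get_static_symbol(name)
    # For ELF symbols, `value` is the virtual address and `size` the length.
    return bytes(binary.get_content_from_virtual_address(symbol.value, symbol.size))


# Usage of the pure helper on made-up section bytes.
code = slice_by_vaddr(0x1000, b"\x90" * 16 + b"\xc3" + b"\x00" * 15, 0x1010, 1)
```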

Then, inject it as follows:

Results in practice

In practice, it is not possible to encrypt 100% of the strings in a binary:

  • String identification, even by the most advanced binary analysis frameworks, is incomplete.
  • Cross-references are incomplete.
  • Strings may be declared within arrays, and in such scenarios the cross-reference points to the beginning of the array.

So, while we could encrypt around 2000 strings within mimikatz, Windows Defender still detected the binary statically. It’s quite a shame to encrypt that many strings and miss the only 5 strings that actually trigger the detection, mais c’est la vie.

Future work

To improve this tool and allow it to actually circumvent antivirus software, more advanced analysis should be performed on the binary, in order to identify more cross-references and handle scenarios where a cross-reference points to a collection of strings rather than the string directly. There are some treasures in the floss codebase, and probably some of the problems they solved while making their tool could be helpful here as well.

Or, one can embrace the current limitations and only encrypt strings which are definitely going to trigger the antivirus, hoping they are not located within an array.
