@ Cryptape
2025-02-13 09:00:41
Security is fundamental to any blockchain: it is what keeps every token safe. For a virtual machine and the smart contract platform built on it, security has two main aspects:
* The code running on the virtual machine must be secure
* The virtual machine itself should also be designed to facilitate safer code execution
The first aspect often gets sufficient attention. When it comes to [CKB](https://www.nervos.org/), we now encourage developers to write scripts in Rust for maximum security, reserving pure C code only for those who fully understand its risks. Additionally, higher-level languages have been introduced in CKB to strike a better balance between productivity and security.
Virtual machine security was a major focus when CKB-VM was originally designed. Many potential risks were addressed at the architectural level, though some were left open despite thorough research. One such issue is **Return-Oriented Programming (ROP)**, a rather ingenious attack. It exploits executable code that has been legitimately loaded into memory, rendering widely used protections (e.g., [W^X](https://en.wikipedia.org/wiki/W%5EX)) futile. It spans multiple architectures and is constantly evolving. Although we spent a great deal of effort studying ROP in the early days, we did not implement specific countermeasures against it. Now that new RISC-V extensions are available, it is the perfect time to introduce design-level protections against ROP.
## Acknowledgments
Before diving deeper, we would like to acknowledge Todd Mortimer from the OpenBSD team. His work on ROP mitigations in the OpenBSD kernel in 2018-2019 significantly inspired our research and this article. We highly recommend his [talk](https://www.youtube.com/watch?v=ZvSSHtRv5Mg), slide decks from [AsiaBSDCon 2019](https://www.openbsd.org/papers/asiabsdcon2019-rop-slides.pdf) and [EuroBSDCon 2018](https://www.openbsd.org/papers/eurobsdcon2018-rop.pdf), and this [paper](https://www.openbsd.org/papers/asiabsdcon2019-rop-paper.pdf) for a deeper understanding of ROP. Several examples of x64 ROP attacks in this post are also drawn from his research.
## Typical Attack Workflow
While there are many sophisticated [ways](https://www.wired.com/2016/06/clever-attack-uses-sound-computers-fan-steal-data/) to attack a system, a typical attack on a program follows this process:
1. Prepare a [shellcode](https://en.wikipedia.org/wiki/Shellcode): a piece of binary code that performs specific actions (e.g., running a shell or other programs on the target computer).
2. Exploit a vulnerability in the target system, most commonly a [buffer overflow](https://en.wikipedia.org/wiki/Buffer_overflow). The attack could be launched against a remote system through a network protocol (such as HTTP), or against a local program through command-line input.
3. As a result of the attack, the shellcode is inserted into a designated memory region of the target system and gets executed, allowing the attacker to achieve their goal. The consequences vary: gaining unauthorized access to sensitive data, destroying data or machines, planting malicious programs on the target for further actions, or manipulating control flow.
While traditional systems face a wide range of attacks, blockchains run in their own limited and unique runtime environment, rendering many conventional attacks irrelevant. Major blockchain security threats include:
* **Private key security**: Blockchain wallets rely on private keys, which are prime targets for various attacks.
* **Smart contract vulnerability**: Poorly written smart contracts contain logic flaws that lead to security risks.
* **Virtual machine security**: An attacker may send malicious inputs to a smart contract, causing it to terminate unexpectedly with a success status despite lacking proper credentials.
This post focuses on attacks targeting the blockchain’s virtual machine, in our case the CKB Virtual Machine (CKB-VM).
## CKB’s Early Approach
While it is impossible to predict every attack, disrupting the typical attack workflow is an effective defense strategy. From its inception, CKB-VM has implemented [W^X](https://en.wikipedia.org/wiki/W%5EX) protection: at any given time, every memory location in CKB-VM is either writable (its data can be modified) or executable (its data can be executed as code), but never both. Once a memory region is marked as **executable**, it cannot be reverted to **writable** for the lifetime of the current CKB-VM instance. Only writable memory locations can be **frozen** as executable.
This design significantly disrupts the typical attack workflow. For shellcode to execute on CKB-VM, it must reside in **executable** memory. However, an attacker can only supply shellcode as part of program inputs, which are loaded into **writable** memory. As long as a CKB script does not voluntarily mark input data as **executable** (a highly unlikely scenario), the shellcode remains inert. Attempting to overwrite existing **executable** code with shellcode is also futile, since executable memory regions are unwritable and cannot be converted back to writable.
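To make the W^X rules above concrete, here is a minimal C sketch of per-page permission tracking in a toy VM. The flag names, page count, and functions (`mem_init`, `can_write`, `can_execute`, `freeze_executable`) are hypothetical and are not CKB-VM’s actual implementation; the sketch only illustrates the one-way transition from writable to executable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits for a toy VM; not CKB-VM's real API. */
enum { PERM_WRITABLE = 1u << 0, PERM_EXECUTABLE = 1u << 1 };

#define NUM_PAGES 16

typedef struct {
    uint8_t perms[NUM_PAGES];
} Memory;

/* Every page starts out writable and non-executable. */
void mem_init(Memory *m) {
    for (size_t i = 0; i < NUM_PAGES; i++) m->perms[i] = PERM_WRITABLE;
}

/* Stores (e.g. loading program inputs) are allowed only on writable pages. */
bool can_write(const Memory *m, size_t page) {
    assert(page < NUM_PAGES);
    return (m->perms[page] & PERM_WRITABLE) != 0;
}

/* Instruction fetches are allowed only on executable pages. */
bool can_execute(const Memory *m, size_t page) {
    assert(page < NUM_PAGES);
    return (m->perms[page] & PERM_EXECUTABLE) != 0;
}

/* One-way "freeze": a writable page becomes executable and loses write
 * permission in the same step. No function goes back, so W and X never
 * coexist on a page. Returns false if the page was already frozen. */
bool freeze_executable(Memory *m, size_t page) {
    assert(page < NUM_PAGES);
    if (!(m->perms[page] & PERM_WRITABLE)) return false;
    m->perms[page] = PERM_EXECUTABLE;
    return true;
}
```

Because `freeze_executable` is the only way to change permissions and it always clears the writable bit, a page holding attacker-supplied input stays non-executable unless the script itself chooses to freeze it.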
W^X is a well-established security technique widely used in modern hardware, operating systems, and virtual machines. Although it cannot prevent all possible attacks, W^X effectively blocks many of them by breaking the standard attack workflow: even if an attacker successfully injects shellcode into a target machine, the attack remains incomplete because the shellcode cannot be executed.
## Understanding ROP
While W^X is effective, it does not solve all our problems. This leads to the topic of this post: **Return-oriented Programming (ROP)**. Instead of explicitly injecting new code, ROP exploits executable code that already resides in the target machine’s memory. Essentially, ROP builds a shellcode by chaining existing code snippets together that were never intended to function as such. It may sound like a fantasy, but as we shall see from the following examples, ROP is a practical and effective attack technique.
To understand ROP, we must first examine modern CPU architecture and memory layout. While assembly instructions vary in representations and lengths, they are put together in memory one after another as a stream of bytes:
![image](https://yakihonne.s3.ap-east-1.amazonaws.com/3eba5ef41b206b1d49e8b1be7241d7ead5dd87b737261da64c3bc0e2751f23ae/files/1739326040178-YAKIHONNES3.png)
*Image [Source](https://www.quora.com/Is-assembly-language-a-source-code-or-object-code)*
As seen in the above example, different assembly instructions come in different lengths. In the x86-64 ISA, an instruction can range from 1 to 15 bytes (RISC ISAs such as ARM or RISC-V have more uniform instruction lengths, which we will discuss later). In memory, however, instructions are stored sequentially without gaps.
This means that from a stream of bytes alone, we cannot tell which instructions it consists of. In the above example, meaningful assembly instructions emerge only when we start decoding from the `B8` byte. On a different occasion, if we knew from elsewhere that the leading `B8 22 11` bytes were some magic flags, decoding would start from the `00` byte and yield a totally different sequence of instructions.
![image](https://yakihonne.s3.ap-east-1.amazonaws.com/3eba5ef41b206b1d49e8b1be7241d7ead5dd87b737261da64c3bc0e2751f23ae/files/1739326082274-YAKIHONNES3.png)
*Image [Source](https://www.quora.com/Is-assembly-language-a-source-code-or-object-code)*
It is the combination of a special CPU register, the program counter (`PC`), and the current memory contents that jointly determines which instructions the CPU executes. Depending on the ISA and hardware, the boot process initializes the CPU’s `PC` register to a pre-defined value, loads instructions from that address, and initializes everything related to the operating system. When a user launches a new program, the program’s metadata contains an `entrypoint` address, and the OS sets the CPU’s `PC` register to it in order to start executing the program. Suffice it to say, maintaining a proper `PC` value is critical to a computer’s correct operation. An invalid `PC` value might crash the program at best, or at worst leak sensitive information or grant attackers unauthorized access.
### Forming an ROP Attack Via ROP Gadgets
Let’s look at the following byte stream on an x86-64 CPU:
```
8a 5d c3 movb -61(%rbp), %bl
```
These three bytes represent a `mov` instruction: it takes the value of the `rbp` register, adds an offset of `-61` (the byte `c3` read as a signed displacement), uses the result as a memory address to load 1 byte of data, and finally stores the loaded byte in the `bl` register. However, if we ignore `8a` and look only at `5d c3`, the same bytes represent different instructions:
```
5d popq %rbp
c3 retq
```
This byte sequence contains two instructions:
* Pop an 8-byte value from the stack and use it to set the `rbp` register
* Pop an 8-byte value from the stack and use it to set the `PC` register, so execution continues from the new location
As discussed above, shellcode simply performs whatever task the attacker requires. In fact, the most common type of shellcode just spawns a new shell, from which the attacker can execute further operations. Such shellcode can be represented by the following C pseudocode, which runs a new command via the `execve` syscall:
```
execve("/bin/sh", NULL, NULL);
```
To execute this on an x86-64 CPU, the following actions are needed for a syscall:
* `rax` register: must contain the syscall number; for `execve`, it is `59`
* `rdi`, `rsi`, `rdx` registers: hold the first 3 arguments to the syscall. In this case, `rdi` holds a pointer to the C string `/bin/sh`; `rsi` and `rdx` must be zero.
* The `syscall` instruction must then be executed (the legacy `int 0x80` interface belongs to 32-bit x86 and uses a different syscall numbering)
A typical shellcode would be a compact assembly sequence that performs all of the above directly. In contrast, an ROP attack looks for the following sequences:
```
# Those can set the value of different registers from values on the stack
pop %rax; ret
pop %rdi; ret
pop %rsi; ret
pop %rdx; ret
# Finally, trigger the syscall
syscall
```
Each of these small code sequences is conventionally called an **ROP gadget**. An attacker searches for these gadgets in the target program or in system libraries (such as libc). Once the required gadgets are found, the attacker pieces together a sequence of data much like the following:
![image](https://yakihonne.s3.ap-east-1.amazonaws.com/3eba5ef41b206b1d49e8b1be7241d7ead5dd87b737261da64c3bc0e2751f23ae/files/1739326764236-YAKIHONNES3.png)
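Gadget mining can be illustrated with a deliberately tiny scanner. This hypothetical C sketch looks only for the two-byte `pop %reg; ret` form (opcodes `0x58` to `0x5f` followed by `0xc3`) at every byte offset, not just at the instruction boundaries a disassembler would pick. Real tools such as ropper decode far longer and more varied sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Without a REX prefix, x86-64 encodes `pop r64` as 0x58 + register number. */
static const char *POP_REGS[8] = {"rax", "rcx", "rdx", "rbx",
                                  "rsp", "rbp", "rsi", "rdi"};

/* Scan a raw byte stream for "pop %reg; ret" gadgets at every offset.
 * Stores gadget start offsets in `out` and returns how many were found
 * (at most `max_out`). */
size_t find_pop_ret_gadgets(const uint8_t *code, size_t len,
                            size_t *out, size_t max_out) {
    size_t found = 0;
    assert(code != NULL || len == 0);
    for (size_t i = 0; i + 1 < len && found < max_out; i++) {
        if (code[i] >= 0x58 && code[i] <= 0x5f && code[i + 1] == 0xc3) {
            out[found++] = i;
        }
    }
    return found;
}

/* Print the gadgets in the same AT&T notation used in the text. */
void print_gadgets(const uint8_t *code, const size_t *offs, size_t n) {
    for (size_t k = 0; k < n; k++) {
        printf("offset %zu: pop %%%s; ret\n", offs[k],
               POP_REGS[code[offs[k]] - 0x58]);
    }
}
```

Run on the `8a 5d c3` bytes from the `movb` example, it reports `pop %rbp; ret` at offset 1. Likewise, the bytes `41 5f c3` (`pop %r15; ret`) contain `5f c3`, which the scan reports as `pop %rdi; ret` at offset 1.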
With the prepared data sequence, the attacker can exploit a vulnerability in the target computer or program, such as a typical buffer overflow. During this process, the attacker performs three key actions:
* Pushes the crafted data sequence onto the stack (or overwrites existing stack data with it)
* Sets the stack pointer (top of the stack) to `X + 64`
* Sets the `PC` register to the address of the code sequence `pop %rax; ret` in the existing program or libc memory space
Now the attack proceeds step by step as follows:
1. The CPU runs `pop %rax; ret`. With the stack pointer pointing to `X + 64`, the CPU pops `59` from the stack and sets the `rax` register to `59`. It then pops the address of the code sequence `pop %rdi; ret` from the stack and sets `PC` to this value.
2. The CPU runs `pop %rdi; ret`. With the stack pointer pointing to `X + 48`, the CPU pops the value `X`, which points to the C string `/bin/sh`, and sets the `rdi` register to `X`. It then pops the address of the code sequence `pop %rsi; ret` from the stack and sets `PC` to this value.
3. The CPU runs `pop %rsi; ret`. With the stack pointer pointing to `X + 32`, the CPU pops `0` from the stack and sets the `rsi` register to `0`. It then pops the address of the code sequence `pop %rdx; ret` from the stack and sets `PC` to this value.
4. The CPU runs `pop %rdx; ret`. With the stack pointer pointing to `X + 16`, the CPU pops `0` from the stack and sets the `rdx` register to `0`. It then pops the address of the code sequence `syscall` from the stack and sets `PC` to this value.
5. The CPU runs `syscall`. At this point, `rax` holds `59`, `rdi` points to `/bin/sh`, and both `rsi` and `rdx` are zero, so the CPU invokes `execve("/bin/sh", NULL, NULL);`, granting the attacker a shell for further manipulation.
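The five steps above can be replayed in a small C simulation. Gadgets are modeled as hypothetical addresses dispatched by a `switch`, and the attacker’s data is an array in which each `pop` or `ret` consumes the next slot. The slot order mirrors the chain described above, but the concrete values (`G_POP_RAX` and friends, the address used for `X`) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addresses of existing gadgets (e.g. inside libc). */
enum { G_POP_RAX = 0x1000, G_POP_RDI, G_POP_RSI, G_POP_RDX, G_SYSCALL };

typedef struct {
    uint64_t rax, rdi, rsi, rdx;
    const uint64_t *stack; /* attacker-controlled data placed on the stack */
    size_t sp;             /* index of the next slot to pop                */
    size_t len;
} Cpu;

/* pop: read the top slot, then move the stack pointer past it. */
static uint64_t pop(Cpu *c) {
    assert(c->sp < c->len);
    return c->stack[c->sp++];
}

/* Execute gadgets starting at `pc`. Every gadget ends with `ret`, which pops
 * the next PC. Returns 1 if `syscall` is reached with the registers set up
 * for execve(binsh_addr, NULL, NULL), and 0 otherwise. */
int run_chain(Cpu *c, uint64_t pc, uint64_t binsh_addr) {
    for (;;) {
        switch (pc) {
        case G_POP_RAX: c->rax = pop(c); break;
        case G_POP_RDI: c->rdi = pop(c); break;
        case G_POP_RSI: c->rsi = pop(c); break;
        case G_POP_RDX: c->rdx = pop(c); break;
        case G_SYSCALL:
            return c->rax == 59 && c->rdi == binsh_addr &&
                   c->rsi == 0 && c->rdx == 0;
        default:
            return 0; /* not a gadget: the chain breaks */
        }
        pc = pop(c); /* the trailing `ret` */
    }
}
```

Starting with `pc = G_POP_RAX` and the chain `{59, G_POP_RDI, X, G_POP_RSI, 0, G_POP_RDX, 0, G_SYSCALL}`, `run_chain` returns 1 after consuming all eight slots. Nothing the attacker supplied is ever executed as code; only the order in which existing gadgets run is controlled.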
This sequence of ROP gadgets, referred to as an **ROP chain**, demonstrates how a complete ROP attack works. Two key takeaways are:
* **ROP does not inject new code**. Instead, it injects data into the stack and leverages existing code that is already loaded in memory and marked as executable. W^X protections therefore cannot prevent ROP attacks.
* **Attackers can mine ROP gadgets from the [libc](https://en.wikipedia.org/wiki/C_standard_library) library**. The libc library is linked into almost every program and mapped into its memory as executable code, and it is large enough to contain nearly every gadget an attacker needs, including the `syscall` instructions inside its system call wrappers. Note that libc runs with the same privileges as the program itself: on x86-64, both run in user mode at [protection ring](https://en.wikipedia.org/wiki/Protection_ring) level 3, while the kernel runs at ring 0. ROP therefore does not grant higher privileges by itself; it hijacks whatever privileges the compromised program already has and uses gadgets such as `syscall` to request kernel services on the attacker’s behalf.
Note that the above examples show only the most basic ROP gadgets. In reality, ROP gadgets come in all kinds of forms. Since they come from compiler output, they can be combined in the least expected ways, and their forms change as new compiler optimizations come out. Numerous tools (e.g., [ropper](https://scoding.de/ropper/), [ropr](https://github.com/Ben-Lichtman/ropr)) and research papers (e.g., *[Experiments on ROP Attack with Various Instruction Set Architectures](https://dl.yumulab.org/papers/42/paper.pdf)*, *[ROPGMN](https://www.sciencedirect.com/science/article/abs/pii/S0167739X24005314)*, *[Detecting and Preventing ROP Attacks using Machine Learning on ARM](https://www.infosun.fim.uni-passau.de/ch/publications/compsac23.pdf)*, *[KROP](https://arxiv.org/pdf/2406.11880)*) keep coming out, making it almost impossible to enumerate all possible ROP gadget combinations.
### ROP on ARM & RISC-V
ROP attacks are not limited to CISC architectures, where instructions vary in length. They also affect RISC designs, such as [ARM](https://developer.arm.com/documentation/102433/0200/Return-oriented-programming) and [RISC-V](https://pure.royalholloway.ac.uk/ws/portalfiles/portal/37157938/ROP_RISCV.pdf). Take the following sequence for example:
```
13 4f 83 23 0b 00
```
Decoding from the start, the first four bytes represent `xori t5,t1,568` in the RISC-V ISA. But if we skip the first two bytes, the next four represent `lw t2,0(s6)`. This is possible because the RISC-V compressed ("C") extension allows instructions to start at any 2-byte boundary, so both offsets are valid places to begin decoding. Even in a RISC design such as RISC-V, then, interpreting a byte stream depends on the `PC` register. As a result, ROP gadgets can be found in RISC-V programs as well.
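The two decodings can be reproduced with a few lines of bit manipulation. This C sketch decodes only the 32-bit I-type format and recognizes only `xori` and `lw`; the `Insn` type and function names are hypothetical helpers, not part of any RISC-V toolchain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ABI names for registers x0..x31. */
static const char *REG[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6"};

typedef enum { OP_UNKNOWN, OP_XORI, OP_LW } Op;
typedef struct { Op op; unsigned rd, rs1; int32_t imm; } Insn;

/* Decode one 32-bit I-type instruction starting at byte offset `off`.
 * RISC-V is little-endian: the first byte holds the lowest bits. */
Insn decode_at(const uint8_t *bytes, size_t len, size_t off) {
    assert(bytes != NULL && off + 4 <= len);
    const uint8_t *b = bytes + off;
    uint32_t insn = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                    (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    uint32_t opcode = insn & 0x7f, funct3 = (insn >> 12) & 0x7;
    Insn out = {OP_UNKNOWN, (insn >> 7) & 0x1f, (insn >> 15) & 0x1f, 0};

    out.imm = (int32_t)(insn >> 20);        /* 12-bit immediate ...  */
    if (out.imm & 0x800) out.imm -= 0x1000; /* ... sign-extended     */

    if (opcode == 0x13 && funct3 == 4) out.op = OP_XORI; /* OP-IMM group */
    if (opcode == 0x03 && funct3 == 2) out.op = OP_LW;   /* LOAD group   */
    return out;
}

void print_insn(const Insn *in) {
    if (in->op == OP_XORI)
        printf("xori %s,%s,%d\n", REG[in->rd], REG[in->rs1], (int)in->imm);
    else if (in->op == OP_LW)
        printf("lw %s,%d(%s)\n", REG[in->rd], (int)in->imm, REG[in->rs1]);
    else
        printf("(not decoded by this sketch)\n");
}
```

Decoding `13 4f 83 23 0b 00` at offset 0 and printing the result gives `xori t5,t1,568`; decoding at offset 2 gives `lw t2,0(s6)`, matching the two readings in the text.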
### ROP on CKB-VM
CKB’s RISC-V machine operates in a more restricted environment: programs running on CKB have no `execve` syscall with which to spawn a shell, and all runtime states are publicly visible on a public blockchain like CKB. However, ROP attacks can still occur on CKB: one could construct an ROP chain that sets `a0` to `0` and `a7` to `93` (the `exit` syscall), then executes `ecall`. This causes CKB-VM to exit immediately with a success code (`0`), potentially allowing a script to pass validation when it should have failed, such as a lock script succeeding without a valid signature.
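The same idea carries over to RISC-V with one difference: `ret` is `jalr x0, 0(ra)`, so a usable gadget must reload `ra` from the stack, as function epilogues do. The C sketch below assumes two hypothetical epilogue-style gadgets and models the `exit` syscall (`a7 == 93`) as reporting `a0` as the script’s exit code. Gadget addresses, the slot-indexed stack, and all names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical gadget addresses inside an already-loaded script. */
enum { G_LD_A0_RA = 0x2000, G_LD_A7_RA, G_ECALL };

typedef struct {
    uint64_t a0, a7, ra;
    const uint64_t *mem; /* attacker-controlled stack contents   */
    size_t sp, len;      /* sp counts 8-byte slots, not bytes     */
} Hart;

static uint64_t load_slot(const Hart *h, size_t slot) {
    assert(h->sp + slot < h->len);
    return h->mem[h->sp + slot];
}

/* Runs the chain from `pc`. Returns 1 and stores the exit code if the
 * chain reaches `ecall` with a7 == 93 (exit); returns 0 otherwise. */
int run(Hart *h, uint64_t pc, int64_t *exit_code) {
    for (;;) {
        switch (pc) {
        case G_LD_A0_RA: /* ld a0,0(sp); ld ra,8(sp); addi sp,sp,16; ret */
            h->a0 = load_slot(h, 0);
            h->ra = load_slot(h, 1);
            h->sp += 2;
            pc = h->ra; /* ret == jalr x0, 0(ra) */
            break;
        case G_LD_A7_RA: /* ld a7,0(sp); ld ra,8(sp); addi sp,sp,16; ret */
            h->a7 = load_slot(h, 0);
            h->ra = load_slot(h, 1);
            h->sp += 2;
            pc = h->ra;
            break;
        case G_ECALL:
            if (h->a7 != 93) return 0; /* only `exit` is modelled */
            *exit_code = (int64_t)h->a0;
            return 1;
        default:
            return 0; /* broken chain */
        }
    }
}
```

With `a0` initially `1` (say, a failed signature check) and the stack `{0, G_LD_A7_RA, 93, G_ECALL}`, starting at `G_LD_A0_RA` ends in `exit(0)`, which the VM would report as success.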
### Short Recap
Let’s briefly recap what we’ve learned so far:
* ROP attacks utilize existing executable code for malicious purposes. W^X cannot prevent ROP.
* ROP is possible across multiple architectures, including x86-64, ARM, and RISC-V, and therefore on CKB-VM as well.
* The landscape of ROP is constantly evolving. With new tools, techniques, and research emerging regularly, it’s impossible to foresee all ROP gadgets.
ROP has been extensively studied over the years, leading to various mitigation strategies, which can be broadly categorized into two main approaches:
* **Software Solutions**: Techniques such as rewriting code sequences and implementing Retguard, which remove ROP gadgets or make them unusable
* **Hardware Solutions**: Introducing additional CPU instructions with Control Flow Integrity (CFI) checks to safeguard control flow.
I’ll explore these strategies in greater detail in the following sections.
## Software Solutions to Mitigate ROP
### Rewriting Sequence
Certain instruction sequences are often targeted to form ROP gadgets. To prevent ROP, one approach is to alter the compiler, so that such sequences can never be generated. Take the following example:
```
89 c3 mov %eax,%ebx
90 nop
```
In x86-64, `c3` represents the `ret` instruction, which makes this `mov` a potential source of ROP gadgets. We can rewrite it into the following equivalent sequence:
```
48 87 d8 xchgq %rbx, %rax
89 d8 movl %ebx, %eax
48 87 d8 xchgq %rbx, %rax
```
The new sequence contains no `c3` byte, at the expense of more bytes and more executed instructions. Whether this causes noticeable overhead can only be settled by real benchmarks.
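The claim that the rewritten sequence is equivalent can be checked by modeling only its register effects. This C sketch (hypothetical helper names) applies both sequences to `rax`/`rbx`, keeping in mind that a 32-bit `mov` zero-extends into the full 64-bit register, and also scans both encodings for a `0xc3` byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t rax, rbx; } Regs;

/* 89 c3  mov %eax,%ebx: a 32-bit write zero-extends into the full rbx. */
Regs original_seq(Regs r) {
    r.rbx = (uint32_t)r.rax;
    return r;
}

/* 48 87 d8 / 89 d8 / 48 87 d8: swap, copy the low half, swap back. */
Regs rewritten_seq(Regs r) {
    uint64_t t = r.rax; r.rax = r.rbx; r.rbx = t; /* xchgq %rbx, %rax */
    r.rax = (uint32_t)r.rbx;                      /* movl  %ebx, %eax */
    t = r.rax; r.rax = r.rbx; r.rbx = t;          /* xchgq %rbx, %rax */
    return r;
}

/* Returns 1 if the encoding contains 0xc3, the opcode byte for `ret`. */
int contains_ret_byte(const uint8_t *code, size_t len) {
    assert(code != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0xc3) return 1;
    return 0;
}
```

For any starting values, both functions leave `rax` unchanged and set `rbx` to the zero-extended low 32 bits of `rax`, while only the original two-byte encoding contains `c3`.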
Further analysis revealed that the `rbx` register is a frequent source of ROP gadgets in x86-64, due to the way Intel encodes x86-64 instructions. Hence, the OpenBSD team decided to avoid the `rbx` register wherever possible, reducing the number of potential ROP gadgets.
Again, this approach comes at the cost of larger code, more instructions to execute, and a patched compiler. While OpenBSD has integrated these changes into its distribution, other environments must weigh the benefits against the costs.
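One concrete way `rbx` produces `c3` bytes: for register-to-register operands, x86 encodes a ModRM byte made of `mod = 0b11` followed by the 3-bit `reg` and `rm` register numbers. With `reg = rax` (0) and `rm = rbx` (3), that byte is exactly `0xc3`. This minimal sketch computes it; the function name is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* 3-bit register numbers used in ModRM (no REX prefix). */
enum { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI };

/* ModRM byte for a register-to-register operand pair:
 * bits 7-6 = mod (0b11 means "register direct"),
 * bits 5-3 = reg, bits 2-0 = rm. */
uint8_t modrm_reg_direct(unsigned reg, unsigned rm) {
    assert(reg < 8 && rm < 8);
    return (uint8_t)(0xc0u | reg << 3 | rm);
}
```

This is why `mov %eax,%ebx` (`89 c3`), `add %eax,%ebx` (`01 c3`), and `xor %eax,%ebx` (`31 c3`) all carry a `ret` byte, while the rewritten sequence uses `d8` (`reg = rbx`, `rm = rax`). With `reg = rcx`, the byte becomes `0xcb`, a far return, which can also terminate a gadget.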
For a deeper dive, I would strongly recommend Todd Mortimer’s [work](https://www.openbsd.org/papers/asiabsdcon2019-rop-slides.pdf).
### Retguard’s Solution: Prologue and Epilogue
Todd Mortimer also presented Retguard, used to secure OpenBSD, in the same [work](https://www.openbsd.org/papers/asiabsdcon2019-rop-slides.pdf). ROP attacks typically occur when execution enters a function `foo`, but the stack has been manipulated so that the CPU exits to some other code fragment instead of returning to `foo`’s caller. What if, at each function exit, we verify that the return address is still the one the function was entered with?
Retguard introduces two components to perform this task:
* **Prologue**: A prologue is inserted at each function’s entry. It takes two inputs:
  * A `cookie` value: random data assigned to this particular function.
  * The return address: where to jump when the current function exits.

  The prologue computes the XOR of these two values and stores the result in the current function’s [frame](https://en.wikipedia.org/wiki/Call_stack#Stack_and_frame_pointers), the region of the call stack that holds the function’s own data.
* **Epilogue**: An epilogue is inserted wherever a function might exit. It also takes two inputs:
  * The XOR value saved by the prologue in the frame.
  * The return address it is about to use (popped from the stack on x64 machines, or read from the `ra` register in RISC designs).

  The epilogue computes the XOR of these two values. If the result matches the original `cookie`, execution proceeds. Otherwise, the epilogue halts the program, signaling an error.
This prologue-epilogue mechanism in Retguard guards the call stack against tampering. At a noticeable but acceptable cost (in both performance and code size), Retguard eliminates a significant number of ROP gadgets from the OpenBSD kernel. Like other software-based mitigations, it requires a patched compiler, and each environment must decide whether the technique is worth adopting.
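The prologue/epilogue arithmetic can be sketched in C. Real Retguard emits this logic as machine code around each function with a per-function random cookie; here the frame slot, cookie value, and function names are hypothetical, and a mismatch calls `abort()` to stand in for halting the program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The slot in the function's stack frame holding the prologue's result. */
typedef struct {
    uint64_t guard;
} Frame;

/* Prologue: frame.guard = cookie XOR return address. */
void retguard_prologue(Frame *f, uint64_t cookie, uint64_t ret_addr) {
    assert(f != NULL);
    f->guard = cookie ^ ret_addr;
}

/* Check: XOR-ing with the return address about to be used gives back the
 * cookie only if that address is unchanged since the prologue ran. */
int retguard_ok(const Frame *f, uint64_t cookie, uint64_t ret_addr) {
    return (f->guard ^ ret_addr) == cookie;
}

/* Epilogue as placed before `ret`: halt instead of returning on mismatch. */
uint64_t retguard_epilogue(const Frame *f, uint64_t cookie, uint64_t ret_addr) {
    if (!retguard_ok(f, cookie, ret_addr)) abort();
    return ret_addr; /* safe to return to */
}
```

An attacker who overwrites the return address without knowing the cookie makes `retguard_ok` fail. Jumping into the middle of a function to use its tail as a gadget skips the prologue, so the epilogue XORs whatever happens to be in the frame slot, which is unlikely to yield the cookie.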
## Hardware Advancements to Mitigate ROP
In addition to software solutions, hardware-based defenses have also been developed. For instance, Intel provides [Indirect Branch Tracking](https://edc.intel.com/content/www/us/en/design/ipla/software-development-platforms/client/platforms/alder-lake-desktop/12th-generation-intel-core-processors-datasheet-volume-1-of-2/007/indirect-branch-tracking/?language=en) on processors such as its 12th generation Core processors. It introduces new instructions, `endbr32` and `endbr64`, which are placed at every location the program might jump or call into indirectly. When the CPU executes an indirect jump or call, it asserts that the target location holds a proper `endbr32` / `endbr64` instruction before continuing; otherwise, the CPU raises a fault and the program is terminated. This ensures that indirect control flow follows the intended paths, preventing attackers from redirecting execution to arbitrary locations through jumps and calls. (Returns are protected by a companion feature of Intel’s Control-flow Enforcement Technology, the shadow stack.)
Modern OSes already make extensive use of the `endbr32` / `endbr64` instructions. Ubuntu 24.04, for instance, includes them in its packages:
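In 64-bit code, the IBT rule boils down to a check on the bytes at the branch target. The hypothetical C helper below models it: an indirect `call` or `jmp` is allowed only if the target begins with `endbr64` (`f3 0f 1e fa`). Real hardware tracks this as CPU state and raises a fault rather than returning a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encoding of endbr64. */
static const uint8_t ENDBR64[4] = {0xf3, 0x0f, 0x1e, 0xfa};

/* Model of the IBT rule for 64-bit code: an indirect call or jmp may only
 * land on an address whose first instruction is endbr64. Returns 1 if a
 * branch to `target` (a byte offset into `code`) would be allowed. */
int indirect_branch_allowed(const uint8_t *code, size_t len, size_t target) {
    assert(code != NULL);
    return target <= len && len - target >= sizeof ENDBR64 &&
           memcmp(code + target, ENDBR64, sizeof ENDBR64) == 0;
}
```

A target in the middle of a function, such as a `pop`/`ret` pair found by a gadget scanner, fails this check and cannot be reached through an indirect branch.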
```
$ objdump -d /bin/bash | head -n 50
/bin/bash: file format elf64-x86-64
Disassembly of section .init:
0000000000030000 <.init>:
30000: f3 0f 1e fa endbr64
30004: 48 83 ec 08 sub $0x8,%rsp
30008: 48 8b 05 d9 7e 12 00 mov 0x127ed9(%rip),%rax # 157ee8 <__gmon_start__@Base>
3000f: 48 85 c0 test %rax,%rax
30012: 74 02 je 30016 <unlink@plt-0xe1a>
30014: ff d0 call *%rax
30016: 48 83 c4 08 add $0x8,%rsp
3001a: c3 ret
Disassembly of section .plt:
0000000000030020 <.plt>:
30020: ff 35 a2 76 12 00 push 0x1276a2(%rip) # 1576c8 <o_options@@Base+0x1cc8>
30026: ff 25 a4 76 12 00 jmp *0x1276a4(%rip) # 1576d0 <o_options@@Base+0x1cd0>
3002c: 0f 1f 40 00 nopl 0x0(%rax)
30030: f3 0f 1e fa endbr64
30034: 68 00 00 00 00 push $0x0
30039: e9 e2 ff ff ff jmp 30020 <unlink@plt-0xe10>
3003e: 66 90 xchg %ax,%ax
30040: f3 0f 1e fa endbr64
30044: 68 01 00 00 00 push $0x1
30049: e9 d2 ff ff ff jmp 30020 <unlink@plt-0xe10>
3004e: 66 90 xchg %ax,%ax
30050: f3 0f 1e fa endbr64
30054: 68 02 00 00 00 push $0x2
30059: e9 c2 ff ff ff jmp 30020 <unlink@plt-0xe10>
3005e: 66 90 xchg %ax,%ax
30060: f3 0f 1e fa endbr64
30064: 68 03 00 00 00 push $0x3
30069: e9 b2 ff ff ff jmp 30020 <unlink@plt-0xe10>
3006e: 66 90 xchg %ax,%ax
30070: f3 0f 1e fa endbr64
30074: 68 04 00 00 00 push $0x4
30079: e9 a2 ff ff ff jmp 30020 <unlink@plt-0xe10>
3007e: 66 90 xchg %ax,%ax
30080: f3 0f 1e fa endbr64
30084: 68 05 00 00 00 push $0x5
30089: e9 92 ff ff ff jmp 30020 <unlink@plt-0xe10>
3008e: 66 90 xchg %ax,%ax
30090: f3 0f 1e fa endbr64
30094: 68 06 00 00 00 push $0x6
30099: e9 82 ff ff ff jmp 30020 <unlink@plt-0xe10>
3009e: 66 90 xchg %ax,%ax
```
The `endbr32` / `endbr64` instructions have been carefully designed to act as `nop` instructions (doing nothing at all) on CPUs that predate them. Including them has no effect on older CPUs but enhances security on supported hardware.
## RISC-V’s Latest Achievements: CFI Extension
The above mitigations against ROP fall into two categories:
* **Compiler Modifications**: Generating more secure binary code.
* **Additional CPU instructions**: Providing Control Flow Integrity (CFI) checks to prevent exploitation.
Back when we began designing CKB-VM, we thoroughly studied ROP and recognized that a vulnerability in a CKB script could potentially open the door to ROP attacks. However, we ultimately did not introduce any specific mitigation against ROP in CKB-VM. Our decision was to stay aligned with the RISC-V ecosystem and avoid shipping a custom RISC-V spec with additional instructions that would require a patched compiler. Nor did we want to maintain our own compiler toolchain, which would undermine the goal that any RISC-V-compliant compiler should be able to produce CKB scripts. As a result, we shipped the first version of CKB-VM without ROP mitigations, but that does not mean we ignored the issue:
* We [reached out](https://groups.google.com/a/groups.riscv.org/g/isa-dev/c/LyZzjJG7_18/m/pCSnKZPdBAAJ) to the RISC-V community about a possible extension similar to Intel’s solution, and kept monitoring advancements in this field.
* We kept our focus on writing secure CKB scripts. Since ROP relies on an existing vulnerability, secure CKB scripts keep ROP purely theoretical.
We were thrilled when the **RISC-V CFI (Control-Flow Integrity) Extension** was officially [ratified](https://github.com/riscv/riscv-cfi/releases/tag/v1.0) in July 2024. Designed by the brilliant minds at RISC-V International, this extension directly addresses ROP attacks with two key features:
* The `Zicfilp` extension introduces **landing pads**: resembling Intel’s `endbr32` / `endbr64`, they ensure that indirect jumps can only land on valid, permitted targets.
* The `Zicfiss` extension introduces a **shadow stack** along with a set of new instructions. It offers a hardware solution similar in spirit to Retguard: the CPU keeps a protected copy of return addresses and checks every return against it, preventing the call stack from being tampered with during execution.
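A shadow stack can be modeled as a second stack that only calls and returns touch. The C sketch below is loosely modeled on Zicfiss’s `sspush` (push the return address) and `sspopchk` (pop and compare) instructions, with hypothetical function names. On real hardware, shadow stack pages cannot be written by ordinary stores, and a mismatch raises an exception instead of returning 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SS_DEPTH 64

/* Toy shadow stack. The write protection real hardware gives these pages
 * is simply assumed here. */
typedef struct {
    uint64_t slots[SS_DEPTH];
    size_t top;
} ShadowStack;

/* On call (cf. sspush): keep a protected copy of the return address.
 * The ordinary stack copy, which an overflow can corrupt, is not modelled. */
void ss_on_call(ShadowStack *ss, uint64_t ret_addr) {
    assert(ss->top < SS_DEPTH);
    ss->slots[ss->top++] = ret_addr;
}

/* On return (cf. sspopchk): compare the return address about to be used
 * with the protected copy. Returns 1 on match, 0 on mismatch. */
int ss_on_return(ShadowStack *ss, uint64_t ret_addr) {
    assert(ss->top > 0);
    return ss->slots[--ss->top] == ret_addr;
}
```

Unlike Retguard, no secret cookie is involved: the protected copy of the return address itself is what each return is checked against.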
Together, these features offer state-of-the-art mitigations against ROP. More importantly, RISC-V CFI is now an official extension, so RISC-V CPUs, compilers, and tools can support it in a standard way rather than through custom changes. In fact, LLVM 19 has already [added support](https://releases.llvm.org/19.1.0/docs/ReleaseNotes.html#changes-to-the-risc-v-backend) for it, and I believe other compilers and tools will follow soon.
Once it is fully adopted, CKB script developers can simply turn it on like a switch when compiling their code. Without modifying any code, they can enjoy the security provided by the RISC-V CFI extension. Even if a vulnerability exists in a CKB script, these built-in enforcements can prevent it from being exploited.
## Final Words
Security is complex. While we strive for maximum security, certain design principles might get in the way of introducing specific mitigations. ROP is a prime example: while we learned much about it early on, implementing the best mitigations required proper timing. Now the time has come. We are happy to introduce RISC-V CFI in CKB’s next hardfork, bringing stronger security for everyone.
***
✍🏻 Written by Xuejie Xiao
His previous posts include:
* [A Journey Optimizing CKB Smart Contract: Porting Bitcoin as an Example](https://blog.cryptape.com/a-journey-optimizing-ckb-smart-contract)
* [Optimizing C++ Code for CKB-VM: Porting Bitcoin as an Example](https://blog.cryptape.com/optimizing-c-code-for-ckb-vm)
Find more on his personal website, [Less is More](https://xuejie.space/).