

# ThundaTag: Disparate Domain Tagging to Enforce Benign Program Behavior

By Shibo Chen and Alex Kisil

### Motivation

Many attacks rely on the ability to forge pointers.

Existing solutions: rule-based systems, finer-grained encryption, memory space randomization, etc.





## Methodology

Add architectural support that imposes strict checks and propagation rules on instructions; for example, the program should only jump through code pointers, never through data pointers or raw data.





## Methodology (cont.)

- Disparate domains
  - Six domains: code, data, code pointer, data pointer, return, and null (stored as 4-bit tags for easy alignment, although 3 bits would suffice)
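A minimal Python sketch of the domain encoding. The six domains and the 4-bit width come from the slide; the specific numeric codes and the low-nibble-first packing are hypothetical choices made only for illustration. The sketch also shows that six domains need just 3 bits, so the fourth bit is alignment padding.

```python
from enum import IntEnum


class Tag(IntEnum):
    """The six domains; the numeric codes are hypothetical, not from the design."""
    NULL = 0x0
    DATA = 0x1
    DATA_PTR = 0x2
    CODE = 0x3
    CODE_PTR = 0x4
    RETURN = 0x5


TAG_BITS = 4  # stored width: padded up from the minimum for easy alignment


def min_bits(n_values: int) -> int:
    """Smallest bit width that can distinguish n_values distinct codes."""
    return max(1, (n_values - 1).bit_length())


def pack_tags(tags: list[Tag]) -> bytes:
    """Pack 4-bit tags two per byte, low nibble first (hypothetical layout)."""
    out = bytearray((len(tags) + 1) // 2)
    for i, tag in enumerate(tags):
        out[i // 2] |= int(tag) << (TAG_BITS * (i % 2))
    return bytes(out)


print(min_bits(len(Tag)))                         # 3
print(pack_tags([Tag.DATA, Tag.CODE_PTR]).hex())  # 41
```

Packing two tags per byte is what lets a single DRAM block hold many tags, which matters for the memory hierarchy design.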





### Rules for Control Flow and Arithmetic Operations

- Only code can be executed.
- Jumps and branches can only go through code pointers.
- Returns can only use return-typed values.
- Return-typed values can only be returned; any other operation on them raises an exception.
- NULL cannot be accessed under any circumstances.
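The rules above can be read as checks performed on the tag of the relevant operand. This is a behavioral Python sketch, not the hardware design: the function names and the `"RET"` op name are hypothetical, and the NULL rule is interpreted here as forbidding any access to a NULL-tagged location.

```python
from enum import Enum, auto


class Tag(Enum):
    NULL = auto()
    DATA = auto()
    DATA_PTR = auto()
    CODE = auto()
    CODE_PTR = auto()
    RETURN = auto()


class TagViolation(Exception):
    """Models the exception the hardware raises when a rule is broken."""


def check_fetch(word_tag: Tag) -> None:
    # Only code can be executed.
    if word_tag is not Tag.CODE:
        raise TagViolation(f"cannot execute a {word_tag.name} word")


def check_jump(target_tag: Tag) -> None:
    # Jumps and branches may only go through code pointers.
    if target_tag is not Tag.CODE_PTR:
        raise TagViolation(f"cannot jump through a {target_tag.name} value")


def check_return(target_tag: Tag) -> None:
    # A return may only consume a return-typed value.
    if target_tag is not Tag.RETURN:
        raise TagViolation(f"cannot return through a {target_tag.name} value")


def check_operand(op: str, tag: Tag) -> None:
    # Any operation other than a return on a RETURN value is an exception.
    # ("RET" is a hypothetical op name used only in this sketch.)
    if tag is Tag.RETURN and op != "RET":
        raise TagViolation(f"{op} cannot use a RETURN value")


def check_access(word_tag: Tag) -> None:
    # Interpretation: a NULL-tagged location can never be accessed.
    if word_tag is Tag.NULL:
        raise TagViolation("access to a NULL-tagged location")


check_jump(Tag.CODE_PTR)  # passes silently
```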



#### Propagation Rules in ALU and Memory

Rows give the tag of operand B and columns the tag of operand A. `!!!` marks a rule violation that raises an exception, and `-` marks an unlabeled (don't-care) operand.

**Store (overridden)**, DEST: MEM[RS1 + offset]

| B \ A | - |
|-------|---|
| -     | 0 |

**Non-store ops (overridden)**, DEST: RD

| B \ A | - |
|-------|---|
| -     | 0 |

**Load** (B = MEM[RS1 + offset]), DEST: RD

| B \ A | D   | DP  | C   | CP  |
|-------|-----|-----|-----|-----|
| D     | !!! | D   | !!! | !!! |
| DP    | !!! | DP  | !!! | !!! |
| C     | !!! | !!! | !!! | !!! |
| CP    | !!! | CP  | !!! | !!! |

**Store**, DEST: MEM[RS1 + offset]

| B \ A | D   | DP  | C   | CP  |
|-------|-----|-----|-----|-----|
| D     | !!! | D   | !!! | !!! |
| DP    | !!! | DP  | !!! | !!! |
| C     | !!! | !!! | !!! | !!! |
| CP    | !!! | CP  | !!! | !!! |

**Reg-reg arith (ADD only)**, DEST: RD

| B \ A | D   | DP  | C   | CP  |
|-------|-----|-----|-----|-----|
| D     | D   | DP  | !!! | !!! |
| DP    | DP  | !!! | !!! | !!! |
| C     | !!! | !!! | !!! | !!! |
| CP    | !!! | !!! | !!! | !!! |

**Reg-reg arith (SUB only)**, DEST: RD

| B \ A | D   | DP  | C   | CP  |
|-------|-----|-----|-----|-----|
| D     | D   | DP  | !!! | !!! |
| DP    | DP  | D   | !!! | !!! |
| C     | !!! | !!! | !!! | !!! |
| CP    | !!! | !!! | !!! | !!! |

**Immediate arith**, DEST: RD

| B \ A | D   | DP  | C   | CP  |
|-------|-----|-----|-----|-----|
| -     | D   | DP  | !!! | CP  |

**Any arith (overflow)**, DEST: RD

| B \ A | D   | DP  | C   | CP  |
|-------|-----|-----|-----|-----|
| D     | !!! | !!! | !!! | !!! |
| DP    | !!! | !!! | !!! | !!! |
| C     | !!! | !!! | !!! | !!! |
| CP    | !!! | !!! | !!! | !!! |
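The non-overridden propagation rules can be encoded as lookup tables keyed on the operand tags. This is a minimal Python sketch, not the Chisel implementation: it treats every `!!!` cell (equivalently, every pair missing from a dictionary) as an exception, keys immediate arithmetic on A alone because B is a don't-care there, and leaves out the overridden cases. The op names `ADD`, `SUB`, `IMM`, `LOAD`, `STORE` and the function `propagate` are hypothetical.

```python
from enum import Enum


class Tag(Enum):
    D = "data"
    DP = "data pointer"
    C = "code"
    CP = "code pointer"


class TagViolation(Exception):
    """A '!!!' cell in the propagation tables."""


# Keys are (B, A); any pair not listed is a '!!!' cell.
ADD_RULES = {
    (Tag.D, Tag.D): Tag.D,
    (Tag.D, Tag.DP): Tag.DP,
    (Tag.DP, Tag.D): Tag.DP,
}
SUB_RULES = {
    (Tag.D, Tag.D): Tag.D,
    (Tag.D, Tag.DP): Tag.DP,
    (Tag.DP, Tag.D): Tag.DP,
    (Tag.DP, Tag.DP): Tag.D,  # the difference of two pointers is plain data
}
# Load/store: the address operand A must be a data pointer, and the moved
# word keeps its own tag B unless it is code.
MEM_RULES = {(b, Tag.DP): b for b in (Tag.D, Tag.DP, Tag.CP)}
# Immediate arithmetic: B is the immediate (don't care), so key on A only.
IMM_RULES = {Tag.D: Tag.D, Tag.DP: Tag.DP, Tag.CP: Tag.CP}

TABLES = {"ADD": ADD_RULES, "SUB": SUB_RULES, "LOAD": MEM_RULES, "STORE": MEM_RULES}


def propagate(op, *, a, b=None, overflow=False):
    """Return the destination tag of op, or raise TagViolation."""
    if overflow:  # 'Any arith (overflow)': every cell is '!!!'
        raise TagViolation(f"{op}: arithmetic overflow")
    if op == "IMM":
        result, operands = IMM_RULES.get(a), f"A={a.name}"
    else:
        result, operands = TABLES[op].get((b, a)), f"B={b.name}, A={a.name}"
    if result is None:
        raise TagViolation(f"{op}: illegal operands {operands}")
    return result


print(propagate("SUB", a=Tag.DP, b=Tag.DP))  # Tag.D
```

A dictionary per instruction class mirrors the slide's one-table-per-class layout; in hardware the same tables would become small combinational lookups beside the ALU.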



How do we generate tags?

We can use LLVM to statically analyze the code, determine data types, and generate a tag for each 32/64-bit word.

This is out of scope for this project.



### Architectural Design

- Extend the register files with tags
- Tag determination in the ALU
- Tagging in writeback and forwarding logic
- Tagging in the memory hierarchy
- Raise an exception when a rule is violated
- New instructions for debugging and flexibility
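To make "tagging in writeback and forwarding logic" concrete, here is a small behavioral model in Python, not Rocket's Chisel code: each register holds a (value, tag) pair, and the bypass mux selects the value and tag together from the youngest in-flight producer. The class and function names, the two-stage forwarding sources, and the reset tag `"D"` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tagged:
    value: int
    tag: str  # one of the domain names, e.g. "D", "DP", "CP"


class TaggedRegFile:
    """Integer registers, each widened with a tag field."""

    def __init__(self, n: int = 32, reset_tag: str = "D"):
        # reset_tag is a hypothetical choice; the slides don't specify one.
        self.regs = [Tagged(0, reset_tag)] * n

    def read(self, idx: int) -> Tagged:
        return self.regs[idx]

    def write(self, idx: int, v: Tagged) -> None:
        if idx != 0:  # x0 stays hard-wired, as in RISC-V
            self.regs[idx] = v


def read_operand(rs: int, rf: TaggedRegFile,
                 ex_rd: Optional[int] = None, ex_out: Optional[Tagged] = None,
                 mem_rd: Optional[int] = None, mem_out: Optional[Tagged] = None) -> Tagged:
    """Bypass mux: forward from the youngest in-flight writer of rs.

    Value and tag are selected together, so a forwarded operand never
    pairs a fresh value with a stale tag.
    """
    if rs != 0 and rs == ex_rd and ex_out is not None:
        return ex_out
    if rs != 0 and rs == mem_rd and mem_out is not None:
        return mem_out
    return rf.read(rs)
```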







### Memory Hierarchy Tag Propagation

**DRAM:** Tags are stored in a region of DRAM that is *non-addressable by software*. In this partition, tags are stored contiguously, one after the other, so a typical 64 B DRAM block holds 128 tags (512 bits at 4 bits per tag).
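The arithmetic behind this layout can be sketched as an address mapping from a data word to its tag nibble. This Python sketch assumes one tag per 64-bit word and two tags per byte; `TAG_BASE` is a hypothetical base address for the tag partition, and the function names are illustrative only.

```python
TAG_BITS = 4
TAGS_PER_BYTE = 8 // TAG_BITS   # 2
WORD_BYTES = 8                  # assumption: one tag per 64-bit word
TAG_BASE = 0xF000_0000          # hypothetical base of the tag partition


def tag_location(addr: int) -> tuple[int, int]:
    """Map a data address to (tag byte address, bit offset of its nibble)."""
    word = addr // WORD_BYTES
    return TAG_BASE + word // TAGS_PER_BYTE, (word % TAGS_PER_BYTE) * TAG_BITS


def tags_per_block(block_bytes: int = 64) -> int:
    return block_bytes * 8 // TAG_BITS


def data_covered_per_tag_block(block_bytes: int = 64) -> int:
    """Bytes of data whose tags live in one DRAM block of tags."""
    return tags_per_block(block_bytes) * WORD_BYTES


print(tags_per_block())              # 128
print(data_covered_per_tag_block())  # 1024
```

Under these assumptions one 64 B tag block covers 1 KiB of data (512 B with 32-bit words), which is why a single tag fetch can serve many nearby data accesses.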

**Tag Cache:** When reading from DRAM, we also issue a request to fetch the corresponding tags for that block. Since the fetched tag block also contains tags *irrelevant to the original DRAM request*, we cache these extra tags in case subsequent accesses exhibit spatial locality. On writeback to DRAM, both data and tags are written back.
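The benefit of caching the extra tags can be illustrated with a toy model. The slides don't describe the tag cache organization, so everything here is an assumption: a direct-mapped Python model with 64 B lines (32 lines for a 2 kB tag cache), 1 KiB of data per tag block (one 4-bit tag per 64-bit word), and no modeling of writebacks.

```python
class TagCache:
    """Direct-mapped cache of 64 B tag blocks (hypothetical organization)."""

    def __init__(self, size_bytes: int = 2048, block_bytes: int = 64,
                 data_per_block: int = 1024):
        # data_per_block: 128 four-bit tags x 8-byte words (assumed word size)
        self.n_lines = size_bytes // block_bytes
        self.data_per_block = data_per_block
        self.lines = [None] * self.n_lines  # tag-block number held by each line
        self.hits = 0
        self.misses = 0

    def lookup(self, data_addr: int) -> bool:
        """Return True on a hit; on a miss, model fetching the tag block from DRAM."""
        block = data_addr // self.data_per_block
        line = block % self.n_lines
        if self.lines[line] == block:
            self.hits += 1
            return True
        self.misses += 1
        self.lines[line] = block  # the extra tags in this block are now cached
        return False


cache = TagCache()
for addr in range(0, 4096, 8):  # stream through 4 KiB, one 64-bit word at a time
    cache.lookup(addr)
print(cache.hits, cache.misses)  # 508 4
```

With a sequential walk, only the first access to each 1 KiB region misses; the rest hit on tags that were fetched as a side effect.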











### **Technical Details**

- Rocket Core
  - Open-source: Allows us to make add-ons without having to reinvent the base architecture
  - Simple: 5-stage pipeline makes debugging and analysis simpler
  - Parameterized: Chisel allows us to easily change hardware module configuration to meet our needs





### New Instructions

- `SETTAG $1, CP`: `Tag_Reg[1] = CP`
- `CMPTAG $1, CP`: `if (Tag_Reg[1] != CP) raise exception`
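The semantics of the two instructions can be modeled in a few lines. This Python sketch follows the pseudocode on the slide; the class and method names, the string tags, and the reset tag `"D"` are hypothetical.

```python
class TagException(Exception):
    """Raised by CMPTAG on a mismatch (models the hardware exception)."""


class TagRegs:
    def __init__(self, n: int = 32, reset_tag: str = "D"):
        self.tags = [reset_tag] * n  # hypothetical reset value

    def settag(self, r: int, tag: str) -> None:
        # SETTAG $r, tag  =>  Tag_Reg[r] = tag
        self.tags[r] = tag

    def cmptag(self, r: int, tag: str) -> None:
        # CMPTAG $r, tag  =>  if (Tag_Reg[r] != tag) raise exception
        if self.tags[r] != tag:
            raise TagException(f"Tag_Reg[{r}] is {self.tags[r]}, expected {tag}")


regs = TagRegs()
regs.settag(1, "CP")
regs.cmptag(1, "CP")  # passes: the tag matches
```

For debugging, `CMPTAG` acts as an assertion on a register's domain, while `SETTAG` lets a programmer explicitly retag a value.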

Requirements:

1. No collision with existing instructions.
2. Fit in the 5-stage pipeline.
3. Handle hazards and bypassing.

Trick:

Format and decode the new instructions like `ADD $1, $0, $1`, which reads and writes `$1` while leaving its value unchanged. The original Rocket pipeline then handles the hazard and bypass logic for us.
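One way to picture the trick is to give the new instructions the same register fields as `ADD $1, $0, $1` while moving them to a different major opcode. This Python sketch uses the standard RISC-V R-type field layout; placing `SETTAG`/`CMPTAG` in the RISC-V custom-0 opcode space, using funct3 to tell them apart, and carrying the tag in funct7 are hypothetical choices (the slides don't give the actual encoding), as are the tag codes.

```python
OPCODE_OP = 0b0110011       # RISC-V major opcode for ADD, SUB, etc.
OPCODE_CUSTOM0 = 0b0001011  # RISC-V custom-0 space, reserved for extensions

# Hypothetical tag codes carried in funct7 (not from the slides).
TAG_CODES = {"D": 1, "DP": 2, "C": 3, "CP": 4, "RET": 5}


def r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack fields into a 32-bit RISC-V R-type instruction word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def add(rd, rs1, rs2):
    return r_type(OPCODE_OP, rd, 0, rs1, rs2, 0)


def settag(r, tag):
    # Same register fields as ADD $r, $0, $r: reads and writes $r.
    return r_type(OPCODE_CUSTOM0, r, 0, 0, r, TAG_CODES[tag])


def cmptag(r, tag):
    return r_type(OPCODE_CUSTOM0, r, 1, 0, r, TAG_CODES[tag])


def reg_fields(insn):
    """(rd, rs1, rs2) as seen by the hazard and bypass logic."""
    return (insn >> 7) & 0x1F, (insn >> 15) & 0x1F, (insn >> 20) & 0x1F


print(hex(add(1, 0, 1)))     # 0x1000b3
print(hex(settag(1, "CP")))  # 0x810008b
```

Because `reg_fields` is identical for `settag(1, "CP")` and `add(1, 0, 1)`, existing hazard detection and forwarding see the new instruction exactly as they would see the ADD, while the distinct opcode avoids colliding with existing instructions.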



## Analysis: Performance

**gem5** simulation:

- 8 kB icache, 32 kB dcache
- 2 kB tag cache
- 1 GHz
- No L2 cache
- 64-bit in-order core



#### 3% Performance Overhead on Average



## Analysis: Area

- Synthesis configuration:
  - 4 kB icache
  - 16 kB dcache
  - 2 kB tag cache
  - No floating-point unit
  - 32-bit, 5-stage in-order pipeline
- Result: ~0.35% area overhead
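A back-of-envelope check on the two configurations, assuming one 4-bit tag per word and reading kB as 1024 bytes: in both setups the 2 kB tag cache can hold the tags for as much data as the dcache (32 kB on the 64-bit gem5 core, 16 kB on the 32-bit synthesized core). The helper name is hypothetical.

```python
def tag_cache_coverage(tag_cache_bytes: int, word_bits: int, tag_bits: int = 4) -> int:
    """Bytes of data whose tags fit in the tag cache (one tag per word assumed)."""
    n_tags = tag_cache_bytes * 8 // tag_bits
    return n_tags * (word_bits // 8)


KIB = 1024
print(tag_cache_coverage(2 * KIB, word_bits=64) // KIB)  # 32 (gem5 setup, 64-bit core)
print(tag_cache_coverage(2 * KIB, word_bits=32) // KIB)  # 16 (synthesis setup, 32-bit core)
```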



### **Discussion & Limitations**

#### **Discussion:**

- Using a larger tag cache or a multi-level tag cache hierarchy would likely decrease the tag miss rate and thus the performance overhead.
- More fine-grained rules and domain categories would lead to stronger security and fewer false positives.

#### Limitations:

- The design and implementation were not fully verified.
- The area evaluation was not performed on a full implementation, so the overhead of a full implementation can only be higher.



#### Conclusions

- We implemented architectural support for tag propagation and rule checking in the Rocket core pipeline.
- We added new instructions for debugging that also give programmers more flexibility.
- We evaluated the system through simulation and synthesis. The results show that the design has low performance (~3% on average) and area (~0.35%) overhead.



## Q&A

