Saphire: Sandboxing PHP Applications with Tailored System Call Allowlists

paper: https://www.usenix.org/system/files/sec21-bulekov.pdf

Abstract

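Applications written in interpreted languages such as PHP and Python usually run as a whole with the same, overly broad set of privileges. This violates the principle of least privilege (PoLP), which says a program should hold only the privileges it needs. As a result, a single remote code execution (RCE) bug can hand an attacker far more privilege than the vulnerable script ever required.
This paper aims to enforce PoLP and control privileges correctly by automatically generating a syscall policy tailored to each individual program.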

Background

RCE
seccomp
- Restricts a process to calling only a specified set of system calls
ptrace
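- Lets a tracer process observe or control the contents (such as memory or registers) and the execution flow of another, traced process
- Mainly used to implement debuggers or to trace syscalls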

Overview

Interpreters

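Interpreted program: a program that needs an interpreter in order to run
Interpreter: can be executed directly by the OS and hardware

Interpreted programs are therefore portable. The interpreter usually exposes an API made up of a number of functions, and only through this API can a program reach syscalls. Even with JIT compilation, syscalls are still reached through the API.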

An API for all interpreted programs

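Because this API has to be general-purpose, its functions together use many syscalls, so applying a single syscall filter to the interpreter as a whole achieves little. A given program, however, may only need a small subset of those syscalls, and that is where filtering becomes useful.

For example, if Prog1 never needs syscall1, syscall1 can be filtered out for Prog1.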
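The idea can be sketched in a few lines of Python. The API functions, syscall names, and the two programs below are made up for illustration and are not taken from the paper.

```python
from __future__ import annotations

# Hypothetical mapping from interpreter API functions to the syscalls they need.
API_TO_SYSCALLS = {
    "file_get_contents": {"open", "read", "close"},
    "mail": {"fork", "execve", "write"},
    "time": {"clock_gettime"},
}

# Hypothetical programs and the API functions each one calls.
PROGRAMS = {
    "Prog1": {"file_get_contents", "time"},
    "Prog2": {"mail"},
}

# Everything the interpreter could need across the whole API.
ALL_SYSCALLS = set().union(*API_TO_SYSCALLS.values())


def allowlist_for(program: str) -> set[str]:
    """Union of the syscalls behind every API function the program uses."""
    allowed: set[str] = set()
    for func in PROGRAMS[program]:
        allowed |= API_TO_SYSCALLS[func]
    return allowed


if __name__ == "__main__":
    for name in sorted(PROGRAMS):
        allowed = allowlist_for(name)
        print(name, "allowed:", sorted(allowed),
              "filtered:", sorted(ALL_SYSCALLS - allowed))
```

In this toy setup Prog1 never calls mail, so execve and fork can be filtered out for it even though the interpreter as a whole supports them.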

Securing Interpreted Programs

Mapping the interpreter API to syscalls
Use static and dynamic analysis to map each API function handler to the syscalls it can invoke.
Identifying API calls within an interpreted program
Find the functions used in each piece of code, including all dependencies (implicit or explicit), then combine them with the mapping above to produce an allowlist.
Protecting the Program
Apply the allowlist to the program.

Implementation

This section implements the steps above for PHP and explains them.

Mapping built-in PHP functions to system-calls

Map each PHP function to the system calls it uses.

First, build an initial mapping with static call-graph analysis.
Then, use ptrace to observe the syscalls a PHP process actually makes while running.

Static analysis over the PHP Interpreter

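Symbols are used to build a static call graph of the whole interpreter and its libraries (each node is a function and each edge is a direct function call).
To find all handlers, get_defined_functions() is used to enumerate the built-in PHP functions and locate the addresses of their handlers.

Although this step is static, it has to be run over a lot of code bases (the code I looked at also notes that it takes more than 30 minutes to run).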
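A minimal sketch of the reachability part of this step, assuming the call graph has already been extracted. The graph, the handler names, and the SYS_ leaf nodes below are invented for illustration; the real tool works on symbols from the interpreter and its libraries.

```python
from __future__ import annotations
from collections import deque

# Hypothetical call graph: function -> functions it calls directly.
CALL_GRAPH = {
    "zif_file_put_contents": ["php_stream_open", "php_stream_write"],
    "php_stream_open": ["open"],
    "php_stream_write": ["write"],
    "zif_strlen": [],
    "open": ["SYS_open"],
    "write": ["SYS_write"],
}

# Hypothetical leaf nodes that stand for actual syscalls.
SYSCALL_NODES = {"SYS_open", "SYS_write"}


def reachable_syscalls(handler: str) -> set[str]:
    """Breadth-first walk over direct-call edges, collecting syscall nodes."""
    seen = {handler}
    queue = deque([handler])
    found: set[str] = set()
    while queue:
        node = queue.popleft()
        if node in SYSCALL_NODES:
            found.add(node)
            continue
        for callee in CALL_GRAPH.get(node, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return found
```

Because only direct-call edges are followed, anything reached through a function pointer or a spawned binary is invisible to this walk.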

Refining the mapping through dynamic analysis

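Although the previous step exhaustively searches the code of the whole PHP process, it misses indirect calls. For example, PHP's fopen can fetch remote files, and some functions such as mail() execute the sendmail binary; static analysis alone cannot map the syscalls that sendmail makes.

TE obtains the name of the currently running PHP function through shared memory
TR uses ptrace to intercept syscalls

Combining the two makes it possible to map such functions dynamically.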
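How the two sources of information combine can be sketched as follows. The merged event stream is a simplification I am assuming for illustration: "enter"/"exit" events stand in for the function-name side, and "syscall" events stand in for what ptrace intercepts.

```python
from __future__ import annotations
from collections import defaultdict

# Hypothetical merged event stream, in the order the events were observed.
EVENTS = [
    ("enter", "mail"),
    ("syscall", "fork"),
    ("syscall", "execve"),
    ("exit", "mail"),
    ("enter", "fopen"),
    ("syscall", "socket"),
    ("syscall", "connect"),
    ("exit", "fopen"),
]


def attribute_syscalls(events):
    """Charge each intercepted syscall to the PHP function active at that moment."""
    mapping = defaultdict(set)
    stack = []
    for kind, name in events:
        if kind == "enter":
            stack.append(name)
        elif kind == "exit":
            stack.pop()
        elif kind == "syscall" and stack:
            mapping[stack[-1]].add(name)
    return dict(mapping)
```

Syscalls seen while no PHP function is active are simply ignored in this sketch.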

Creating system-call filters for web apps

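The goal of this step is to find out which PHP built-in functions the interpreter will invoke, then cross-reference them with the mapping from the previous step to determine which system calls each script in the web app may make.

First, iterate over all PHP files in the web app
- Use php-parser to generate an abstract syntax tree (AST)
- Scan the AST for possible built-in function calls
- Inspect assignments to determine class types and, from those, the corresponding functions
Then find the dependencies in the AST
- constant definitions
- class definitions/instantiations
- include/require operations

Handling dependencies is the most laborious part. A plain include is easy to handle, but one that uses variables or constants needs extra processing.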

String representation

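Different kinds of include strings are transformed with a variety of methods, repeatedly, until they are fully resolved or cannot be processed any further.

(If a variable is assigned multiple times, each assignment is tracked down; if that is not possible, all possibilities are listed.)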
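As a rough illustration, an include such as `include BASE_DIR . '/lang/' . $lang . '.php';` can be modeled as a list of parts that are expanded into every path it might name. The part encoding, the constant, and the variable values are assumptions made for this sketch, not the paper's representation.

```python
from __future__ import annotations
from itertools import product

# Hypothetical facts collected from the AST.
CONSTANTS = {"BASE_DIR": "/var/www/app"}
# A variable assigned more than once keeps every value it was given.
VARIABLE_VALUES = {"$lang": ["en", "fr"]}


def resolve_include(parts):
    """Expand an include expression into every concrete path it may name.

    `parts` is a list of (kind, value) pairs, where kind is "lit", "const"
    or "var". Returns None when some part cannot be resolved statically.
    """
    choices = []
    for kind, value in parts:
        if kind == "lit":
            choices.append([value])
        elif kind == "const" and value in CONSTANTS:
            choices.append([CONSTANTS[value]])
        elif kind == "var" and value in VARIABLE_VALUES:
            choices.append(VARIABLE_VALUES[value])
        else:
            return None
    return sorted("".join(combo) for combo in product(*choices))


INCLUDE_EXPR = [("const", "BASE_DIR"), ("lit", "/lang/"), ("var", "$lang"), ("lit", ".php")]
```

Calling `resolve_include(INCLUDE_EXPR)` yields both the `en.php` and `fr.php` paths, since `$lang` has two recorded assignments.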

Unresolved Includes

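At this point 74% of includes can be resolved. Most of the rest can be fuzzy resolved, meaning every file in a small candidate set is treated as included.
Some includes yield no information at all under static analysis. For these, the authors provide a Conservative Includes (CI) option, which uses this same approach. Turning it on lowers the chance of false positives (FP), but the allowlist becomes larger.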
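One way to picture fuzzy resolution is to treat each unresolved part of an include path as a wildcard and keep every file that matches. This is my own simplified sketch using glob-style matching, with a made-up file list; the paper's actual matching rules may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical list of PHP files in the web app.
APP_FILES = [
    "themes/dark/header.php",
    "themes/light/header.php",
    "themes/light/footer.php",
    "admin/index.php",
]


def fuzzy_resolve(parts, files=APP_FILES):
    """Treat each unresolved part (None) as a wildcard and return every match.

    Known parts are assumed to contain no glob metacharacters.
    """
    pattern = "".join("*" if part is None else part for part in parts)
    return [path for path in files if fnmatchcase(path, pattern)]
```

For example, `fuzzy_resolve(["themes/", None, "/header.php"])` keeps both header files, while an include with no known part at all matches every file, which is the most over-approximating outcome.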

Building system-call profiles for Scripts

Finally, given the PHP functions that a script and its dependencies may call, plus the mapping from the earlier step, a syscall profile can be generated for each script (script path <-> profile).
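Putting the pieces together might look like the sketch below, which walks a script's includes transitively and unions the syscalls behind every function found. All three input tables are hypothetical stand-ins for the results of the earlier steps.

```python
from __future__ import annotations

# Hypothetical inputs produced by the earlier steps.
FUNC_TO_SYSCALLS = {
    "fopen": {"open"},
    "fwrite": {"write"},
    "mail": {"fork", "execve"},
}
SCRIPT_FUNCS = {
    "index.php": {"fopen"},
    "lib/log.php": {"fwrite"},
    "lib/notify.php": {"mail"},
}
SCRIPT_INCLUDES = {
    "index.php": {"lib/log.php"},
    "lib/log.php": set(),
    "lib/notify.php": {"lib/log.php"},
}


def build_profile(script: str) -> set[str]:
    """Collect syscalls for a script and everything it transitively includes."""
    pending, visited = [script], set()
    profile: set[str] = set()
    while pending:
        current = pending.pop()
        if current in visited:
            continue  # also guards against circular includes
        visited.add(current)
        for func in SCRIPT_FUNCS.get(current, ()):
            profile |= FUNC_TO_SYSCALLS.get(func, set())
        pending.extend(SCRIPT_INCLUDES.get(current, ()))
    return profile


# script path <-> profile
PROFILES = {path: build_profile(path) for path in SCRIPT_FUNCS}
```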

Sandboxing the Interpreter and Web App

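The allowlist is enforced with seccomp. When the process runs, the SE extension learns which script it is executing and tells the kernel which syscall allowlist to apply.
SE therefore runs twice in an interpreter's lifetime: first when the PHP process starts, at which point all allowlists are loaded into memory, and again when a request arrives, at which point the allowlist to use is handed to the kernel.
(libseccomp is used to turn a list of syscalls into an allowlist.)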

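Web requests are usually handled by a web server (such as nginx or apache), so the authors also installed this plugin on the web server, as well as in the PHP cli API (the command-line one).
Because seccomp is used, one process can only run one script (a process cannot suddenly switch to a different allowlist). There are two ways to deal with this:
- One PHP interpreter process serves only a single request, but under heavy load this leads to high latency
- Each PHP worker can handle many requests, but a given worker only handles requests for the same script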
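The second option can be modeled with a small dispatcher. This is only a sketch of the scheduling idea; the Worker class and dispatch function are hypothetical, and assigning `self.script` merely stands in for installing a seccomp filter.

```python
from __future__ import annotations
from typing import Optional


class Worker:
    """Models a PHP worker whose filter is fixed once it serves its first script."""

    def __init__(self, worker_id: int) -> None:
        self.worker_id = worker_id
        self.script: Optional[str] = None  # None means no filter installed yet

    def can_serve(self, script: str) -> bool:
        return self.script is None or self.script == script

    def serve(self, script: str) -> None:
        if not self.can_serve(script):
            raise RuntimeError("filter already installed for another script")
        self.script = script  # stands in for installing the seccomp filter


def dispatch(workers: list[Worker], script: str) -> Worker:
    """Prefer a worker already locked to this script, then a fresh one, else spawn."""
    for worker in workers:
        if worker.script == script:
            worker.serve(script)
            return worker
    for worker in workers:
        if worker.script is None:
            worker.serve(script)
            return worker
    worker = Worker(len(workers))
    workers.append(worker)
    worker.serve(script)
    return worker
```

Repeated requests for the same script reuse the same worker, so the cost of starting a process is paid once per script rather than once per request.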

Evaluation

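Dependency resolution

Dependencies are split into those that can be resolved from literals alone and those that need dynamic resolution.
Resolved avg -> 74%
Fuzzy resolved -> 22%
Class -> 85%
For example, Joomla calls many of its original classes through aliases, and those cannot be caught.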

Syscall Profile Size

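To see how many syscalls get filtered and how well CI works, the authors look at the size of each syscall profile. A larger profile means more syscalls are available, so the smaller the better as long as the script still runs.
Dangerous syscalls are also evaluated specifically.
The black line is with CI turned off
- With CI turned on, FP goes down but risk goes up
About 80% of dangerous system calls are filtered out
Some common functions appear in many scripts, which makes many scripts' profiles very similar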

Defense Capabilities

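21 RCE exploits were collected for testing
Is it too restrictive?
- Coverage is measured to show that the program was actually executed normally
- With CI on there were no FPs at all
- With CI off, FPs appear
  - Joomla, because of the alias problem mentioned earlier
Payload constraint?
- The dangerous syscalls left after filtering are not enough to build a payload that achieves arbitrary code execution (ACE). The authors also ran many additional tests and found that the remaining syscalls always lack some piece needed to assemble a payload
Non-vuln plugins
- Plugins without vulnerabilities were also tested to evaluate FPs
Runtime Overhead
- The focus is on the cost after deployment in a web app, because of the seccomp issue mentioned earlier: the process has to be restarted after each request
- The overhead turns out to be negligible, and even especially trivial scripts are not affected much. After the authors implemented the "a worker only handles requests for the same script" approach, the impact became smaller still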

Limitations and Discussion

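eval() system()
- The authors do not handle these two functions or their arguments
Mimicry
- Although the results are good and no RCE succeeded in testing, some dangerous syscalls are still allowed and could potentially be pieced together into a payload
Overwriting scripts
- An attacker who can tamper with a script could make its profile no longer match it; a simple checksum solves this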
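A checksum check of this kind could be as small as the following sketch. SHA-256 and the function names are my assumptions; the notes above only say "checksum".

```python
import hashlib


def script_digest(source: bytes) -> str:
    """Digest recorded when the profile for a script is generated."""
    return hashlib.sha256(source).hexdigest()


def profile_still_valid(source: bytes, recorded_digest: str) -> bool:
    """True only if the script is byte-for-byte what the profile was built from."""
    return script_digest(source) == recorded_digest
```

The digest would be stored next to the profile and compared before the profile is applied, so a modified script no longer matches.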
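Writing to sensitive files
- To prevent privilege gain through file writes, the authors also use ptrace to make sure the interpreter cannot touch certain important fds (file descriptors)
Installing plugins
- When a new plugin is installed, a profile has to be built for it right away, and this is a manual step
Filter system call argument
- Beyond filtering syscalls themselves, filtering on their arguments as well would be more precise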
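To show what argument-level filtering adds, here is a toy policy check in plain Python. It is a model of the idea only, not seccomp itself; the policy table and the dictionary of arguments are hypothetical, and the flag constants are the Linux values.

```python
# Linux values for the open(2) access-mode flags.
O_RDONLY, O_WRONLY, O_RDWR, O_ACCMODE = 0, 1, 2, 3

# Hypothetical policy: syscall name -> predicate over its arguments.
POLICY = {
    "read": lambda args: True,
    # Allow open only when the file is opened read-only.
    "open": lambda args: (args["flags"] & O_ACCMODE) == O_RDONLY,
}


def is_allowed(syscall, args):
    """A syscall passes only if it is listed and its arguments satisfy the rule."""
    check = POLICY.get(syscall)
    return check is not None and check(args)
```

A name-only allowlist would have to permit every open call, whereas this policy still blocks opens for writing.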
Line coverage of evaluated web apps
Applying Saphire to other interpreters
- A Python interpreter running a program composed of multiple Python scripts
- A server executing multiple Node.js microservices
- A classic CGI-based interpreted web app using a language such as Perl or Lua

https://wiiwu959.github.io/2022/02/22/2022-02-22-Saphire/

Author

Wii Wu

Posted on

February 22, 2022
