eBPF/BCC - A better low-level instrumentation tool on Android

2021/06/30
eBPF
Android
Reversing

# Purpose

eBPF is the new kernel superpower. The program runs in a special VM in the kernel and is JIT compiled to run at the speed of the native code. Yet, it has access to all sorts of Linux tracing facilities e.g. kprobe/uprobe for dynamic instrumentation, and tracepoint/USDT for the static ones. eBPF can reach both the user space and the kernel space which makes it a perfect tool to gain Linux observability. This makes me wonder, what if we use eBPF for Android reversing?

Traditional Android dynamic instrumentation tools, such as Frida and strace, all run in user space. As more apps implement anti-debugging techniques, Frida and strace often won't work until the reverse engineer has found a way to disable the anti-debugging code. Detecting these tools, however, has proved easy:

  • Frida and strace use ptrace, which is easily detected by checking the tracer ID (TracerPid) in the process status
  • ptrace can slow down the target process by as much as 100-fold, making time-based detection easy [source]
  • Frida leaves a heavy footprint that can be detected through its named pipes, its Frida-specific thread names, and by comparing the text section in memory with the text section on disk for both libc and native libraries [source]
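
To make the first check concrete, here is a minimal Python sketch of how a process could read its own tracer ID. The TracerPid field is part of /proc/<pid>/status on Linux; the helper names parse_tracer_pid and is_traced and the sample values are made up for illustration, and a real app would more likely do this in native code.

```python
def parse_tracer_pid(status_text):
    """Return the TracerPid value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("TracerPid field not found")


def is_traced(status_path="/proc/self/status"):
    """True if another process is attached via ptrace (TracerPid != 0)."""
    with open(status_path) as f:
        return parse_tracer_pid(f.read()) != 0


# Sample input mimicking a process traced by PID 4242 (hypothetical values).
sample = "Name:\tcom.example.app\nState:\tS (sleeping)\nTracerPid:\t4242\n"
print(parse_tracer_pid(sample))
```

A kprobe or uprobe attached through eBPF does not register as a ptrace tracer, so a check like this one keeps reading 0.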

In fact, anti-debugging has become an arms race in Android user space, with a dozen or so techniques in use. Finding out which techniques an app uses and disabling them one by one is tedious and time-consuming. Hence, some have written kernel modules to implement debugging tools so as to avoid user-space detection. Still, the targeted app can quickly find loaded kernel modules by running lsmod.

eBPF programs, in comparison, have the same level of access as kernel modules* but are much easier to write and maintain. Brendan Gregg has a good summary of their differences in his book BPF Performance Tools.

  • BPF programs are checked via a verifier; kernel modules may introduce bugs (kernel panics) or security vulnerabilities.
  • BPF provides rich data structures via maps.
  • BPF programs can be compiled once and then run anywhere, as the BPF instruction set, map, helpers, and infrastructure are a stable ABI.
  • BPF programs do not require kernel build artifacts to be compiled.
  • BPF programming is easier to learn than the kernel engineering required to develop kernel modules, making it accessible to more people.

*eBPF programs cannot call arbitrary kernel functions and must use a set of predefined helper functions. They can read arbitrary memory (both kernel and user) but can only overwrite user memory. [source]

These features make eBPF a great candidate to develop a future generation of Android dynamic instrumentation workflow/tools that can:

  • Be ported across multiple kernel versions (as long as the kernel has the required eBPF feature set, see Challenge 1 below)

  • Access existing tools in bcc/bpftrace

  • Trace system wide activities or app behaviors on multiple processes at the same time

  • Hide from user space which makes eBPF a great tool to detect anti-debugging checks. For instance, apps can't query a list of attached kprobes due to Android's UID-based application sandbox.

    sargo:/ $ whoami
    u0_a135
    sargo:/ $  cat /sys/kernel/debug/kprobes/list
    cat: /sys/kernel/debug/kprobes/list: Permission denied
    

However, writing eBPF programs isn't a trivial task, so people resort to toolchains such as BCC. BCC lets you embed C-style eBPF programs in a Python script and provides a number of helpers for loading eBPF programs and moving data between kernel and user space. On Android, BCC support was only added in April 2018, which is still relatively new (and only used by Android kernel developers). That might explain why we haven't seen any eBPF-based app instrumentation tools so far.

Figure: BCC upstream timeline on Android. Source: eBPF super powers on ARM64 and Android

At the time of writing, setting up a functioning eBPF/BCC toolchain on Android is still a non-trivial task. This write-up aims to show how to use eBPF for system-wide dynamic analysis on Android and to lay out its advantages and limitations. If you would like to learn eBPF fundamentals, you can start with What is eBPF by Cilium.

# Challenges

eBPF is still under heavy development. A few technical details must be discussed before we jump into the setup.

# 1. eBPF, BCC, and adeb

To work with eBPF, we rely on BCC to compile our code into eBPF bytecode. BCC, however, has many build dependencies such as cmake and LLVM. On an average Linux distribution, such dependencies can be satisfied by the package manager. Unfortunately, Android doesn't have one. In addition, Android uses the bionic C library, which breaks the build of many libraries that use GNU extensions. To overcome these limitations, Joel Fernandes at Google developed adeb, which creates a chroot-ed Debian environment on Android. adeb essentially does the following:

  • Build rootfs using qemu-debootstrap
  • Push a prebuilt (or build one) root fs to /data
  • Run adb shell with chroot(2) of /data/.../bash

Source: http://www.joelfernandes.org/resources/adeb-lc18.pdf

...so that we can run any Debian tools on it.

adeb and apex

Android 10+ introduced a new apex mechanism to make low-level system components (e.g. the bionic runtime) upgradable. Since adeb can't mount the special apex file that contains the bionic runtime, a uprobe on the bionic libc.so (e.g. to intercept pthread_create calls) won't work because the bionic libc.so is missing. If you have similar use cases, use Android 9 or below.

The issue has been discussed at https://github.com/joelagnel/adeb/issues/38.

# 2. eBPF vs kernel/AOSP versions

Compatibility always comes first when dealing with AOSP and kernels. Personally I have several needs in mind when picking the kernel version:

  • BCC relies on in-kernel headers (CONFIG_IKHEADERS) to compile eBPF code. Although CONFIG_IKHEADERS was officially added in kernel 5.2, the patch has been backported to older kernels (e.g. the Pixel 3a's 4.9 kernel). To check, open kernel/Makefile and see whether obj-$(CONFIG_IKHEADERS) += kheaders.o is defined.
  • eBPF was broken on the Pixel 3a's 4.9 kernel at my time of testing (kprobes worked but BCC produced no output), so I need a kernel newer than 4.9.
  • bpftool is a must-have for debugging eBPF-related issues. It was added in 4.15.
  • The kernel should have a long remaining lifecycle so we can save some effort on future ports. According to the AOSP documentation, 4.19 supports Android 10-13+, which makes it a good candidate.
  • I want the kernel to be as stable as possible to reduce the chance of occasional broken builds.
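
The version constraints above can be turned into a quick sanity check. The sketch below only encodes the two minimum versions quoted in this list (4.15 for bpftool, 5.2 for CONFIG_IKHEADERS without a backport); the function names and the sample release strings are hypothetical, and a backported patch would of course make the second entry a false alarm.

```python
import re

# Minimum upstream kernel versions quoted in the list above.
MIN_VERSIONS = {
    "bpftool": (4, 15),
    "CONFIG_IKHEADERS (without backport)": (5, 2),
}


def parse_release(release):
    """Extract (major, minor) from a release string such as '4.19.110-android'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if not m:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(m.group(1)), int(m.group(2))


def missing_features(release):
    """List features whose minimum version is newer than this kernel."""
    version = parse_release(release)
    return sorted(name for name, need in MIN_VERSIONS.items() if version < need)


print(missing_features("4.9.237"))   # both features are flagged
print(missing_features("4.19.110"))  # only the IKHEADERS entry is flagged
```

On a live system, platform.release() from the standard library returns the string to pass in.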

I eventually picked common-android-4.19-stable as the base for my work.

Figure: Android kernel lifespan. Source: AOSP documentation

Figure: BPF features and minimum kernel version requirements. Source: BCC

# 3. Phone vs Emulator

Ideally, a reverse engineer should use a real device as much as possible to avoid the hassle of dealing with emulator/VM detection. However, most Android devices have 4.9 as their latest kernel (such as my Pixel 3a), and BCC was partially broken on 4.9 at my time of testing. Without bpftool (only in 4.15+), debugging BPF issues can be a nightmare.

The only option left is to use an emulator. As we are dealing with low-level kernel and Android framework interactions, I prefer a virtual device that maintains high fidelity when replicating the framework-based behavior of a real device. I chose cuttlefish (crosvm + KVM based) over the Android Emulator (QEMU v2 based) for this test.

# Workflow

The setup of my eBPF Android environment is based on a combination of cuttlefish, noVNC, a custom kernel, adeb, and bcc. More specifically, we need:

  1. kernel: build a custom kernel with desired kernel configs
  2. cuttlefish: build and setup a cuttlefish device on KVM (and enable paravirtualization if the host is a VM)
  3. adeb: install adeb on the cuttlefish device
  4. bcc: use adeb to build and install bcc on the device's Debian environment. Write and run bcc programs.
  5. noVNC: use noVNC to forward cuttlefish's VNC stream to a remote host

# How to setup

# 1. Choose kernel features

eBPF and bcc depend on a number of kernel features that don't come with the stock kernel. Hence, it is important to plan ahead and pick the features that are necessary for your own use cases. The adeb and bcc documentation is a good place to start.

Be aware that none of the guides is self-contained; each misses some features here and there. Below are the configs that I've collected along the way. The flags are written in the format of Android's ${KERNEL_DIR}/scripts/config so you can copy and paste them directly into your build config files. -d disables an option and -e enables it.

| Features | Flags |
| --- | --- |
| Default gki_kprobe configs in common-kernel: disable kernel security features, add debugging support, reduce kernel build time | -d CONFIG_LTO -d CONFIG_LTO_CLANG -d CONFIG_CFI_CLANG -d CFI_PERMISSIVE -d CFI_CLANG -e CONFIG_IRQSOFF_TRACER -e CONFIG_PREEMPT_TRACER -e CONFIG_DEBUG_FS -e CONFIG_CHECKPOINT_RESTORE -d CONFIG_RANDOMIZE_BASE |
| Enable eBPF support | -e CONFIG_BPF -e CONFIG_BPF_SYSCALL -e CONFIG_BPF_JIT -e CONFIG_HAVE_EBPF_JIT -e CONFIG_IKHEADERS |
| Enable kprobe | -e CONFIG_HAVE_KPROBES -e CONFIG_KPROBES -e CONFIG_KPROBE_EVENT |
| Enable kretprobe | -e CONFIG_KRETPROBES -e CONFIG_HAVE_KRETPROBES -d CONFIG_SHADOW_CALL_STACK -e CONFIG_ROP_PROTECTION_NONE |
| Enable uprobe | -e CONFIG_UPROBES -e CONFIG_UPROBE_EVENT -e CONFIG_BPF_EVENTS |
| [BCC tool] tc filters and tc actions | -e CONFIG_NET_CLS_BPF -e CONFIG_NET_ACT_BPF |
| [BCC tool] bcc networking examples | -e CONFIG_NET_SCH_SFQ -e CONFIG_NET_ACT_POLICE -e CONFIG_NET_ACT_GACT -e CONFIG_DUMMY |
| [BCC tool] critical | -e CONFIG_DEBUG_PREEMPT -e CONFIG_PREEMPTIRQ_EVENTS -d CONFIG_PROVE_LOCKING -d CONFIG_LOCKDEP |
| [bpftrace] Enable ftrace | -e CONFIG_FTRACE_SYSCALLS -e CONFIG_FUNCTION_TRACER -e CONFIG_HAVE_DYNAMIC_FTRACE -e CONFIG_DYNAMIC_FTRACE |

# 2. Build a custom Generic Kernel Image (GKI)

Download the kernel source code of branch common-android-4.19-stable.

If you're unfamiliar with how to build an Android kernel, read the instructions at https://source.android.com/setup/build/building-kernels first.

repo init -u https://android.googlesource.com/kernel/manifest -b common-android-4.19-stable
repo sync

Since Android 10, Android has introduced a new Generic Kernel Image (GKI) in kernel 4.19 and above. This has significantly changed the build process as it splits the standard kernel into two parts: Google's generic kernel and a vendor-specific kernel. Both kernels have to be built separately.

Unfortunately, our common-android-4.19-stable is a GKI kernel. The GKI kernel separates build configs into two parts:

  • common/build.config.gki_kprobes.x86_64: the general kernel build config which builds vmlinux and bzImage. Here we pick the kprobe variant as we need kprobe for eBPF.
  • common-modules/virtual-device/build.config.cuttlefish_kprobes.x86_64: the vendor kernel build config that builds vendor kernel modules. The build eventually generates an initramfs.img file that contains the .ko binaries. initramfs.img is mounted on /lib/ by Android's first stage init (in userspace)

To customise it, we must ensure that the build flags are exactly the same in both build configs. Any discrepancy in these flags will result in a mismatched module_layout checksum, making the kernel unable to load the vendor kernel modules.

Let us update both configurations. Open both files below and add the same set of kernel flags in the function update_kprobes_config().

  • common/build.config.gki_kprobes

    (notice this is different from common/build.config.gki_kprobes.x86_64 above)

  • common-modules/virtual-device/build.config.cuttlefish_kprobes.x86_64

$ vi common/build.config.gki_kprobes

DEFCONFIG=gki_defconfig
POST_DEFCONFIG_CMDS="check_defconfig && update_kprobes_config"
function update_kprobes_config() {
    ${KERNEL_DIR}/scripts/config --file ${OUT_DIR}/.config \
       -d CONFIG_LTO \
       -d CONFIG_LTO_CLANG \
       -d CONFIG_CFI_CLANG \
       -d CFI_PERMISSIVE \
       -d CFI_CLANG \
       -e CONFIG_IRQSOFF_TRACER \
       -e CONFIG_PREEMPT_TRACER \
       -e CONFIG_DEBUG_FS \
       -e CONFIG_CHECKPOINT_RESTORE \
       -d CONFIG_RANDOMIZE_BASE \
       -e CONFIG_BPF \
       -e CONFIG_BPF_SYSCALL \
       -e CONFIG_BPF_JIT \
       -e CONFIG_HAVE_EBPF_JIT \
       -e CONFIG_IKHEADERS \
       -e CONFIG_HAVE_KPROBES \
       -e CONFIG_KPROBES \
       -e CONFIG_KPROBE_EVENT \
       -e CONFIG_UPROBES \
       -e CONFIG_UPROBE_EVENT \
       -e CONFIG_BPF_EVENTS \
       -e CONFIG_KRETPROBES \
       -e CONFIG_HAVE_KRETPROBES \
       -d CONFIG_SHADOW_CALL_STACK \
       -e CONFIG_ROP_PROTECTION_NONE \
       -e CONFIG_NET_CLS_BPF \
       -e CONFIG_NET_ACT_BPF \
       -e CONFIG_NET_SCH_SFQ \
       -e CONFIG_NET_ACT_POLICE \
       -e CONFIG_NET_ACT_GACT \
       -e CONFIG_DUMMY \
       -e CONFIG_FTRACE_SYSCALLS \
       -e CONFIG_FUNCTION_TRACER \
       -e CONFIG_HAVE_DYNAMIC_FTRACE \
       -e CONFIG_DYNAMIC_FTRACE \
       -e CONFIG_DEBUG_PREEMPT \
       -e CONFIG_PREEMPTIRQ_EVENTS \
       -d CONFIG_PROVE_LOCKING \
       -d CONFIG_LOCKDEP
    (cd ${OUT_DIR} && \
     make ${CC_LD_ARG} O=${OUT_DIR} olddefconfig)
}
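
Since the two build configs must carry identical flags, it can help to diff them mechanically rather than by eye. The following Python sketch scans text for -e NAME / -d NAME pairs and reports flags that differ or exist on one side only. The function names are hypothetical, the sample strings are shortened stand-ins for the real files, and names with and without the CONFIG_ prefix are treated as the same option here.

```python
import re

# Matches "-e NAME" / "-d NAME" pairs as passed to scripts/config.
FLAG_RE = re.compile(r"(?<!\S)-([ed])\s+([A-Za-z0-9_]+)")


def parse_flags(text):
    """Map each flag name (CONFIG_ prefix stripped) to True (-e) or False (-d)."""
    flags = {}
    for action, name in FLAG_RE.findall(text):
        if name.startswith("CONFIG_"):
            name = name[len("CONFIG_"):]
        flags[name] = (action == "e")
    return flags


def flag_mismatches(text_a, text_b):
    """Return flag names that are missing from one side or set differently."""
    a, b = parse_flags(text_a), parse_flags(text_b)
    return sorted(n for n in set(a) | set(b) if a.get(n) != b.get(n))


# Shortened, hypothetical contents of the two update_kprobes_config() bodies.
gki = "-d CONFIG_LTO \\\n -e CONFIG_KPROBES \\\n -d CONFIG_SHADOW_CALL_STACK"
vendor = "-d CONFIG_LTO \\\n -e CONFIG_KPROBES"
print(flag_mismatches(gki, vendor))
```

In practice you would read both build config files from disk and pass their contents in; an empty list means the flag sets agree.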

Before building the kernel, remember to increment your EXTRAVERSION in common/Makefile.

VERSION = 4
PATCHLEVEL = 9
SUBLEVEL = 237
EXTRAVERSION =senyuuri-1
NAME = Roaring Lionus

This is to ensure that BCC always picks up the latest kernel headers. Why? Because by default, BCC extracts the kernel header to /tmp. BCC only updates the cached header if the kernel's version string has been changed (see BCC's kbuild_helper.cc below). Incrementing EXTRAVERSION forces BCC to use the latest embedded header so that we can avoid potential compatibility issues.

/tmp# ls -lah
total 11K
drwxrwxrwx.  3 root  root  3.5K May 20 10:56 .
drwxrwxrwx. 21 root  root  3.5K May 20 06:09 ..
drwxr-xr-x.  5 51385 49967 3.5K May 20 08:51 kheaders-4.9.237-dirty_audio

// Excerpt from bcc/src/cc/frontends/clang/kbuild_helper.cc

int get_proc_kheaders(std::string &dirpath)
{
  struct utsname uname_data;
  char dirpath_tmp[256];

  if (uname(&uname_data))
    return -errno;

  snprintf(dirpath_tmp, 256, "/tmp/kheaders-%s", uname_data.release);
  dirpath = std::string(dirpath_tmp);

  if (file_exists(dirpath_tmp))
    return 0;

  // First time so extract it
  return extract_kheaders(dirpath, uname_data);
}
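
The caching rule in the excerpt boils down to "one directory per kernel release string". As a rough illustration, the Python sketch below rebuilds that directory name and lists leftover header directories from other releases. The helper names are hypothetical and the sketch mirrors only the path logic shown above, not the extraction itself.

```python
import glob
import os
import platform


def kheaders_dir(release, tmp_root="/tmp"):
    """Directory BCC uses for extracted headers, per the excerpt above."""
    return os.path.join(tmp_root, "kheaders-%s" % release)


def stale_kheaders(release, tmp_root="/tmp"):
    """Extracted header directories that belong to other kernel releases."""
    current = kheaders_dir(release, tmp_root)
    found = glob.glob(os.path.join(tmp_root, "kheaders-*"))
    return sorted(d for d in found if d != current)


print(kheaders_dir("4.9.237-dirty_audio"))
print(stale_kheaders(platform.release()))
```

If the release string does not change between builds, the directory for the current release already exists and BCC keeps using whatever headers were extracted first, which is exactly why EXTRAVERSION needs a bump.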

Build the vendor kernel modules first, followed by the GKI. Together, the two commands should produce a bzImage and an initramfs.img in the output folder.

$ BUILD_CONFIG=common-modules/virtual-device/build.config.cuttlefish_kprobes.x86_64 build/build.sh
$ BUILD_CONFIG=common/build.config.gki_kprobes.x86_64 build/build.sh

# 3. Setting up cuttlefish

Follow the official instructions to install cuttlefish. If your host is a VM, remember to turn on hardware virtualization support in the VM settings, as it's required by KVM.

Figure: Enable hardware virtualization in ESXi / vCentre

Once done with the setup, start a cuttlefish VM with our previously built kernel and initramfs. If you encounter a boot loop, check the kernel log at cf/cuttlefish_runtime.1/kernel.log.

# start a cuttlefish VM 
$ HOME=$PWD ./bin/launch_cvd --start_vnc_server=true --kernel_path=/home/senyuuri/aosp_cf_x86_64_img/bzImage-4.19 --initramfs_path=/home/senyuuri/aosp_cf_x86_64_img/initramfs-4.19.img

# cuttlefish's VNC server only listens on 127.0.0.1:6444. If you view it from a different machine than the one running cuttlefish, use noVNC to forward the VNC stream
git clone https://github.com/kanaka/noVNC && cd noVNC
./utils/launch.sh --vnc 127.0.0.1:6444
Navigate to this URL:

    http://cuttlefish-kvm:6080/vnc.html?host=cuttlefish-kvm&port=6080

Figure: Access the remote cuttlefish device through noVNC

# 4. Setup adeb

By now you should be able to see the cuttlefish device in adb devices. Now download and install adeb. Note that adeb defaults to an arm64 build. As our cuttlefish is an x86_64 device, we have to pass the --build --arch amd64 --bcc flags.

$ git clone https://github.com/joelagnel/adeb.git
$ cd adeb
$ sudo ln -s $(pwd)/adeb /usr/bin/adeb
$ export ADEB_REPO_URL="github.com/joelagnel/adeb/"

$ sudo apt-get install qemu-user-static debootstrap
# target arch other than arm64 must pass the --build option
$ adeb prepare --build --arch amd64 --bcc

Once done, run adeb shell to enter the device and check whether the bcc tools (e.g. tcpconnect) work.

$ ./adeb shell
/# tcpconnect
PID    COMM         IP SADDR            DADDR            DPORT
5853   specialhttpd 6  ::ffff:192.168.1.195 ::ffff:182.254.116.117 80
5853   beacon-threa 6  ::ffff:192.168.1.195 ::ffff:203.205.235.218 8081
5853   TDM-report-1 4  192.168.1.195    210.22.247.194   3013
5853   MSDKV3-Http- 4  192.168.1.195    203.205.254.177  443
5853   UnityGfx     4  192.168.1.195    14.18.202.32     20166
5853   beacon-threa 6  ::ffff:192.168.1.195 ::ffff:203.205.235.218 8081
5645   spdy-0       4  192.168.1.195    203.119.205.113  443
5645   NetWorkSende 6  ::ffff:192.168.1.195 ::ffff:203.119.214.125 443
5645   UnityMain    4  192.168.1.195    47.95.163.11     443
5853   TDM-report-1 4  192.168.1.195    210.22.247.194   3013
5853   beacon-threa 6  ::ffff:192.168.1.195 ::ffff:203.205.235.218 8081
5853   TDM-report-1 4  192.168.1.195    58.247.215.105   3013

# eBPF/BCC in Action

As mentioned earlier, eBPF on Android is in its early days and there aren't any generic tools made for mobile reversing yet. Luckily, most BCC tools still work. You can start by exploring /usr/share/bcc/tools to see whether any of these Swiss Army knives fits your needs. Some examples are:

# trace the execve syscall
/usr/share/bcc# execsnoop
ls               6756   2084     0 /system/bin/ls /data/androdeb/debian/.bashrc
sh               6758   2084     0 /system/bin/sh -c /data/androdeb/run
run              6758   2084     0 /data/androdeb/run
dirname          6761   6760     0 /system/bin/dirname /data/androdeb/run
mount            6762   6758     0 /system/bin/mount
grep             6763   6758     0 /system/bin/grep debian
chroot           6764   6758     0 /system/bin/chroot debian/ /bin/bash --rcfile .bashrc
bash             6764   6758     0 /bin/bash --rcfile .bashrc

# trace TCP connections (kprobe tcp_v4_connect/tcp_v6_connect/udp_recvmsg)
/# tcpconnect
Tracing connect ... Hit Ctrl-C to end
PID    COMM         IP SADDR            DADDR            DPORT
8861   AsyncTask #1 4  192.168.97.2     157.240.217.17   443
8861   AsyncTask #2 4  192.168.97.2     157.240.217.17   443
8861   AsyncTask #3 4  192.168.97.2     157.240.217.17   443
8861   pool-16-thre 4  192.168.97.2     52.84.229.77     443
8861   pool-16-thre 4  192.168.97.2     52.84.229.29     443

# trace open/openat/openat2 syscalls
/# opensnoop
8849   cut                11   0 /system/lib64/libc++.so
8849   cut                12   0 /system/lib64/libbase.so
8849   cut                13   0 /system/lib64/libcgrouprc.so
8849   cut                14   0 /system/lib64/libpcre2.so
8849   cut                15   0 /system/lib64/libpackagelistparser.so
1606   ndroid.systemui    -1   2 /dev/pmsg0
1606   ndroid.systemui    -1   2 /dev/pmsg0
350    android.hardwar    -1   2 /dev/pmsg0
350    android.hardwar    -1   2 /dev/pmsg0
172    logd.reader.per    23   0 /proc/7013/comm
172    logd.reader.per    23   0 /proc/7013/cmdline
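
The bundled tools are Python scripts wrapped around a small BPF C program, and writing your own follows the same pattern. Here is a minimal sketch that attaches a kprobe to the execve syscall and prints the calling PID. It assumes the bcc Python package from the adeb environment and root privileges; the names trace_execve and main are my own, and the output format is deliberately simpler than what execsnoop prints.

```python
# BPF C program: runs in the kernel each time the probed function is entered.
BPF_TEXT = r"""
int trace_execve(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    bpf_trace_printk("execve called by pid %d\n", pid);
    return 0;
}
"""


def main():
    # Imported lazily so the file can be read or imported without bcc installed.
    from bcc import BPF

    b = BPF(text=BPF_TEXT)
    # Resolve the architecture-specific symbol of the execve syscall handler.
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="trace_execve")
    print("Tracing execve ... Hit Ctrl-C to end")
    try:
        b.trace_print()  # stream lines from the kernel trace pipe
    except KeyboardInterrupt:
        pass
```

Call main() as root inside adeb shell to start tracing. bpf_trace_printk is fine for a quick experiment; the stock tools use perf buffers instead to pass structured events to user space.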

eBPF can also trace function calls in user-space libraries. Note that you need to explicitly attach a uprobe to the bionic libc. The exact libc.so path may vary across Android versions.

/# bpftrace -e 'BEGIN { printf("%-10s %-6s %-16s %s\n", "TIME(ms)", "PID", "COMM", "FUNC");} uprobe:/system/apex/com.android.runtime.release/lib64/bionic/libc.so:pthread_create{ printf("%-10u %-6d %-16s %s\n", elapsed /1000000, pid, comm, usym(arg2));}'
Attaching 2 probes...
TIME(ms)   PID    COMM             FUNC
4307       732    [email protected]  __timer_thread_start(void*)
4354       11363  .android.camera  art::Thread::CreateCallback(void*)
4364       11363  RxCachedThreadS  art::Thread::CreateCallback(void*)
4390       11363  .android.camera  art::Thread::CreateCallback(void*)
4406       732    [email protected]  __timer_thread_start(void*)
4578       5250   pool-15-thread-  art::Thread::CreateCallback(void*)
4777       1237   backlight-notif  0x75090b29a8
4811       701    HwBinder:701_5   0x7d510079c0
4842       701    HwBinder:701_5   0x7d5102c228
4856       602    netd             0x7644cd39a0
4859       701    HwBinder:701_5   0x7d5102c228
4897       701    HwBinder:701_5   0x7d510a0628
4996       701    HwBinder:701_5   0x7d773a89a8
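
The same uprobe can be attached from a BCC script, which is easier to extend than a one-liner. The sketch below is a rough equivalent under a few assumptions: the bcc Python package is available, the script runs as root, and libc.so lives at the path used in the bpftrace command. The handler name on_pthread_create is hypothetical, and unlike usym() this version prints the raw start routine address without symbolising it.

```python
# Path taken from the bpftrace one-liner above; it varies across Android versions.
LIBC_PATH = "/system/apex/com.android.runtime.release/lib64/bionic/libc.so"

BPF_TEXT = r"""
#include <uapi/linux/ptrace.h>

int on_pthread_create(struct pt_regs *ctx) {
    // The third argument of pthread_create() is the thread's start routine.
    u64 start_routine = PT_REGS_PARM3(ctx);
    bpf_trace_printk("pthread_create start_routine=0x%llx\n", start_routine);
    return 0;
}
"""


def main(libc_path=LIBC_PATH):
    from bcc import BPF  # lazy import: needs the bcc package and root

    b = BPF(text=BPF_TEXT)
    # Fire the handler whenever any process enters pthread_create in this libc.
    b.attach_uprobe(name=libc_path, sym="pthread_create",
                    fn_name="on_pthread_create")
    print("Tracing pthread_create ... Hit Ctrl-C to end")
    try:
        b.trace_print()
    except KeyboardInterrupt:
        pass
```

Because no PID is passed to attach_uprobe, the probe applies system-wide, which matches the multi-process output shown above.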

# Future Work

There are many areas that can be improved to make eBPF a more useful tool for dynamic Android instrumentation.

  • Better tooling: create a higher-level framework (like Frida) with common eBPF utilities e.g. bionic libc tracing, file hiding, and memory read/write gadgets
  • Rationalise the result: study how app-level anti-debugging and obfuscation processes reflect on the BCC output, and how to apply that knowledge to highlight key events based on eBPF observations
  • Easier setup: create a container-based virtual device farm that automatically provisions the eBPF toolchain

# Troubleshooting

# 1. General troubleshooting tips

  • Check if the issue has been mentioned in the adeb BCC guide

  • In the adeb environment, use zgrep to check that the required build configs are set as intended in the running kernel's config

    /$ zgrep CONFIG_BPF /proc/config.gz
    CONFIG_BPF=y
    CONFIG_BPF_SYSCALL=y
    # CONFIG_BPF_JIT_ALWAYS_ON is not set
    CONFIG_BPF_JIT=y
    CONFIG_BPF_EVENTS=y
    
  • Try isolating the problem and see if it happens in kprobe, bcc, or eBPF. For example, use cat /sys/kernel/debug/kprobes/list to see if a BCC program has successfully attached to a kprobe.

  • For generic eBPF issues, check out Brendan Gregg's book BPF Performance Tools (Part III: Additional Topics, Chapter 18: Tips, Tricks, and Common Problems) and the FOSDEM'20 talk Tools and Mechanisms to Debug BPF Programs
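
The zgrep check from the list above is easy to script when there are many flags to verify. This is a small Python 3 sketch that decompresses the running kernel's config and reports which required options are not set to y. The REQUIRED list is only a subset picked for illustration, the function names are hypothetical, and /proc/config.gz only exists when the kernel was built with CONFIG_IKCONFIG_PROC.

```python
import gzip

# A subset of the flags from the setup section, for illustration only.
REQUIRED = ["CONFIG_BPF", "CONFIG_BPF_SYSCALL", "CONFIG_BPF_JIT",
            "CONFIG_KPROBES", "CONFIG_UPROBES", "CONFIG_BPF_EVENTS"]


def enabled_options(config_text):
    """Return the set of options set to 'y' in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and not name.startswith("#") and value.strip() == "y":
            enabled.add(name.strip())
    return enabled


def missing_options(config_text, required=REQUIRED):
    """Required options that are absent or not set to 'y'."""
    have = enabled_options(config_text)
    return [opt for opt in required if opt not in have]


def read_running_config(path="/proc/config.gz"):
    """Decompress the running kernel's config."""
    with gzip.open(path, "rt") as f:
        return f.read()


# Sample text copied from the zgrep output above.
sample = ("CONFIG_BPF=y\nCONFIG_BPF_SYSCALL=y\n"
          "# CONFIG_BPF_JIT_ALWAYS_ON is not set\n"
          "CONFIG_BPF_JIT=y\nCONFIG_BPF_EVENTS=y\n")
print(missing_options(sample))
```

On the device, missing_options(read_running_config()) gives the list for the live kernel.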

# 2. '../../arm/include/asm/opcodes.h' file not found

/# opensnoop
In file included from /virtual/main.c:2:
In file included from include/uapi/linux/ptrace.h:84:
In file included from ./arch/arm64/include/asm/ptrace.h:58:
In file included from include/linux/bug.h:4:
In file included from ./arch/arm64/include/asm/bug.h:52:
In file included from include/asm-generic/bug.h:13:
In file included from include/linux/kernel.h:13:
In file included from include/linux/printk.h:8:
In file included from include/linux/cache.h:5:
In file included from ./arch/arm64/include/asm/cache.h:5:
In file included from ./arch/arm64/include/asm/cachetype.h:5:
In file included from ./arch/arm64/include/asm/cputype.h:104:
In file included from ./arch/arm64/include/asm/sysreg.h:8:
./arch/arm64/include/asm/opcodes.h:5:10: fatal error: '../../arm/include/asm/opcodes.h' file not found
#include <../../arm/include/asm/opcodes.h>
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Traceback (most recent call last):
  File "/usr/share/bcc/tools/opensnoop", line 180, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 343, in __init__
    raise Exception("Failed to compile BPF module %s" % (src_file or "<text>"))
Exception: Failed to compile BPF module <text>

This is an arm64 kernel bug that shows up at build time when in-kernel headers (CONFIG_IKHEADERS) are enabled. The discussion can be found at https://www.spinics.net/lists/arm-kernel/msg643934.html.

The bug affects all header files that use a relative include of the form #include <../../arm. To fix it, cherry-pick the patch. The patch, however, misses the file arch/arm64/include/asm/opcodes.h. You can fix this file manually by replacing its content with that of arch/arm/include/asm/opcodes.h.

Figure: Header files that contain #include <../../arm

# 3. Disagrees about version of symbol module_layout

[    0.993040] init: init first stage started!
[    0.994170] init: Loading module /lib/modules/5.4.86-android11-2-00002-g0a5d2ddf2353-dirty/kernel/fs/incfs/incrementalfs.ko with args ""
[    0.997083] incrementalfs: disagrees about version of symbol module_layout
[    0.998519] init: Failed to insmod '/lib/modules/5.4.86-android11-2-00002-g0a5d2ddf2353-dirty/kernel/fs/incfs/incrementalfs.ko' with args ''
[    1.001197] init: LoadWithAliases was unable to load incrementalfs
[    1.002570] init: Switching root to '/first_stage_ramdisk'
[    1.003808] init: [libfs_mgr]ReadFstabFromDt(): failed to read fstab from dt
[    1.005487] init: Using Android DT directory /proc/device-tree/firmware/android/
[    1.017222] init: bool android::init::BlockDevInitializer::InitDevices(std::set<std::string>): partition(s) not found in /sys, waiting for their uevent(s): metadata, super, vbmeta_a, vbmeta_system_a

This is a modversions issue that occurs at cuttlefish boot time. It hints that the GKI and the vendor kernel modules were built with different build configs. You can confirm the issue manually by comparing the module_layout checksum of the affected kernel module with that of the kernel (use vmlinux).

modinfo out/android11-5.4/dist/incrementalfs.ko
filename:       /home/senyuuri/common-kernel-5.4/out/android11-5.4/dist/incrementalfs.ko
license:        GPL v2
import_ns:      VFS_internal_I_am_really_a_filesystem_and_am_NOT_a_driver
author:         Eugene Zemtsov <[email protected]>
description:    Incremental File System
vermagic:       5.4.86-android11-2-00002-g0a5d2ddf2353-dirty SMP preempt mod_unload modversions
name:           incrementalfs
intree:         Y
retpoline:      Y
depends:

~/common-kernel-5.4$ modprobe --dump-modversions out/android11-5.4/dist/incrementalfs.ko |grep module
0xb353a754	module_layout

~/common-kernel-5.4$ nm out/android11-5.4/dist/vmlinux | grep __crc_module_layout
00000000b353a754 A __crc_module_layout

To fix it, follow the guide in How to setup - 2. Build a custom Generic Kernel Image (GKI) and make sure two build configs have the same set of feature flags.
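
If you need to repeat this comparison across several modules, the two command outputs can be parsed and compared in a few lines. The Python sketch below works on the text printed by modprobe --dump-modversions and nm as shown above; the function names are hypothetical and the parsing assumes the two- and three-column layouts seen in that output.

```python
def module_crc(dump_modversions_output, symbol="module_layout"):
    """CRC of `symbol` from `modprobe --dump-modversions <module.ko>` output."""
    for line in dump_modversions_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == symbol:
            return int(fields[0], 16)
    raise KeyError(symbol)


def kernel_crc(nm_output, symbol="module_layout"):
    """CRC of `symbol` from `nm vmlinux` output (the __crc_<symbol> entry)."""
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == "__crc_" + symbol:
            return int(fields[0], 16)
    raise KeyError(symbol)


# Sample lines copied from the command output above.
mod_out = "0xb353a754\tmodule_layout"
nm_out = "00000000b353a754 A __crc_module_layout"
print(module_crc(mod_out) == kernel_crc(nm_out))
```

A result of False for any vendor module means that module and the kernel were built from diverging configs.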

# 4. Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)

[    1.070725] Run /init as init process
[    1.079584] init: init first stage started!
[    1.080905] init: Loading module /lib/modules/5.4.86-android11-2-00002-g0a5d2ddf2353-dirty/kernel/fs/incfs/incrementalfs.ko with args ""
[    1.084828] incrementalfs: Unknown symbol _GLOBAL_OFFSET_TABLE_ (err -2)
[    1.105500] init: Failed to insmod '/lib/modules/5.4.86-android11-2-00002-g0a5d2ddf2353-dirty/kernel/fs/incfs/incrementalfs.ko' with args ''
[    1.115919] init: LoadWithAliases was unable to load incrementalfs

~/aosp_cf_x86_64_img$ nm -an incrementalfs.ko |grep -i global
                 U _GLOBAL_OFFSET_TABLE_

The issue only happens when you try to use common-modules/virtual-device/build.config.cuttlefish_kprobes.x86_64 to build the GKI as well (by adding extra output files such as vmlinux and bzImage to the FILES variable). To fix it, build the vendor modules and the GKI separately as instructed previously.

# 5. Failed to attach BPF program xxx to kretprobe yyy

/# execsnoop
cannot attach kprobe, Function not implemented
Traceback (most recent call last):
  File "/usr/share/bcc/tools/execsnoop", line 173, in <module>
    b.attach_kretprobe(event=execve_fnname, fn_name="do_ret_sys_execve")
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 680, in attach_kretprobe
    (fn_name, event))
Exception: Failed to attach BPF program do_ret_sys_execve to kretprobe sys_execve

The issue happens when CONFIG_KRETPROBES is not configured at kernel build time. Notice that the flag depends on three other flags. The final build will not have CONFIG_KRETPROBES if any of the dependent flags are not set.

config KRETPROBES
	def_bool y
	depends on KPROBES && HAVE_KRETPROBES && ROP_PROTECTION_NONE

Source: arch/Kconfig

A less noticed problem is that ROP_PROTECTION_NONE is part of a config choice group that contains SHADOW_CALL_STACK, which is turned on by default in newer Android kernels. CONFIG_ROP_PROTECTION_NONE will be removed if CONFIG_SHADOW_CALL_STACK is present.

choice
	prompt "Return-oriented programming (ROP) protection"
	default ROP_PROTECTION_NONE
	help
	  This option controls kernel protections against return-oriented
	  programming (ROP) attacks, which involve overwriting function return
	  addresses.

config ROP_PROTECTION_NONE
	bool "None"

config SHADOW_CALL_STACK
	bool "clang Shadow Call Stack (EXPERIMENTAL)"
	depends on ARCH_SUPPORTS_SHADOW_CALL_STACK
	help
	  This option enables clang's Shadow Call Stack, which uses a shadow
	  stack to protect function return addresses from being overwritten by
	  an attacker. More information can be found from clang's
	  documentation:

	    https://clang.llvm.org/docs/ShadowCallStack.html

endchoice

Source: arch/Kconfig

To fix it, make sure to explicitly disable CONFIG_SHADOW_CALL_STACK in the build config.

 -e CONFIG_KRETPROBES \
 -e CONFIG_HAVE_KRETPROBES \
 -d CONFIG_SHADOW_CALL_STACK \
 -e CONFIG_ROP_PROTECTION_NONE \

# References

http://www.joelfernandes.org/resources/bcc-ospm.pdf

https://chromium.googlesource.com/chromiumos/docs/+/refs/heads/stabilize-12331.B/kernel_faq.md#How-do-I-backport-an-upstream-patch

https://www.redhat.com/en/blog/introduction-virtio-networking-and-vhost-net

https://github.com/torvalds/linux/commit/d021c344051af91f42c5ba9fdedc176740cbd238

https://terenceli.github.io/技术/2020/04/18/vsock-internals

https://www.aisp.sg/cyberfest/document/CRESTConSpeaker/eBPF.pdf

https://elinux.org/images/d/dc/Kernel-Analysis-Using-eBPF-Daniel-Thompson-Linaro.pdf

https://ci.android.com/builds/branches/aosp_kernel-common-android11-5.4/grid?head=7460818&tail=7460818

https://stackoverflow.com/questions/65415511/android-kernel-build-flow-with-gki-introduced-from-android-11

https://8mantech.thinkific.com/courses/take/android-internals/texts/19755940-android-make-build-system