# Introduction

This is a writeup of the Raiders of Corruption challenge from Google CTF 2021.

# Challenge Description

Picked up these at a yardsale, there doesn’t seem to be anything useful in there though!

Attachment: 10 image files named diskXX.img, for XX in 01..10

# Solution

Initial inspection showed that the files contain a Linux software RAID 5:

```
$ file *.img
disk01.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk02.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk03.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk04.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk05.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk06.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk07.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk08.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk09.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
disk10.img: Linux Software RAID version 1.2 (1) UUID=ad89154a:f0c39ce3:99c46240:21b5e681 name=0 level=5 disks=10
```

Trying to mount the RAID produced different errors depending on each team member's Linux version. None of them said outright that the array was corrupted, but something was clearly wrong: tools like binwalk could recover some data from the images, yet much of it came out garbled.
To dig a little deeper, we used a small Python script to XOR all disks together. In a RAID 5 array, each stripe's parity chunk is the XOR of its data chunks, so on a functioning array the XOR of all disks should be all zeros:

```python
# Read all ten disk images (disk01.img ... disk10.img) into memory.
disks = [None for _ in range(10)]
for i in range(10):
    with open(f'disk{i+1:02d}.img', 'rb') as f:
        disks[i] = f.read()

# XOR the disks byte by byte; every consistent RAID 5 stripe XORs to zero.
with open('xor.img', 'wb') as f:
    out = bytearray()
    for b in range(len(disks[0])):
        byte = 0
        for i in range(len(disks)):
            byte ^= disks[i][b]
        out.append(byte)
    f.write(out)
```

The resulting file can be examined with hexyl:

```
$ hexyl --border ascii xor.img
+--------+-------------------------+-------------------------+--------+--------+
|00000000| 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |00000000|00000000|
|*       |                         |                         |        |        |
|000010a0| 00 00 00 00 00 00 00 00 | a6 ba ae eb 3e d7 00 92 |00000000|××××>×0×|
|000010b0| 80 4a c9 6f ea e5 ae 89 | 00 00 00 00 00 00 00 00 |×J×o××××|00000000|
|000010c0| 00 00 00 00 00 00 00 00 | 00 00 00 00 00 00 00 00 |00000000|00000000|
|*       |                         |                         |        |        |
|00500000|                         |                         |        |        |
+--------+-------------------------+-------------------------+--------+--------+
```


This is the only region that fails the parity check. Since it lies in the header region of the drives, where each disk stores its own metadata rather than striped data, this is expected. Everywhere else parity holds, so the drives do contain a valid array, and something must be wrong with the headers.
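Rather than scrolling through the hexdump, the non-zero regions of xor.img can also be listed with a few lines of Python. This is a sketch; the helper name `nonzero_ranges` is our own choice:

```python
def nonzero_ranges(data: bytes):
    """Return (start, end) pairs, end exclusive, for every run of non-zero bytes."""
    ranges = []
    start = None
    for offset, byte in enumerate(data):
        if byte and start is None:
            start = offset                  # a non-zero run begins here
        elif not byte and start is not None:
            ranges.append((start, offset))  # the run ended on the previous byte
            start = None
    if start is not None:                   # run reaches the end of the data
        ranges.append((start, len(data)))
    return ranges


def report(path='xor.img'):
    """Print the non-zero ranges of a file as inclusive hex offsets."""
    with open(path, 'rb') as f:
        for start, end in nonzero_ranges(f.read()):
            print(f'{start:#010x}-{end - 1:#010x}')
```

Given the dump above, we would expect `report()` to print two ranges, 0x10a8-0x10ad and 0x10af-0x10b7, because the byte at 0x10ae happens to XOR to zero. Together they cover the 16 bytes at superblock offsets 0xA8-0xB7.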

We found a good explanation of the fields in a Linux RAID superblock at https://raid.wiki.kernel.org/index.php/RAID_superblock_formats. It explains that with version 1.2 metadata, the superblock always sits 4 KiB from the start of each member device, so offset 0x1000 is where our superblocks begin.
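As a sketch of what those fields look like in practice, the following reads a few array-wide values from the version 1.2 superblock of a disk image. The offsets follow the `mdp_superblock_1` layout described on the wiki page above; the function name `read_array_info` and the `LAYOUTS` table are our own:

```python
import struct

SB_OFFSET = 0x1000          # v1.2 superblock: 4 KiB from the start of the device
MD_SB_MAGIC = 0xa92b4efc    # stored little endian at superblock offset 0x00

# md RAID 5 layout numbers as stored in the superblock's layout field.
LAYOUTS = {0: 'left-asymmetric', 1: 'right-asymmetric',
           2: 'left-symmetric', 3: 'right-symmetric'}


def read_array_info(image: bytes) -> dict:
    sb = image[SB_OFFSET:SB_OFFSET + 0x100]
    magic, major = struct.unpack_from('<II', sb, 0x00)
    if magic != MD_SB_MAGIC:
        raise ValueError('no v1.x md superblock found at 4 KiB')
    level, layout = struct.unpack_from('<iI', sb, 0x48)   # level is signed
    chunk_sectors, raid_disks = struct.unpack_from('<II', sb, 0x58)
    data_offset, data_size = struct.unpack_from('<QQ', sb, 0x80)
    return {
        'major_version': major,
        'level': level,
        'layout': LAYOUTS.get(layout, layout),
        'chunk_bytes': chunk_sectors * 512,      # chunksize is in 512-byte sectors
        'raid_disks': raid_disks,
        'data_offset_bytes': data_offset * 512,  # also in 512-byte sectors
        'data_size_bytes': data_size * 512,
    }
```

Usage would be along the lines of `read_array_info(open('disk01.img', 'rb').read(0x2000))`. We would expect `chunk_bytes` and `data_offset_bytes` to correspond to the 0x1000 slice size and 0x100000 data offset hard-coded in the recovery script below, though we did not print them during the CTF.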

The differing bytes are at offsets 0xA8-0xB7 within the superblock, which hold the device_uuid, the “UUID of the component device”. Differences there are expected: every member of the array gets its own UUID so that the devices can be told apart.

However, offset 0xA0 in the superblock holds the dev_number, the “Permanent identifier of this device”, which is used as an index into the superblock's dev_roles table to determine the device's role, i.e. its position in the array. It turns out this field is identical on all disks (09 00 00 00, i.e. 9 in little endian), which is also why it does not show up in the XOR: ten identical values cancel out. So there is no way for the kernel to figure out which disk belongs in which position in the array.
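A quick way to see this for yourself is to dump the two per-device fields from every image. This is a minimal sketch using the offsets above (dev_number at 0xA0, device_uuid at 0xA8); `device_identity` and `dump_identities` are hypothetical helpers, and the UUID is printed as plain hex rather than in mdadm's colon-separated format:

```python
import struct

SB_OFFSET = 0x1000  # v1.2 superblock location


def device_identity(image: bytes):
    """Return (dev_number, device_uuid as hex) from a v1.2 md superblock."""
    sb = image[SB_OFFSET:SB_OFFSET + 0x100]
    (dev_number,) = struct.unpack_from('<I', sb, 0xA0)  # __le32 dev_number
    device_uuid = bytes(sb[0xA8:0xB8]).hex()            # 16-byte device_uuid
    return dev_number, device_uuid


def dump_identities(paths):
    for path in paths:
        with open(path, 'rb') as f:
            dev_number, dev_uuid = device_identity(f.read(SB_OFFSET + 0x100))
        print(f'{path}: dev_number={dev_number} device_uuid={dev_uuid}')
```

Called as `dump_identities([f'disk{i:02d}.img' for i in range(1, 11)])`, this should print `dev_number=9` for every disk, with a different device_uuid on each line.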

Now our task became clearer: find the correct order of the disks. We could have used the superblock checksums for this (for example by trying each possible dev_number until the checksum matched), but the challenge authors had taken care of them as well and replaced them all with 0xbadc0de:

```
$ mdadm -E disk05.img
<...>
Checksum : badc0de - expected ff67acc5
```
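For reference, here is a sketch of how the v1 superblock checksum can be computed. It is modelled on the kernel's `calc_sb_1_csum()` as we understand it: sum the superblock as little-endian 32-bit words (fixed 256-byte part plus the dev_roles table, with the checksum field itself treated as zero), then fold the carry back into the low 32 bits. The function and constant names are our own:

```python
import struct

CSUM_OFFSET = 0xD8     # __le32 sb_csum
MAX_DEV_OFFSET = 0xDC  # __le32 max_dev (number of dev_roles entries)


def sb1_checksum(sb: bytes) -> int:
    """Checksum over a v1 superblock; `sb` must start at the superblock."""
    (max_dev,) = struct.unpack_from('<I', sb, MAX_DEV_OFFSET)
    size = 256 + max_dev * 2                     # fixed part + dev_roles[]
    buf = bytearray(sb[:size])
    buf[CSUM_OFFSET:CSUM_OFFSET + 4] = bytes(4)  # stored checksum counts as 0
    total = 0
    pos = 0
    while size - pos >= 4:                       # add up 32-bit LE words
        (word,) = struct.unpack_from('<I', buf, pos)
        total += word
        pos += 4
    if size - pos == 2:                          # odd max_dev leaves 2 bytes
        (half,) = struct.unpack_from('<H', buf, pos)
        total += half
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF
```

If the sketch matches the kernel's algorithm, `hex(sb1_checksum(image[0x1000:0x2000]))` on disk05.img should reproduce the `ff67acc5` that mdadm expects. With the original checksums intact, one could have set dev_number to each candidate value and kept the one whose checksum matched; with every stored value overwritten by 0xbadc0de, that route was closed.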


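So we had to get a bit more creative. We were able to determine that disk01 and disk10 were already in their correct positions: disk01 was the only disk whose data area started with an ext4 file system, and disk10 was the only one whose first slice held seemingly arbitrary data, making it the only candidate for the parity slice of the first stripe. For an overview of the common RAID 5 layouts, see http://www.reclaime-pro.com/posters/raid-layouts.pdf. In both the left-asymmetric and the left-symmetric layout, parity for the first stripe sits on the last disk; the recovery script in the Files section reassembles the data in left-symmetric order, which is also the mdadm default.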
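To make the layout discussion concrete, here is a sketch that maps a logical data chunk number to the (disk, stripe row) holding it in the two "left" RAID 5 layouts. The helper names `parity_disk`, `left_asymmetric` and `left_symmetric` are our own. In both layouts chunk 0 lands on the first disk and parity for stripe 0 on the last one, which is why the observations about the first slices pin down disk01 and disk10 either way:

```python
def parity_disk(stripe, n_disks):
    """Left layouts: parity starts on the last disk and moves one disk to
    the left with every stripe."""
    return (n_disks - 1) - stripe % n_disks


def left_asymmetric(chunk, n_disks):
    """Map a logical data chunk to (disk, stripe): data fills the disks in
    order from disk 0, skipping the parity disk."""
    stripe, k = divmod(chunk, n_disks - 1)
    p = parity_disk(stripe, n_disks)
    return (k if k < p else k + 1), stripe


def left_symmetric(chunk, n_disks):
    """Same parity placement, but data starts on the disk right after the
    parity disk and wraps around to disk 0."""
    stripe, k = divmod(chunk, n_disks - 1)
    p = parity_disk(stripe, n_disks)
    return (p + 1 + k) % n_disks, stripe
```

For example, with three disks `left_symmetric` places chunks 0 to 5 at (0, 0), (1, 0), (2, 1), (0, 1), (1, 2), (2, 2), while `left_asymmetric` gives (0, 0), (1, 0), (0, 1), (2, 1), (1, 2), (2, 2). As far as we can tell, the reconstruction loop in the recovery script below emits slices in the order `left_symmetric` produces for ten disks.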

We noticed that the file system was filled with Shakespeare plays, apparently stored as text files (it later turned out to be a single large file containing all of the plays). With this, it was easy to work out which disk follows which, simply by looking for text fragments that run across slice boundaries.
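The boundary matching can be automated roughly as follows. `boundary_pairs` is a hypothetical helper, not part of our recovery script; it assumes 4 KiB slices and takes the disks' data areas (i.e. everything after the 1 MiB offset used in the script below). A phrase known to cross a slice boundary is split at every position, and a pair (a, b) is reported when a slice on disk a ends with the first part and a slice on disk b starts with the rest, either in the same row or in the next one (the next row covers a boundary that is also a stripe boundary):

```python
def boundary_pairs(disks, phrase, chunk=0x1000):
    """Return sorted (left_disk, right_disk) pairs where `phrase` is split
    across the end of one slice and the start of another."""
    rows = min(len(d) for d in disks) // chunk
    # Pre-slice every disk once: slices[disk][row]
    slices = [[d[r * chunk:(r + 1) * chunk] for r in range(rows)] for d in disks]
    pairs = set()
    for row in range(rows):
        for cut in range(1, len(phrase)):
            head, tail = phrase[:cut], phrase[cut:]
            for i, left in enumerate(slices):
                if not left[row].endswith(head):
                    continue
                for j, right in enumerate(slices):
                    for r in (row, row + 1):  # same row or next stripe row
                        if j != i and r < rows and right[r].startswith(tail):
                            pairs.add((i, j))
    return sorted(pairs)
```

Each reported pair suggests that disk b directly follows disk a (cyclically) in the array. Short splits can match by coincidence, so in practice several phrases are needed before the order is unambiguous.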

Within a few minutes we had matched all the text fragments and deduced the correct order of the disks (0-based indices, so 0 is disk01.img and 9 is disk10.img):

0 -> 6 -> 3 -> 5 -> 2 -> 4 -> 1 -> 7 -> 8 -> 9


Since we had already implemented a simple RAID recovery tool at this point (see the Files section), we used it to extract the ext4 file system from the array into an image. That image could then simply be mounted to retrieve the flag, alongside 484 pirate flags as .jpg files:

```
sudo mount unraid.img /mnt/unraid
```
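Before mounting, a cheap sanity check is whether the reassembled image starts with an ext2/3/4 superblock: that superblock begins 1024 bytes into the file system and stores the magic number 0xEF53 (little endian) at offset 0x38. The helper name `looks_like_ext` is our own. Since byte 0x438 lies inside the first 4 KiB slice, the same check applied to each disk's data area (after the 1 MiB offset) is also one way to single out the disk holding the start of the file system, as disk01 did:

```python
import struct

EXT_SB_OFFSET = 1024     # ext2/3/4 superblock starts 1 KiB into the fs
EXT_MAGIC_OFFSET = 0x38  # s_magic within that superblock
EXT_MAGIC = 0xEF53


def looks_like_ext(image: bytes) -> bool:
    """Return True if `image` starts with an ext2/3/4 superblock."""
    off = EXT_SB_OFFSET + EXT_MAGIC_OFFSET
    if len(image) < off + 2:
        return False
    (magic,) = struct.unpack_from('<H', image, off)
    return magic == EXT_MAGIC
```

For example, `looks_like_ext(open('unraid.img', 'rb').read(4096))` should return True for a correctly reassembled image.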


# Files

## solve.py

```python
# Read all ten disk images (disk01.img ... disk10.img).
disks = [None for _ in range(10)]
for i in range(10):
    with open(f'disk{i+1:02d}.img', 'rb') as f:
        disks[i] = f.read()

# Consecutive lines of Shakespeare passages (grouped by blank lines) to look
# up in the slices; lines of one passage found on different disks reveal
# which disk follows which.
searches = [
    b"Than these poor compounds that thou mayst not sell.",
    b"I sell thee poison; thou hast sold me none.",

    b"Live, and be prosperous; and farewell, g",
    b"Bal. [aside] For all this same, I'll hide me hereabout.",

    b"Friar. Saint Francis be my speed! how oft to-night",
    b"Have my old feet stumbled at graves! Who's there?",
    b"Bal. Here's one, a friend, and one that knows you well.",

    b"Wife. The people in the street cry 'Romeo,'",
    b"Some 'Juliet,' and some 'Paris'; and all run,",
    b"With open outcry, toward our monument.",

    b"If I departed not and left him there.",
    b"Prince. Give me the letter. I will look on it.",
    b"Where is the County's page that rais'd the watch?",

    b"Why, Belman is as good as he, my",
    b"upon it at the merest loss",
    b"And twice today",

    b"And give them friendly welcome every",
    b"Let them want nothing that my house affords.",

    b"Or wilt thou ride? Thy horses shall be trapp'd,",
    b"Their harness studded all with gold and pearl.",
    b"Dost thou love hawking? Thou hast hawks will soar",

    b"Ay, it stands so that I may hardly tarry so long. But I",
    b"be loath to fall into my dreams again: I will therefore tarry in",
    b"despite of the flesh and the blood.",

    b"If either of you both love Katherina,",
    b"Because I know you well and love you well,",
    b"Leave shall you have to court her at your pleasure.",

    b"shall be so far forth friendly maintained, till by helping",
    b"Baptista's eldest daughter to a husband, we set his youngest free",
    b"for a husband, and then have to't afresh. Sweet Bianca! Happy man",
]

# The array data starts 1 MiB into each disk.
disks_data = [x[0x100000:] for x in disks]

# Split each disk's data area into 4 KiB slices (chunks).
disks_data_slices = []
for i in range(10):
    diskslices = []
    for slicenum in range(len(disks_data[0]) // 0x1000):
        diskslices.append(disks_data[i][0x1000*slicenum:0x1000*(slicenum+1)])
    disks_data_slices.append(diskslices)

# For every search string, print the index of each disk that contains it.
for si, search in enumerate(searches):
    print(f'{si}: ', end='')
    for didx, diskslices in enumerate(disks_data_slices):
        for slicenum, pslice in enumerate(diskslices):
            if search in pslice:
                print(f'{didx} ', end='')
    print()

# Disk order deduced from the output above (0-based): 0 6 3 5 2 4 1 7 8 9
order = [0, 6, 3, 5, 2, 4, 1, 7, 8, 9]
disks_data = [disks_data[i] for i in order]

# Reassemble the data. Parity for row `slicenum` sits on disk
# 9 - (slicenum % 10) and moves one disk to the left per row. Disks before
# the parity disk contribute their slice from this row; the parity disk and
# the disks after it contribute their slice from the next row, which yields
# the left-symmetric data order. Rows with parity on disk 0 were already
# consumed entirely by the previous row and are skipped. (On the last row,
# next-row slices lie past the end of the data and come out empty.)
with open('unraid.img', 'wb') as f:
    for slicenum in range(len(disks_data[0]) // 0x1000):
        parity_in_slice = 10 - (slicenum % 10) - 1
        if parity_in_slice == 0:
            continue
        print(f'slice {slicenum}: ', end='')
        for i in range(10):
            slloc = slicenum
            if i >= parity_in_slice:
                slloc += 1
            print(slloc, ",", end='')
            f.write(disks_data[i][0x1000*slloc:0x1000*(slloc+1)])
        print()
```