Linux md RAID 10 disk layout

I'm working on redoing my home router / VM server to provide better IO. The goal is to have an SSD as the boot drive, and four 2TB disks in RAID 10 (4TB usable) for the VMs. I'll be using md RAID to build it; however, I want to be particular about where the drives are physically, for a few reasons:

  • The RAID drives are all SATA 6Gbps drives, but the server's motherboard only has 2 SATA 6Gbps ports. I've got a 2-port PCIe SATA 6Gbps controller on the way[0].
  • I want to place the two drives in each base RAID 1 set on different controllers: one on the motherboard controller, one on the PCIe card controller. This may provide extra performance, but more importantly, having each disk on a different controller protects the array as a whole in case of a controller failure.

Linux md RAID 10 is a relatively new mode. In the past, you would manually create multiple RAID 1 arrays, then combine them in RAID 0, so you knew exactly where each disk was placed, since you had done it yourself. The mdadm RAID 10 method is easier, but I found there is literally no documentation on which drives it uses for the underlying RAID 1 sets. Using loopback devices and some trial and error, I figured out how the arrays are assembled.

In a nutshell, the underlying RAID 1 arrays are paired two at a time, in order given during creation. If you were to do this:

# mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/sd{a,b,c,d}1

sda1 and sdb1 form one RAID 1 array, and sdc1 and sdd1 form another:

|           RAID0           |
|    RAID1    |    RAID1    |
| sda1 | sdb1 | sdc1 | sdd1 |
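In other words, with the default layout and two copies, devices pair up consecutively in the order given on the command line. A toy model in Python (this is just an illustration of the pairing rule observed above, not mdadm internals):

```python
# Toy model of the pairing rule: with 2 copies, consecutive devices
# in creation order form each mirror set. Illustration only.

def mirror_pairs(devices, copies=2):
    """Group devices into mirror sets of `copies` consecutive members."""
    return [tuple(devices[i:i + copies]) for i in range(0, len(devices), copies)]

print(mirror_pairs(["sda1", "sdb1", "sdc1", "sdd1"]))
# [('sda1', 'sdb1'), ('sdc1', 'sdd1')]
```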

One other thing to mention is what happens when multiple drives are lost. Say in this case, both sda1 and sdd1 are lost, resulting in a degraded but functional array:

# mdadm --fail /dev/md0 /dev/sda1
# mdadm --remove /dev/md0 /dev/sda1
# mdadm --fail /dev/md0 /dev/sdd1
# mdadm --remove /dev/md0 /dev/sdd1
|           RAID0           |
|  RAID1 (D)  |  RAID1 (D)  |
|      | sdb1 | sdc1 |      |

If you were to replace sdd and add it back first, you might think it would go in the second RAID 1 array. But no, it takes the first available degraded slot:

# mdadm --add /dev/md0 /dev/sdd1
|           RAID0           |
|    RAID1    |  RAID1 (D)  |
| sdd1 | sdb1 | sdc1 |      |

So be careful in this situation if you care about where the devices are physically laid out.
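The re-add behavior can be modeled the same way: mdadm simply fills the first empty slot, regardless of which device originally sat there. A hypothetical sketch (again a toy model, not mdadm code):

```python
# Toy model of slot assignment on --add: slots hold device names,
# with None marking a failed/removed member. Illustration only.

def add_device(slots, device):
    """Place a new device into the first empty slot, returning its index."""
    for i, current in enumerate(slots):
        if current is None:
            slots[i] = device
            return i
    raise RuntimeError("no free slot")

# Array state after failing and removing sda1 (slot 0) and sdd1 (slot 3):
slots = [None, "sdb1", "sdc1", None]
add_device(slots, "sdd1")  # sdd1 lands in the FIRST free slot, index 0
print(slots)               # ['sdd1', 'sdb1', 'sdc1', None]
```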

[0] Half of the order actually arrived Friday, including the SSD and a 4-port PCIe 4x SATA 6Gbps controller. The idea was to place two of the RAID drives on the motherboard SATA 6Gbps controller, and two on the new controller, plus the boot SSD (which is also SATA 6Gbps). My system has 3 PCIe 1x ports and a PCIe 16x port. The 4x card was supposed to go in the 16x port, but I learned after failure that many motherboards do not like non-video cards in the primary 16x port. Oh well. The boot SSD will now go on one of the motherboard SATA 3Gbps ports, and I've got an order in for a 2-port PCIe 1x SATA 6Gbps controller.

2 thoughts on “Linux md RAID 10 disk layout”

  1. Hi,

    Thanks for your great article. I suspected as much about the way it would work, but I would really like to know: how did you figure out which disk is in which mirror?

    You mention "using loopback devices and some trial and error"; by any chance, could you describe this process further? I would like to try this myself.

    All I could think of was doing a "dd" of some sectors and then doing a "diff" between them?

    1. No, much simpler than that. I would create 4 loop devices:

      dd if=/dev/zero of=file0 bs=1M count=10
      losetup /dev/loop0 file0
      dd if=/dev/zero of=file1 bs=1M count=10
      losetup /dev/loop1 file1
      dd if=/dev/zero of=file2 bs=1M count=10
      losetup /dev/loop2 file2
      dd if=/dev/zero of=file3 bs=1M count=10
      losetup /dev/loop3 file3

      Assemble them into a RAID10 array:

      mdadm --create --verbose /dev/md0 --level=raid10 --raid-devices=4 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3

      Then fail the devices one at a time until /proc/mdstat showed a broken array. For example, this would break the array because both loop0 and loop1 were part of the same RAID1 internal set:

      mdadm --fail /dev/md0 /dev/loop0
      mdadm --fail /dev/md0 /dev/loop1

      But this would not break the array:

      mdadm --fail /dev/md0 /dev/loop0
      mdadm --fail /dev/md0 /dev/loop2

      (Don't follow this verbatim. I'm typing this from memory, and md0 or loop0 may already be in use on the system, etc, etc.)
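The failure experiment in the reply above amounts to checking whether both members of any mirror pair have been failed. Under the consecutive-pairing rule described in the post, that check can be sketched as (a toy model, not what md actually runs):

```python
# Under the "consecutive pairs, 2 copies" rule, the array survives as
# long as at least one member of every mirror set is still present.

def array_survives(num_devices, failed, copies=2):
    """failed: set of 0-based device indices that have been failed."""
    for start in range(0, num_devices, copies):
        pair = set(range(start, start + copies))
        if pair <= failed:  # every member of this mirror set has failed
            return False
    return True

print(array_survives(4, {0, 1}))  # False: loop0 + loop1 share a mirror
print(array_survives(4, {0, 2}))  # True: one survivor in each mirror
```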
