Linux Device Nodes and Online Array Configuration

Linux (and UNIX) were originally designed with an assumption that physical disk devices do not change very often. When such changes do occur, operator intervention, manual changes to configuration files, perhaps a kernel rebuild, and certainly a reboot would be required.

Things are getting better. For example, it is now possible with Linux to add SCSI disks to a running system, and to remove them from it, without rebooting. It is likewise possible to reconfigure a Compaq SmartArray controller to add and remove logical volumes without a reboot.

As is the case with adding and removing SCSI disks, there are some pitfalls to avoid when adding and removing logical volumes with a Compaq SmartArray controller. In particular, the mapping between device nodes, such as "/dev/cciss/c0d0p2", and logical volumes may change. If that happens, corresponding changes must be made to /etc/fstab, database configuration files, etc., so that, for example, /usr is not accidentally mounted where /home is meant to be mounted.

This document will explain how Linux arrives at the mapping between device nodes and logical volumes, so that you may anticipate what other changes will be required after reconfiguring your array controller.

Each Compaq SmartArray controller in the system initially presents a number of logical volumes to the operating system in sequence, with no gaps. So if you had two array controllers, each with 5 logical volumes, there would be two sets of logical volumes, each numbered 0, 1, 2, 3, 4.

The Linux device nodes associated with those devices would be:

volume number	Controller 0		Controller 1
-------------------------------------------------------
	0	/dev/cciss/c0d0		/dev/cciss/c1d0
	1	/dev/cciss/c0d1		/dev/cciss/c1d1
	2	/dev/cciss/c0d2		/dev/cciss/c1d2
	3	/dev/cciss/c0d3		/dev/cciss/c1d3
	4	/dev/cciss/c0d4		/dev/cciss/c1d4
Pretty simple so far.

The problem comes about when you reconfigure an array controller. Let's say we reconfigure controller 0 and remove logical volume 2 on our running system. Now we have this (ignoring controller 1).

volume number	Controller 0		How it's being used
-----------------------------------------------------------------------------
	0	/dev/cciss/c0d0		(/dev/cciss/c0d0p5 mounted on /)
	1	/dev/cciss/c0d1		(/dev/cciss/c0d1p1 mounted on /usr)
(2, not used)
	3	/dev/cciss/c0d3		(/dev/cciss/c0d3p1 mounted on /home)
	4	/dev/cciss/c0d4		(/dev/cciss/c0d4p1 mounted on /data1)
And let's suppose our /etc/fstab file looks like this:
/dev/cciss/c0d0p5       /                       ext2    defaults        1 1
/dev/cciss/c0d0p1       /boot                   ext2    defaults        1 2
/dev/fd0                /mnt/floppy             auto    noauto,owner    0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/pts                devpts  gid=5,mode=620  0 0
/dev/cciss/c0d0p6       swap                    swap    defaults        0 0
/dev/cciss/c0d1p1       /usr                    ext2    defaults        1 2
/dev/cciss/c0d3p1       /home                   ext2    defaults        1 2
/dev/cciss/c0d4p1       /data1                  ext2    defaults        1 2
So what's the problem? On the next reboot, the logical volumes will be presented as 0, 1, 2, 3 instead of 0, 1, 3, 4. The "hole" between 1 and 3 will close up. That means that after a reboot, "/dev/cciss/c0d2p1" will refer to what was on "/dev/cciss/c0d3p1", "/dev/cciss/c0d3p1" will refer to what was on "/dev/cciss/c0d4p1", and "/dev/cciss/c0d4p1" will refer to a non-existent logical volume.

If you rebooted at this point, /home would contain the filesystem currently mounted on /data1, and /data1 would not have anything mounted on it.
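The renumbering described above can be sketched in a few lines of Python. This is only an illustration, not part of the driver or any Compaq tool: the function names mapping_after_reboot and device_node are made up for this example, and it assumes all logical volumes sit in a single storage box, so that the surviving volumes keep their relative order when the hole closes.

```python
def mapping_after_reboot(current_volumes):
    """Map each current volume number to the number it is expected to
    have after a reboot, assuming the holes simply close up in order."""
    return {old: new for new, old in enumerate(sorted(current_volumes))}


def device_node(controller, volume, partition=None):
    """Build a cciss device node name such as /dev/cciss/c0d3p1."""
    node = f"/dev/cciss/c{controller}d{volume}"
    return node if partition is None else f"{node}p{partition}"


# Controller 0 after logical volume 2 has been removed.
remap = mapping_after_reboot([0, 1, 3, 4])
for old, new in remap.items():
    print(device_node(0, old), "becomes", device_node(0, new))
```

Volumes 0 and 1 keep their device nodes, while 3 and 4 each move down by one, which matches the example above.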

So, you must edit your /etc/fstab to reflect the changes to the array controller configuration. The last two lines must be changed to use "c0d2p1" and "c0d3p1" instead of "c0d3p1" and "c0d4p1".

/dev/cciss/c0d0p5       /                       ext2    defaults        1 1
/dev/cciss/c0d0p1       /boot                   ext2    defaults        1 2
/dev/fd0                /mnt/floppy             auto    noauto,owner    0 0
none                    /proc                   proc    defaults        0 0
none                    /dev/pts                devpts  gid=5,mode=620  0 0
/dev/cciss/c0d0p6       swap                    swap    defaults        0 0
/dev/cciss/c0d1p1       /usr                    ext2    defaults        1 2
/dev/cciss/c0d2p1       /home                   ext2    defaults        1 2
/dev/cciss/c0d3p1       /data1                  ext2    defaults        1 2
So the potential for problems arises from deleting logical volumes, which can leave "holes" in the sequence of logical volumes. Those "holes" will close up when the system is rebooted, which means the mapping of device nodes to logical volumes will change.
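As a rough sketch of how the /etc/fstab edit could be scripted rather than done by hand, the following Python rewrites the device field of a single fstab line. The function name remap_fstab_line and the remap dictionary are hypothetical and exist only for this example; the dictionary encodes the 0, 1, 3, 4 to 0, 1, 2, 3 renumbering from the example above.

```python
import re

# Matches a cciss device node, with or without a partition suffix.
CCISS_NODE = re.compile(r"^/dev/cciss/c(\d+)d(\d+)(p\d+)?$")


def remap_fstab_line(line, controller, remap):
    """Return the fstab line with its device field renumbered according
    to remap (old volume number -> new volume number). Lines that do not
    name a volume on the given controller are returned unchanged."""
    fields = line.split()
    if not fields or fields[0].startswith("#"):
        return line
    match = CCISS_NODE.match(fields[0])
    if match is None or int(match.group(1)) != controller:
        return line
    old_volume = int(match.group(2))
    if old_volume not in remap:
        return line
    partition = match.group(3) or ""
    new_node = f"/dev/cciss/c{controller}d{remap[old_volume]}{partition}"
    # Replace only the first occurrence, which is the device field.
    return line.replace(fields[0], new_node, 1)


# Renumbering for controller 0 after volume 2 was deleted (assumed).
remap = {0: 0, 1: 1, 3: 2, 4: 3}
old_lines = [
    "/dev/cciss/c0d1p1       /usr                    ext2    defaults        1 2",
    "/dev/cciss/c0d3p1       /home                   ext2    defaults        1 2",
    "/dev/cciss/c0d4p1       /data1                  ext2    defaults        1 2",
]
for line in old_lines:
    print(remap_fstab_line(line, 0, remap))
```

Each line is rewritten exactly once from its original value. That matters here, because a naive search-and-replace over the whole file (c0d3 to c0d2, then c0d4 to c0d3) is easy to apply in the wrong order.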

Additionally, if you first delete a logical volume, then create a new one, the new one will be assigned the first available volume number. In essence, it will fill in the first hole left by deleting a logical volume. You need to be aware of this to be able to accurately predict what device nodes will be matched with what logical volumes.

For example, suppose we start with three logical volumes, 0, 1, 2, mapping to /dev/cciss/c0d0, /dev/cciss/c0d1, and /dev/cciss/c0d2, respectively, like so:

	Volume number		device node
-----------------------------------------------
		0		/dev/cciss/c0d0
		1		/dev/cciss/c0d1
		2		/dev/cciss/c0d2
If we delete logical volume 1 (c0d1) and then, without rebooting, create a new logical volume, the new logical volume will correspond to /dev/cciss/c0d1.
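The "first available volume number" rule can be written out as a short Python sketch. The function name next_volume_number is invented for this illustration; it models the behaviour described in this document rather than any interface exposed by the controller or driver.

```python
def next_volume_number(in_use):
    """Return the lowest volume number not currently in use, i.e. the
    first hole in the sequence, or the next number if there is no hole."""
    used = set(in_use)
    number = 0
    while number in used:
        number += 1
    return number


# Volumes 0, 1, 2 exist and volume 1 is deleted, leaving 0 and 2.
remaining = [0, 2]
new_volume = next_volume_number(remaining)
print(f"new logical volume appears as /dev/cciss/c0d{new_volume}")
```

With no holes, for example volumes 0, 1, 2 all present, the same function returns 3, so the new volume is simply appended to the end of the sequence.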

One more situation needs to be considered. Suppose you have a SmartArray controller connected to some SCSI disks, say a Compaq RA4214 for example, and also to an external storage box with its own array controller, say a Compaq Smart Array Cluster Storage (labelled "Voyager CL" in the tables below). In this case, the collection of logical volumes on the two storage boxes will appear to the OS to be directly attached to the one SmartArray installed in the server, like so:

logical volume		physically resides	device node
-----------------------------------------------------------
	0		in RA4214		c0d0
	1		in RA4214		c0d1
	2		in RA4214		c0d2
	3		in Voyager CL		c0d3
	4		in Voyager CL 		c0d4
Now suppose you add some more disks to the RA4214, and create one new logical volume using those disks. What will happen? Initially you will get the following:
logical volume		physically resides	device node
-----------------------------------------------------------
	0		in RA4214		c0d0
	1		in RA4214		c0d1
	2		in RA4214		c0d2
	3		in Voyager CL		c0d3
	4		in Voyager CL 		c0d4
	5		in RA4214 		c0d5
But, after rebooting, you will have:
logical volume		physically resides	device node
-----------------------------------------------------------
	0		in RA4214		c0d0
	1		in RA4214		c0d1
	2		in RA4214		c0d2
	3		in RA4214 		c0d3
	4		in Voyager CL		c0d4
	5		in Voyager CL 		c0d5
The array controller will present logical volumes in SCSI direct attached storage first, followed by logical volumes in an attached Voyager CL.
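The ordering rule for this mixed case can also be sketched in Python. The function name order_after_reboot and the "direct" / "external" labels are assumptions made for this example. It further assumes that, within each group, volumes keep their current relative order, which is what the tables above show but is not something this sketch can verify.

```python
def order_after_reboot(volumes):
    """volumes is a list of (current_number, location) pairs, where
    location is "direct" for SCSI direct attached storage or "external"
    for an attached storage box. Returns a dict mapping each current
    volume number to its expected number after a reboot."""
    # Direct attached volumes sort first, then external ones; ties are
    # broken by the current volume number.
    ranked = sorted(volumes, key=lambda v: (v[1] != "direct", v[0]))
    return {old: new for new, (old, _location) in enumerate(ranked)}


# State just after creating the new volume 5 in the RA4214.
volumes = [
    (0, "direct"), (1, "direct"), (2, "direct"),
    (3, "external"), (4, "external"),
    (5, "direct"),
]
for old, new in sorted(order_after_reboot(volumes).items()):
    print(f"c0d{old} becomes c0d{new}")
```

In this example the new volume moves from c0d5 to c0d3, and the two volumes in the external box each move up by one.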

So now you can see it's fairly easy to predict the mapping, once you know how it works. However, even with the ability to predict the mapping, there is still the potential for problems with online reconfiguration. Suppose you reconfigure your array controller and make the appropriate changes to /etc/fstab and any other necessary files in anticipation of the next reboot. Until that reboot occurs, your /etc/fstab is not in sync with the running system, so "mount" commands run by hand, which rely on /etc/fstab reflecting the actual mapping between device nodes and filesystems, could run into problems. A second system administrator, unaware of the changes you have made to the system and to /etc/fstab, could be misled by reading /etc/fstab and consequently do something disastrous. The "dump" command or other backup software could be misled, potentially ruining backups. So you may want to make your changes to a copy of /etc/fstab, which you move into place only just prior to rebooting.
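One way to follow that advice is to keep the edited file under a different name and swap it in at the last moment. The Python below is a minimal sketch of that idea, run against a scratch directory rather than the real /etc. The function names stage_fstab, activate_fstab and demo, and the file names used, are all hypothetical.

```python
import os
import shutil
import tempfile


def stage_fstab(new_contents, staged_path):
    """Write the edited fstab to a staging file, leaving the live file
    untouched until the administrator is ready to reboot."""
    with open(staged_path, "w") as staged:
        staged.write(new_contents)


def activate_fstab(staged_path, live_path, backup_path):
    """Keep a copy of the live file, then move the staged file into
    place. Intended to be run just before rebooting."""
    shutil.copy2(live_path, backup_path)
    os.replace(staged_path, live_path)


def demo():
    """Exercise both steps in a temporary directory."""
    with tempfile.TemporaryDirectory() as scratch:
        live = os.path.join(scratch, "fstab")
        staged = os.path.join(scratch, "fstab.after-reboot")
        backup = os.path.join(scratch, "fstab.before-reboot")
        with open(live, "w") as f:
            f.write("/dev/cciss/c0d3p1 /home ext2 defaults 1 2\n")
        stage_fstab("/dev/cciss/c0d2p1 /home ext2 defaults 1 2\n", staged)
        activate_fstab(staged, live, backup)
        with open(live) as f:
            live_now = f.read()
        with open(backup) as f:
            backup_now = f.read()
        return live_now, backup_now, os.path.exists(staged)


print(demo())
```

Keeping the backup copy alongside the live file also gives a second administrator something to compare against if the reboot does not go as planned.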

In short, the ordinary prudence of a competent system administrator is required when making changes to a system's disk configuration and critical system files such as /etc/fstab.