|
The mhd ioctl(2) control access rights of a multihost
disk, using disk reservations on the disk device.
The stability level of this interface (see attributes(5)) is evolving. As a result,
the interface is subject to change and you should limit your use of it.
The mhd ioctls fall into two major categories:
- ioctls for non-shared multihost disks
-
- ioctls for shared multihost disks.
-
One ioctl, MHIOCENFAILFAST, is applicable
to both non-shared and shared multihost disks. It is described after the
first two categories.
All the ioctls require root privilege.
For all of the ioctls, the caller should obtain the file descriptor
for the device by calling open(2)
with the O_NDELAY flag; without
the O_NDELAY flag, the open may
fail due to another host already having a conflicting reservation on the device.
Some of the ioctls below permit the caller to forcibly clear a conflicting
reservation held by another host, however, in order to call the ioctl, the
caller must first obtain the open file descriptor.
Non-shared multihost disks
|
Non-shared multihost disks ioctls consist of MHIOCTKOWN, MHIOCRELEASE, HIOCSTATUS,
and MHIOCQRESERVE. These ioctl requests control the access
rights of non-shared multihost disks. A non-shared multihost disk is one that
supports serialized, mutually exclusive I/O mastery by the connected hosts.
This is in contrast to the shared-disk model, in which concurrent access is
allowed from more than one host (see below).
A non-shared multihost disk can be in one of two states:
- Exclusive access state, where only one connected host has
I/O access
- Non-exclusive access state, where all connected hosts have
I/O access. An external hardware reset can cause the disk to enter the non-exclusive
access state.
Each multihost disk driver views the machine on which it's running as
the "local host"; each views all other machines as "remote hosts". For each
I/O or ioctl request, the requesting host is the local host.
Note that the non-shared ioctls are designed to work with SCSI-2 disks.
The SCSI-2 RESERVE/RELEASE command set is the underlying hardware facility
in the device that supports the non-shared ioctls.
The function prototypes for the non-shared ioctls are:
|
ioctl(fd, MHIOCTKOWN);
ioctl(fd, MHIOCRELEASE);
ioctl(fd, MHIOCSTATUS);
ioctl(fd, MHIOCQRESERVE);
|
-
MHIOCTKOWN
- Forcefully acquires exclusive access rights
to the multihost disk for the local host. Revokes all access rights to the
multihost disk from remote hosts. Causes the disk to enter the exclusive
access state.
Implementation Note: Reservations (exclusive
access rights) broken via random resets should be reinstated by the driver
upon their detection, for example, in the automatic probe function described
below.
-
MHIOCRELEASE
- Relinquishes exclusive access rights to the multihost
disk for the local host. On success, causes the disk to enter the non- exclusive
access state.
-
MHIOCSTATUS
- Probes a multihost disk to determine whether the local
host has access rights to the disk. Returns 0 if the local
host has access to the disk, 1 if it doesn't, and -1 with errno set to EIO if the probe failed
for some other reason.
-
MHIOCQRESERVE
- Issues, simply and only, a SCSI-2 Reserve command.
If the attempt to reserve fails due to the SCSI error Reservation Conflict
(which implies that some other host has the device reserved), then the ioctl
will return -1 with errno set to EACCES. The MHIOCQRESERVE
ioctl does NOT issue a bus device reset or bus reset prior to attempting the
SCSI-2 reserve command. It also does not take care of re-instating reservations
that disappear due to bus resets or bus device resets; if that behavior is
desired, then the caller can call MHIOCTKOWN after the MHIOCQRESERVE
has returned success. If the device does not support the SCSI-2 Reserve command,
then the ioctl returns -1 with errno
set to ENOTSUP. The MHIOCQRESERVE ioctl is intended to be used by high-availability or clustering
software for a "quorum" disk, hence, the "Q" in the name of the ioctl.
|
Shared Multihost Disks
|
Shared multihost disks ioctls control access to shared multihost
disks. The ioctls are merely a veneer on the SCSI-3 Persistent Reservation
facility. Therefore, the underlying semantic model is not described in detail
here, see instead the SCSI-3 standard. The SCSI-3 Persistent Reservations
support the concept of a group of hosts all sharing access to a disk.
The function prototypes and descriptions for the shared multihost ioctls
are as follows:
-
ioctl(fd, MHIOCGRP_INKEYS, (mhioc_inkeys_t) *k);
- Issues the SCSI-3
command Persistent Reserve In Read Keys to the device. On input, the field k->li should be initialized by the caller with k->li.listsize reflecting how big of an array the caller has allocated for the k->li.list field and with k->li.listlen == 0. On return, the field k->li.listlen is updated to indicate the number of reservation keys the device
currently has: if this value is larger than k->li.listsize
then that indicates that the caller should have passed a bigger k->li.list array with a bigger k->li.listsize.
The number of array elements actually written by the callee into k->li.list is the minimum of k->li.listlen and k->li.listsize. The field k->generation is updated with the generation
information returned by the SCSI-3 Read Keys query. If the device does not
support SCSI-3 Persistent Reservations, then this ioctl returns -1 with errno set to ENOTSUP.
-
ioctl(fd, MHIOCGRP_INRESVS, (mhioc_inresvs_t) *r);
- Issues the SCSI-3
command Persistent Reserve In Read Reservations to the device. Remarks similar
to MHIOCGRP_INKEYS apply to the
array manipulation. If the device does not support SCSI-3 Persistent Reservations,
then this ioctl returns -1 with errno
set to ENOTSUP.
-
ioctl(fd, MHIOCGRP_REGISTER, (mhioc_register_t) *r);
- Issues the SCSI-3
command Persistent Reserve Out Register. The fields of structure r are all inputs; none of the fields are modified by the ioctl.
The field r->aptpl should be set to true to specify that
registrations and reservations should persist across device power failures,
or to false to specify that registrations and reservations should be cleared
upon device power failure; true is the recommended setting. The field r->oldkey is the key that the caller believes the device may already
have for this host initiator; if the caller believes that that this host initiator
is not already registered with this device, it should pass the special key
of all zeros. To achieve the effect of unregistering with the device, the
caller should pass its current key for the r->oldkey field
and an r->newkey field containing the special key of all
zeros. If the device returns the SCSI error code Reservation Conflict, this
ioctl returns -1 with errno set
to EACCES.
-
ioctl(fd, MHIOCGRP_RESERVE, (mhioc_resv_desc_t) *r);
- Issues the SCSI-3
command Persistent Reserve Out Reserve. The fields of structure r are all inputs; none of the fields are modified by the ioctl.
If the device returns the SCSI error code Reservation Conflict, this ioctl
returns -1 with errno set to EACCES.
-
ioctl(fd, MHIOCGRP_PREEMPTANDABORT, (mhioc_preemptandabort_t) *r);
- Issues the SCSI-3
command Persistent Reserve Out Preempt-And-Abort. The fields of structure r are all inputs; inputs; none of the fields are modified by
the ioctl. The key of the victim host is specified by the field r->victim_key. The field r->resvdesc supplies
the preempter's key and the reservation that it is requesting as part of the
SCSI-3 Preempt-And-Abort command. If the device returns the SCSI error code
Reservation Conflict, this ioctl returns -1 with errno set to EACCES.
-
ioctl(fd, MHIOCGRP_PREEMPT, (mhioc_preemptandabort_t) *r);
- Similar to MHIOCGRP_PREEMPTANDABORT, but instead issues the SCSI-3 command
Persistent Reserve Out Preempt.
-
ioctl(fd, MHIOCGRP_CLEAR, (mhioc_resv_key_t) *r);
- Issues the SCSI-3
command Persistent Reserve Out Clear. The input parameter r
is the reservation key of the caller, which should have been already registered
with the device, by an earlier call to MHIOCGRP_REGISTER.
For each device, the non-shared ioctls should not be
mixed with the Persistent Reserve Out shared ioctls, and vice-versa, otherwise,
the underlying device is likely to return errors, because SCSI does not permit
SCSI-2 reservations to be mixed with SCSI-3 reservations on a single device.
It is, however, legitimate to call the Persistent Reserve In ioctls, because
these are query only. Issuing the MHIOCGRP_INKEYS ioctl is the recommended way for a caller to determine if the
device supports SCSI-3 Persistent Reservations (the ioctl will return -1 with errno set to ENOTSUP if the device does not).
|
MHIOCENFAILFAST Ioctl
|
The MHIOCENFAILFAST
ioctl is applicable for both non-shared and shared disks, and may be used
with either the non-shared or shared ioctls.
-
ioctl(fd, MHIOENFAILFAST, (unsigned int *) millisecs);
- Enables
or disables the failfast option in the multihost disk driver and enables or
disables automatic probing of a multihost disk, described below. The argument
is an unsigned integer specifying the number of milliseconds to wait between
executions of the automatic probe function. An argument of zero disables
the failfast option and disables automatic probing. If the MHIOCENFAILFAST ioctl is never called, the effect is defined to be that both the
failfast option and automatic probing are disabled.
|
Automatic Probing
|
The MHIOCENFAILFAST
ioctl sets up a timeout in the driver to periodically schedule automatic probes
of the disk. The automatic probe function works in this manner: The driver
is scheduled to probe the multihost disk every n milliseconds, rounded up
to the next integral multiple of the system clock's resolution. If
- the local host no longer has access rights to the multihost
disk, and
- access rights were expected to be held by the local host,
the driver immediately panics the machine to comply with the failfast
model.
If the driver makes this discovery outside the timeout function, especially
during a read or write operation, it is imperative that it panic the system
then as well.
|
|