ctdb-script.options — CTDB scripts configuration files
Each CTDB script has 2 possible locations for its configuration options:
/usr/local/etc/ctdb/script.options
This is a catch-all global file for general purpose scripts and for options that are used in multiple event scripts.
SCRIPT.options
That is, options for SCRIPT are placed in a file alongside the script, with a ".options" suffix added. This style is usually recommended for event scripts.
Options in this script-specific file override those in the global file.
These files should include simple shell-style variable assignments and shell-style comments.
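As a minimal sketch of the format, a global options file might look like the following. Both options are described later in this page; the values are illustrative only and are not recommendations.

```shell
# /usr/local/etc/ctdb/script.options
# Lines starting with "#" are shell-style comments.

# Keep the documented default for partially online interfaces
CTDB_PARTIALLY_ONLINE_INTERFACES=no

# Illustrative: skip per-export directory checks (many NFS exports)
CTDB_NFS_SKIP_SHARE_CHECK=yes
```

Because these are plain shell assignments, there must be no spaces around "=".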
Event scripts can monitor resources or services. When a problem is detected, it may be better to warn about a problem rather than to immediately fail monitoring and mark a node as unhealthy. CTDB provides support for event scripts to do threshold-based monitoring.
A threshold setting looks like WARNING_THRESHOLD[:ERROR_THRESHOLD].
If the number of problems is ≥ WARNING_THRESHOLD then the script will log a warning and continue. If the number of problems is ≥ ERROR_THRESHOLD then the script will log an error and exit with failure, causing monitoring to fail. Note that ERROR_THRESHOLD is optional, and follows the optional colon (:) separator.
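The threshold rule described above can be sketched in POSIX shell as follows. This is an illustration only and is not CTDB's implementation; the function name check_thresholds and its output strings are hypothetical.

```shell
# Hypothetical helper, not part of CTDB.
# Usage: check_thresholds THRESHOLDS COUNT
# Prints "ok", "warning" or "error"; returns 1 only for "error".
check_thresholds ()
{
	_thresholds="$1"
	_count="$2"

	# Text before the first colon is the warning threshold
	_warn="${_thresholds%%:*}"

	# Text after the colon, if any, is the error threshold
	case "$_thresholds" in
	*:*) _error="${_thresholds#*:}" ;;
	*)   _error="" ;;
	esac

	if [ -n "$_error" ] && [ "$_count" -ge "$_error" ]; then
		echo "error"
		return 1
	fi

	if [ -n "$_warn" ] && [ "$_count" -ge "$_warn" ]; then
		echo "warning"
		return 0
	fi

	echo "ok"
}
```

With a setting of 1:2, a count of 1 would yield a warning and a count of 2 would yield an error. With a setting of 3 (no ERROR_THRESHOLD), no count ever yields an error.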
This event script handles public IP address release and takeover, as well as monitoring interfaces used by public IP addresses.
Whether to use ss -K/--kill to reset incoming TCP connections to public IP addresses during releaseip.
CTDB's standard method of resetting incoming TCP connections during releaseip is via its custom ctdb_killtcp command. This uses network trickery to reset each connection: send a "tickle ACK", capture the reply to extract the TCP sequence number, send a reset (containing the correct sequence number).
ss -K has been supported in ss since iproute 4.5 in March 2016 and in the Linux kernel since 4.4 in December 2015. However, the required kernel configuration item CONFIG_INET_DIAG_DESTROY is disabled by default. Although enabled in Debian kernels since ~2017 and in Ubuntu since at least 18.04, this has only recently been enabled in distributions such as RHEL. There seems to be no way, including running ss -K, to determine if this is supported, so use of this feature needs to be configurable. When available, it should be the fastest, most reliable way of killing connections.
Supported values are:
Use ss -K and make no other attempt to kill any remaining connections. This is sane on modern Linux distributions that are guaranteed to have CONFIG_INET_DIAG_DESTROY enabled.
Attempt to use ss -K and fall back to ctdb_killtcp for any remaining connections. This may be a good value when ss supports the -K option but it is uncertain whether CONFIG_INET_DIAG_DESTROY is enabled.
Never attempt to use ss -K. Rely only on ctdb_killtcp.
Default is "no".
Whether one or more offline interfaces should cause a monitor event to fail if there are other interfaces that are up. If this is "yes" and a node has some interfaces that are down then ctdb status will display the node as "PARTIALLYONLINE".
Note that CTDB_PARTIALLY_ONLINE_INTERFACES=yes is not generally compatible with NAT gateway or LVS. NAT gateway relies on the interface configured by CTDB_NATGW_PUBLIC_IFACE to be up and LVS relies on CTDB_LVS_PUBLIC_IFACE to be up. CTDB does not check if these options are set in an incompatible way so care is needed to understand the interaction.
Default is "no".
Provides CTDB's NAT gateway functionality.
NAT gateway is used to configure fallback routing for nodes when they do not host any public IP addresses. For example, it allows unhealthy nodes to reliably communicate with external infrastructure. One node in a NAT gateway group will be designated as the NAT gateway leader node and other (follower) nodes will be configured with fallback routes via the NAT gateway leader node. For more information, see the NAT GATEWAY section in ctdb(7).
CTDB_NATGW_DEFAULT_GATEWAY=IPADDR
IPADDR is an alternate network gateway to use on the NAT gateway leader node. If set, a fallback default route is added via this network gateway.
No default. Setting this variable is optional - if not set then no route is created on the NAT gateway leader node.
CTDB_NATGW_NODES=FILENAME
FILENAME contains the list of nodes that belong to the same NAT gateway group.
File format:
IPADDR [follower-only]
IPADDR is the private IP address of each node in the NAT gateway group.
If "follower-only" is specified then the corresponding node can not be the NAT gateway leader node. In this case CTDB_NATGW_PUBLIC_IFACE and CTDB_NATGW_PUBLIC_IP are optional and unused.
No default, usually /usr/local/etc/ctdb/natgw_nodes when enabled.
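As an illustration of the file format, the following sketch writes an example nodes file for a three-node NAT gateway group in which the third node can never become leader. The addresses and the output name natgw_nodes.example are hypothetical.

```shell
# Hypothetical addresses on a 192.168.1.0/24 private network.
# Written to ./natgw_nodes.example here; the usual location when
# enabled is /usr/local/etc/ctdb/natgw_nodes.
cat >natgw_nodes.example <<EOF
192.168.1.21
192.168.1.22
192.168.1.23 follower-only
EOF
```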
CTDB_NATGW_PRIVATE_NETWORK=IPADDR/MASK
IPADDR/MASK is the private sub-network that is internally routed via the NAT gateway leader node. This is usually the private network that is used for node addresses.
No default.
CTDB_NATGW_PUBLIC_IFACE=IFACE
IFACE is the network interface on which the CTDB_NATGW_PUBLIC_IP will be configured.
No default.
CTDB_NATGW_PUBLIC_IP=IPADDR/MASK
IPADDR/MASK indicates the IP address that is used for outgoing traffic (originating from CTDB_NATGW_PRIVATE_NETWORK) on the NAT gateway leader node. This must not be a configured public IP address.
No default.
CTDB_NATGW_STATIC_ROUTES=IPADDR/MASK[@GATEWAY] ...
Each IPADDR/MASK identifies a network or host to which NATGW should create a fallback route, instead of creating a single default route. This can be used when there is already a default route, via an interface that can not reach required infrastructure, that overrides the NAT gateway default route.
If GATEWAY is specified then the corresponding route on the NATGW leader node will be via GATEWAY. Such routes are created even if CTDB_NATGW_DEFAULT_GATEWAY is not specified. If GATEWAY is not specified for some networks then routes are only created on the NATGW leader node for those networks if CTDB_NATGW_DEFAULT_GATEWAY is specified.
This should be used with care to avoid causing traffic to unnecessarily double-hop through the NAT gateway leader, even when a node is hosting public IP addresses. Each specified network or host should probably have a corresponding automatically created link route or static route to avoid this.
No default.
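A sketch of a setting that mixes both forms is shown below. It assumes that multiple entries are separated by whitespace, which is why the value is quoted; all addresses are hypothetical.

```shell
# Assumption: entries are whitespace-separated. Addresses are hypothetical.
# 10.0.0.0/24 has no GATEWAY, so its route on the leader node depends on
# CTDB_NATGW_DEFAULT_GATEWAY being set.
# 172.16.0.0/16 names a GATEWAY, so its route on the leader node is
# created via 10.0.0.1 regardless.
CTDB_NATGW_STATIC_ROUTES="10.0.0.0/24 172.16.0.0/16@10.0.0.1"
```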
CTDB_NATGW_NODES=/usr/local/etc/ctdb/natgw_nodes
CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
CTDB_NATGW_DEFAULT_GATEWAY=10.0.0.1
CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
CTDB_NATGW_PUBLIC_IFACE=eth0
A variation that ensures that infrastructure (ADS, DNS, ...) directly attached to the public network (10.0.0.0/24) is always reachable would look like this:
CTDB_NATGW_NODES=/usr/local/etc/ctdb/natgw_nodes
CTDB_NATGW_PRIVATE_NETWORK=192.168.1.0/24
CTDB_NATGW_PUBLIC_IP=10.0.0.227/24
CTDB_NATGW_PUBLIC_IFACE=eth0
CTDB_NATGW_STATIC_ROUTES=10.0.0.0/24
Note that CTDB_NATGW_DEFAULT_GATEWAY is not specified.
Provides CTDB's policy routing functionality.
A node running CTDB may be a component of a complex network
topology. In particular, public addresses may be spread
across several different networks (or VLANs) and it may not be
possible to route packets from these public addresses via the
system's default route. Therefore, CTDB has support for
policy routing via the 13.per_ip_routing
eventscript. This allows routing to be specified for packets
sourced from each public address. The routes are added and
removed as CTDB moves public addresses between nodes.
For more information, see the POLICY ROUTING section in ctdb(7).
FILENAME
FILENAME contains elements for constructing the desired routes for each source address.
The special FILENAME value __auto_link_local__ indicates that no configuration file is provided and that CTDB should generate reasonable link-local routes for each public IP address.
File format:
IPADDR DEST-IPADDR/MASK [GATEWAY-IPADDR]
No default, usually /usr/local/etc/ctdb/policy_routing when enabled.
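As an illustration of the file format, the following sketch writes an example configuration for a single public address with a link-local route and a default route via a gateway. The addresses and the output name policy_routing.example are hypothetical, and the use of 0.0.0.0/0 to express a default route is an assumption.

```shell
# Hypothetical public address 192.168.1.99:
#   line 1: route to its own network, no gateway
#   line 2: default route via gateway 192.168.1.1
cat >policy_routing.example <<EOF
192.168.1.99 192.168.1.0/24
192.168.1.99 0.0.0.0/0 192.168.1.1
EOF
```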
NUM
NUM sets the priority (or preference) for the routing rules that are added by CTDB.
This should be (strictly) greater than 0 and (strictly) less than 32766. A priority of 100 is recommended, unless this conflicts with a priority already in use on the system. See ip(8) for more details.
CTDB_PER_IP_ROUTING_TABLE_ID_LOW=LOW-NUM, CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=HIGH-NUM
CTDB determines a unique routing table number to use for the routing related to each public address. LOW-NUM and HIGH-NUM indicate the minimum and maximum routing table numbers that are used.
ip(8) uses some reserved routing table numbers below 255. Therefore, CTDB_PER_IP_ROUTING_TABLE_ID_LOW should be (strictly) greater than 255.
CTDB uses the standard file /etc/iproute2/rt_tables to maintain a mapping between the routing table numbers and labels. The label for a public address ADDR will look like ctdb.addr. This means that the associated rules and routes are easy to read (and manipulate).
No default, usually 1000 and 9000.
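A sketch using the usual values, followed by a standalone sanity check of the constraints described above. The check is hypothetical and is not something CTDB runs; it would not normally live in the options file itself.

```shell
CTDB_PER_IP_ROUTING_TABLE_ID_LOW=1000
CTDB_PER_IP_ROUTING_TABLE_ID_HIGH=9000

# Hypothetical sanity check, not part of CTDB:
# LOW must be above the reserved range and below HIGH.
if [ "$CTDB_PER_IP_ROUTING_TABLE_ID_LOW" -le 255 ] ||
   [ "$CTDB_PER_IP_ROUTING_TABLE_ID_LOW" -ge "$CTDB_PER_IP_ROUTING_TABLE_ID_HIGH" ]; then
	echo "invalid routing table ID range" >&2
fi
```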
Provides CTDB's LVS functionality.
For a general description see the LVS section in ctdb(7).
FILENAME
FILENAME contains the list of nodes that belong to the same LVS group.
File format:
IPADDR [follower-only]
IPADDR is the private IP address of each node in the LVS group.
If "follower-only" is specified then the corresponding node can not be the LVS leader node. In this case CTDB_LVS_PUBLIC_IFACE and CTDB_LVS_PUBLIC_IP are optional and unused.
No default, usually /usr/local/etc/ctdb/lvs_nodes when enabled.
CTDB_LVS_PUBLIC_IFACE=INTERFACE
INTERFACE is the network interface that clients will use to connect to CTDB_LVS_PUBLIC_IP.
This is optional for follower-only nodes.
No default.
CTDB_LVS_PUBLIC_IP=IPADDR
CTDB_LVS_PUBLIC_IP is the LVS public address. No default.
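A minimal sketch of the two public-side LVS settings on a node that can become leader. The interface name and address are hypothetical.

```shell
# Hypothetical interface and address
CTDB_LVS_PUBLIC_IFACE=eth0
CTDB_LVS_PUBLIC_IP=10.0.0.227
```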
CTDB can be configured to manage and/or monitor various NAS (and other) services via its eventscripts.
In the simplest case CTDB will manage a service. This means the service will be started and stopped along with CTDB, CTDB will monitor the service and CTDB will do any required reconfiguration of the service when public IP addresses are failed over.
Provides CTDB's Linux multipathd service management.
It can monitor multipath devices to ensure that active paths are available.
MP-DEVICE-LIST
MP-DEVICE-LIST is a list of multipath devices for CTDB to monitor.
No default.
This event script provides CTDB's ClamAV anti-virus service management.
This eventscript is not enabled by default. Use ctdb enablescript to enable it.
FILENAME
FILENAME is the socket used to monitor ClamAV.
No default.
Provides CTDB's vsftpd service management.
THRESHOLDS
THRESHOLDS indicates how many consecutive monitoring attempts need to report that vsftpd is not listening on TCP port 21 before a warning is logged and before monitoring fails. See the Monitoring Thresholds section for a description of how monitoring thresholds work.
Default is 1:2.
Provides CTDB's NetBIOS service management.
SERVICE
Distribution specific SERVICE for managing nmbd.
Default is distribution-dependent.
Provides CTDB's Samba winbind service management.
SERVICE
Distribution specific SERVICE for managing winbindd.
Default is "winbind".
FILENAME
Generates FILENAME, containing an smb.conf snippet with an interfaces setting that includes interfaces for configured CTDB public IP addresses. This file then needs to be explicitly included in smb.conf.
For example, if public IP addresses are defined on interfaces eth0 and eth1, and this is set to /etc/samba/interfaces.conf, then that file will contain the following before smbd is started:
bind interfaces only = yes
interfaces = lo eth0 eth1
This can be useful for limiting the interfaces used by SMB multichannel.
Default is to not generate a file.
INTERFACE-LIST
A space separated list to provide additional interfaces to bind.
Default is empty - no extra interfaces are added.
Provides the core of CTDB's Samba file service management.
PORT-LIST
When monitoring Samba, check TCP ports in space-separated PORT-LIST.
Default is to monitor ports that Samba is configured to listen on.
As part of monitoring, should CTDB skip the check for the existence of each directory configured as a share in Samba? This may be desirable if there is a large number of shares.
Default is no.
SERVICE
Distribution specific SERVICE for managing smbd.
Default is distribution-dependent.
This event script provides CTDB's NFS service management.
This includes parameters for the kernel NFS server. Alternative NFS subsystems (such as NFS-Ganesha) can be integrated using CTDB_NFS_CALLOUT.
CTDB_NFS_CALLOUT=COMMAND
COMMAND specifies the path to a callout to handle interactions with the configured NFS system, including startup, shutdown, monitoring.
Default is the included nfs-linux-kernel-callout.
CTDB_NFS_CHECKS_DIR=DIRECTORY
Specifies the path to a DIRECTORY containing files that describe how to monitor the responsiveness of NFS RPC services. See the README file for this directory for an explanation of the contents of these "check" files.
CTDB_NFS_CHECKS_DIR can be used to point to different sets of checks for different NFS servers.
One way of using this is to have it point to, say, /usr/local/etc/ctdb/nfs-checks-enabled.d and populate it with symbolic links to the desired check files. This avoids duplication and is upgrade-safe.
Default is /usr/local/etc/ctdb/nfs-checks.d, which contains NFS RPC checks suitable for Linux kernel NFS.
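The symbolic link approach can be sketched as follows. The helper enable_nfs_checks is hypothetical and is not shipped with CTDB, and the ".check" file suffix is an assumption. As written it links every check file; narrow the glob pattern to select only the desired checks.

```shell
# Hypothetical helper, not shipped with CTDB.
# Usage: enable_nfs_checks SRC_DIR DST_DIR
# SRC_DIR should be an absolute path so that the links resolve.
# Assumption: check files in SRC_DIR end in ".check".
enable_nfs_checks ()
{
	_src="$1"
	_dst="$2"

	mkdir -p "$_dst" || return 1

	for _f in "$_src"/*.check; do
		# Skip the unexpanded pattern when nothing matches
		[ -e "$_f" ] || continue
		ln -sf "$_f" "$_dst/"
	done
}

# Example invocation, then point the option at the new directory:
#   enable_nfs_checks /usr/local/etc/ctdb/nfs-checks.d \
#                     /usr/local/etc/ctdb/nfs-checks-enabled.d
#   CTDB_NFS_CHECKS_DIR=/usr/local/etc/ctdb/nfs-checks-enabled.d
```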
FILE
Set FILE as the path of the file containing NFS exports, for use by the NFS callout (see CTDB_NFS_CALLOUT, above). This is used for share checks when CTDB_NFS_SKIP_SHARE_CHECK is not set to "yes". This is most useful with NFS-Ganesha, since it supports configuration include files and exports may be stored in a separate file.
Default is /var/lib/nfs/etab for nfs-linux-kernel-callout, /etc/ganesha/ganesha.conf for nfs-ganesha-callout.
CTDB_NFS_SHARED_STATE_DIR=DIRECTORY
DIRECTORY where clustered NFS shared state will be located. DIRECTORY should be in a cluster filesystem that is shared between the nodes. No default.
As part of monitoring, should CTDB skip the check for the existence of each directory exported via NFS? This may be desirable if there is a large number of exports.
Default is no.
IPADDR|HOSTNAME
IPADDR or HOSTNAME indicates the address that rpcinfo should connect to when doing an rpcinfo check on the IPv4 RPC service during monitoring. Optimally this would be "localhost". However, this can add some performance overheads.
Default is "127.0.0.1".
IPADDR|HOSTNAME
IPADDR or HOSTNAME indicates the address that rpcinfo should connect to when doing an rpcinfo check on the IPv6 RPC service during monitoring. Optimally this would be "localhost6" (or similar). However, this can add some performance overheads.
Default is "::1".
LOCATION
LOCATION where NFSv3 statd state will be stored. Valid values are:
[:TDB]
Data is queued to local storage and then dequeued to TDB during monitor events. This means there is a window where locking state may be lost. However, this works around performance limitations in CTDB's persistent database handling.
If :TDB is omitted then TDB defaults to ctdb_statd_callout.tdb.
[:DIRECTORY]
DIRECTORY is a directory in a cluster filesystem that is shared between the nodes. If DIRECTORY is relative (i.e. does not start with '/') then it is appended to CTDB_NFS_SHARED_STATE_DIR. If :DIRECTORY is omitted then DIRECTORY defaults to statd.
Using a shared directory may result in performance and/or stability problems. rpc.statd is single-threaded and its HA callout is called synchronously, causing any latency introduced by the callout to be cumulative. Stability issues are most likely if thousands of clients reclaim locks after failover and use of the cluster filesystem introduces too much additional latency. Too much latency in the HA callout may cause rpc.statd to fail health monitoring.
No cluster-aware handling of NFSv3 statd state is done. NFSv3 lock reclaim will not occur and applications that use locking over NFSv3 are likely to lose or corrupt data.
This should be used with care and only in the case where no applications are using POSIX locks in NFSv3 mounts. It should probably be considered an option to test the latency of statd_callout, without including any storage costs.
CTDB checks the consistency of databases during startup and provides a facility to backup persistent databases.
NUM
NUM is the maximum number of volatile TDB database backups to be kept (for each database) when a corrupt database is found during startup. Volatile TDBs are zeroed during startup so backups are needed to debug any corruption that occurs before a restart.
Default is 10.
CTDB_PERSISTENT_DB_BACKUP_DIR=DIRECTORY
Create a daily backup tarball for all persistent TDBs in DIRECTORY. Note that DIRECTORY must exist or no backups will be created.
Given that persistent databases are fully replicated, duplication is avoided by only creating backups on the current leader node. To maintain a complete, single set of backups, it makes sense for DIRECTORY to be in a cluster filesystem.
This creates the backup from the monitor event, which should be fine because backing up persistent databases is a local operation. Users who do not wish to create backups during the monitor event can choose not to use this option and instead run /usr/local/etc/ctdb/ctdb-backup-persistent-tdbs.sh -l DIRECTORY on all nodes using a cron(8) job, which will also need to manually manage backup pruning.
No default. No daily backups are created.
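For the cron(8) alternative, manual pruning could be sketched as below. The helper prune_backups is hypothetical and is not part of CTDB. It keeps the most recently modified files by count and assumes file names contain no newlines. The backup directory and the retention count in the usage comment are also hypothetical.

```shell
# Hypothetical helper, not part of CTDB.
# Usage: prune_backups DIR COUNT
# Keeps only the COUNT most recently modified regular files in DIR.
prune_backups ()
{
	_dir="$1"
	_count="$2"

	# List newest first, skip the first COUNT entries, remove the rest
	ls -1t "$_dir" | tail -n "+$((_count + 1))" |
	while IFS= read -r _f; do
		if [ -f "$_dir/$_f" ]; then
			rm -f -- "$_dir/$_f"
		fi
	done
}

# Possible use from a nightly cron job (hypothetical directory and count):
#   /usr/local/etc/ctdb/ctdb-backup-persistent-tdbs.sh -l /clusterfs/ctdb-backups
#   prune_backups /clusterfs/ctdb-backups 14
```

Any other files placed in the directory are counted and pruned in the same way, so the directory is best reserved for backups.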
COUNT
Keep at most COUNT backups in CTDB_PERSISTENT_DB_BACKUP_DIR. Note that if additional manual backups are created in this directory then these will count towards the limit.
Default is 14.
Provides CTDB's filesystem and memory usage monitoring.
CTDB can experience seemingly random (performance and other) issues if system resources become too constrained. Options in this section can be enabled to allow certain system resources to be checked. They allow warnings to be logged and nodes to be marked unhealthy when system resource usage reaches the configured thresholds.
Some checks are enabled by default. It is recommended that these checks remain enabled or are augmented by extra checks. There is no supported way of completely disabling the checks.
FS-LIMIT-LIST
FS-LIMIT-LIST is a space-separated list of FILESYSTEM:WARN_LIMIT[:UNHEALTHY_LIMIT] triples indicating that warnings should be logged if the space used on FILESYSTEM reaches WARN_LIMIT%. If usage reaches UNHEALTHY_LIMIT then the node should be flagged unhealthy. Either WARN_LIMIT or UNHEALTHY_LIMIT may be left blank, meaning that check will be omitted.
Default is to warn for each filesystem containing a database directory (volatile database directory, persistent database directory, state database directory) with a threshold of 90%.
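To make the triple format concrete, the following sketch splits an FS-LIMIT-LIST value into its fields, including the cases where a limit is left blank. The function parse_fs_limits and the example filesystems are hypothetical. This is not how CTDB itself parses the option.

```shell
# Hypothetical helper, not part of CTDB.
# Usage: parse_fs_limits "FS:WARN[:UNHEALTHY] ..."
# Prints one line per triple, with "none" for a blank limit.
parse_fs_limits ()
{
	# Unquoted on purpose: split the list on whitespace
	for _t in $1; do
		# Split one triple on ":"; empty fields are preserved
		IFS=: read -r _fs _warn _unhealthy <<EOF
$_t
EOF
		echo "fs=$_fs warn=${_warn:-none} unhealthy=${_unhealthy:-none}"
	done
}

parse_fs_limits "/var:80:90 /clusterfs::95 /tmp:70"
```

Here /var warns at 80% and becomes unhealthy at 90%, /clusterfs has only an unhealthy limit, and /tmp has only a warning limit.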
MEM-LIMITS
MEM-LIMITS takes the form WARN_LIMIT[:UNHEALTHY_LIMIT] indicating that warnings should be logged if memory usage reaches WARN_LIMIT%. If usage reaches UNHEALTHY_LIMIT then the node should be flagged unhealthy. Either WARN_LIMIT or UNHEALTHY_LIMIT may be left blank, meaning that check will be omitted.
Default is 80, so warnings will be logged when memory usage reaches 80%.
REGEXP
REGEXP specifies interesting processes for which stack traces should be logged when debugging hung eventscripts and those processes are matched in pstree output. REGEXP is an extended regexp so choices are separated by pipes ('|'). However, REGEXP should not contain parentheses. See also the ctdb.conf(5) [event] "debug script" option.
Default is "exportfs|rpcinfo".