https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/index.html
Pgpool-II
Pgpool-II is a proxy server that sits between clients and PostgreSQL. It understands the wire-level protocol used by PostgreSQL, called the "frontend/backend protocol"; for details of the protocol, see the PostgreSQL manual. No modified PostgreSQL is required to use Pgpool-II (more precisely, you only need a few extensions to use the full feature set). Pgpool-II can therefore work with a wide variety of PostgreSQL versions; in theory, even the earliest version of PostgreSQL can be used. The same applies to the client side: as long as a client follows the protocol, Pgpool-II happily accepts its connections, no matter what language or driver it uses.
Pgpool-II consists of multiple processes. There is a main process, which is the parent of all other processes. It is responsible for forking child processes, each of which accepts connections from clients. Some worker processes, also forked from the main process, are responsible for detecting streaming replication delay. There is also a special process called the "pcp process", which is used solely for managing Pgpool-II itself. Pgpool-II has a built-in high availability feature called "watchdog", which itself consists of several processes.
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/tutorial-arch.html
It provides the following features:
Connection Pooling Pgpool-II maintains established connections to the PostgreSQL servers and reuses them whenever a new connection with the same properties (i.e. user name, database, protocol version, and other connection parameters, if any) comes in. This reduces connection overhead and improves the system's overall throughput.
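A minimal pgpool.conf sketch of the pooling-related settings (the values below are illustrative assumptions, not taken from these notes):
connection_cache = on          # reuse backend connections that share user/database/protocol properties
num_init_children = 32         # number of preforked Pgpool-II child processes (concurrent client sessions)
max_pool = 4                   # connection pool slots cached per child process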
Load Balancing If a database is replicated (because it runs in either replication mode or master/slave mode), performing a SELECT query on any server returns the same result. Pgpool-II takes advantage of the replication feature to reduce the load on each PostgreSQL server: it distributes SELECT queries among the available servers, improving the system's overall throughput. In an ideal scenario, read performance can improve proportionally to the number of PostgreSQL servers. Load balancing works best when many users execute many read-only queries at the same time.
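A hedged pgpool.conf sketch of the load-balancing knobs (backend numbering and weights are assumptions):
load_balance_mode = on         # distribute SELECT queries among backends
backend_weight0 = 1            # relative share of read queries sent to backend 0
backend_weight1 = 1            # relative share of read queries sent to backend 1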
Automated Failover If one of the database servers goes down or becomes unreachable, Pgpool-II detaches it and continues operation using the remaining database servers. Sophisticated features such as timeouts and retries assist the automated failover.
Online Recovery Pgpool-II can perform online recovery of a database node by executing a single command. When online recovery is used together with automated failover, a node detached by failover can be reattached as a standby node automatically. It is also possible to synchronize and attach a new PostgreSQL server.
Replication Pgpool-II can manage multiple PostgreSQL servers. Activating the replication feature makes it possible to keep a real-time backup on two or more PostgreSQL clusters, so that the service can continue without interruption if one of those clusters fails. Pgpool-II has built-in replication (native replication), but external replication features, including PostgreSQL streaming replication, can also be used.
Limiting Exceeding Connections There is a limit on the maximum number of concurrent connections to PostgreSQL, and new connections are rejected once this number is reached. Raising this maximum, however, increases resource consumption and has a negative impact on overall system performance. Pgpool-II also has a limit on the maximum number of connections, but extra connections are queued instead of returning an error immediately. You can, however, configure Pgpool-II to return an error when the connection limit is exceeded (4.1 or later).
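A sketch of the related parameters, assuming Pgpool-II 4.1 (values are illustrative):
num_init_children = 32         # maximum number of concurrent client sessions Pgpool-II accepts
reserved_connections = 1       # 0: extra clients wait in the queue (default);
                               # >= 1: clients beyond (num_init_children - reserved_connections)
                               # are rejected with "Sorry, too many clients already"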
Watchdog Watchdog can coordinate multiple Pgpool-II instances, create a robust cluster system, and avoid the single point of failure or split brain. Watchdog can perform lifechecks against other Pgpool-II nodes to detect a failure of Pgpool-II. If the active Pgpool-II goes down, a standby Pgpool-II can be promoted to active and take over the virtual IP.
In Memory Query Cache The in-memory query cache saves pairs of SELECT statements and their results. If an identical SELECT comes in, Pgpool-II returns the value from the cache. Since no SQL parsing nor access to PostgreSQL is involved, serving from the in-memory cache is extremely fast. On the other hand, it can be slower than the normal path in some cases, because it adds the overhead of storing cache data.
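A minimal sketch for turning the cache on in pgpool.conf (illustrative values):
memory_cache_enabled = on      # cache SELECT results in memory
memqcache_method = 'shmem'     # 'shmem' or 'memcached'
memqcache_expire = 60          # lifetime of a cached result in seconds (0 = no expiration)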
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/restrictions.html
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/admin.html
install from source or rpm
build from source: https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/install-requirements.html
install from rpm:
yum install http://www.pgpool.net/yum/rpms/4.0/redhat/rhel-7-x86_64/pgpool-II-release-4.0-1.noarch.rpm
yum install pgpool-II-pg11
rpm direct download: https://yum.postgresql.org/rpmchart/
pg11 means PostgreSQL 11. Pgpool-II needs PostgreSQL's library and extensions directory. Since the directory paths differ between PostgreSQL versions, you must choose the RPM that matches your PostgreSQL installation. We also assume you are using the PostgreSQL community RPMs. Optionally you can install:
yum install pgpool-II-pg11-debuginfo
which makes it easier to retrieve debugging symbols from a core file or a backtrace. We recommend installing it. There is also an optional package for developers.
yum install pgpool-II-pg11-devel
This installs the header files that developers are interested in.
On all the PostgreSQL servers you need to install:
yum install pgpool-II-pg11-extensions
systemctl enable pgpool.service
systemctl start pgpool.service
systemctl stop pgpool.service
firewall
#allow access to the ports that Pgpool-II uses
firewall-cmd --permanent --zone=public --add-port=9999/tcp --add-port=9898/tcp --add-port=9000/tcp --add-port=9694/tcp
firewall-cmd --permanent --zone=public --add-port=9999/udp --add-port=9898/udp --add-port=9000/udp --add-port=9694/udp
firewall-cmd --reload
#access to PostgreSQL
firewall-cmd --permanent --zone=public --add-service=postgresql
firewall-cmd --reload
(optional) online recovery https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/install-pgpool-recovery.html
(optional) native replication mode https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/create-installlock-table.html
(optional) Compiling and installing documents https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/install-docs.html
Create the Pgpool-II user account. As with any server daemon that is accessible to the outside world, it is advisable to run Pgpool-II under a separate user account. This user account should only own the data that is managed by the server, and should not be shared with other daemons. (For example, using the user nobody is a bad idea.) It is not advisable to install executables owned by this user because compromised systems could then modify their own binaries.
useradd -m -d /home/pgpool pgpool
Configuring pcp.conf
Pgpool-II provides an interface for administrators to perform management operations, such as getting Pgpool-II status or terminating Pgpool-II processes remotely. pcp.conf is the user/password file used for authentication by this interface.
$ pg_md5 your_password
$ cp /etc/pgpool-II/pcp.conf.sample /etc/pgpool-II/pcp.conf
username:[md5 encrypted password]
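As a convenience, the two steps above can be collapsed into a single hedged one-liner; the user name postgres and your_password are placeholders:
$ echo "postgres:$(pg_md5 your_password)" >> /etc/pgpool-II/pcp.conf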
Configuring Pgpool-II
pgpool.conf is the main configuration file of Pgpool-II. You need to specify the path to the file when starting Pgpool-II using -f option.
There are five different running modes in Pgpool-II: streaming replication mode, logical replication mode, master slave mode (slony mode), native replication mode and raw mode. In any mode, Pgpool-II provides connection pooling, automatic failover and online recovery.
pgpool.conf samples:

| Operation mode | Configuration file name |
| ---- | ---- |
| Streaming replication mode | - |
| Replication mode | pgpool.conf.sample-replication |
| Master slave mode | - |
| Raw mode | pgpool.conf.sample |
| Logical replication mode | pgpool.conf.sample-logical |
cp /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
https://www.pgpool.net/docs/pgpool-II-4.1.1/en/html/configuring-pgpool.html
Configuring backend information For Pgpool-II to recognize PostgreSQL backend servers, you need to configure backend* in pgpool.conf. For starters, at least backend_hostname and backend_port parameters are required to be set up to start Pgpool-II server.
/etc/pgpool-II/pgpool.conf
https://www.pgpool.net/docs/pgpool-II-4.1.1/en/html/runtime-config-backend-settings.html
Watchdog is a sub process of Pgpool-II to add high availability. Watchdog is used to resolve the single point of failure by coordinating multiple pgpool-II nodes. To ensure the quorum mechanism properly works, the number of pgpool-II nodes must be odd in number and greater than or equal to 3.
https://www.pgpool.net/docs/pgpool-II-4.1.1/en/html/tutorial-advanced-arch.html
Watchdog also coordinates with all connected Pgpool-II nodes to ensure that failback, failover and follow_master commands are executed on only one Pgpool-II node.
Life checking of other Pgpool-II nodes Watchdog lifecheck is the sub-component of watchdog that monitors the health of the Pgpool-II nodes participating in the watchdog cluster to provide high availability. Traditionally, the Pgpool-II watchdog provides two methods of remote node health checking: "heartbeat" mode and "query" mode. Apart from remote node health checking, watchdog lifecheck can also check the health of the node it is installed on by monitoring the connection to upstream servers. If that monitoring fails, watchdog treats it as a failure of the local Pgpool-II node.
In heartbeat mode, watchdog monitors other Pgpool-II processes by using heartbeat signals. Watchdog receives heartbeat signals sent periodically by the other Pgpool-II instances. If there is no signal for a certain period, watchdog regards this as a failure of that Pgpool-II. For redundancy you can use multiple network connections for heartbeat exchange between Pgpool-II nodes. This is the default and recommended mode for health checking.
In query mode, watchdog monitors Pgpool-II service rather than process. In this mode watchdog sends queries to other Pgpool-II and checks the response.
External mode was introduced in Pgpool-II V3.5. This mode disables the built-in lifecheck of the Pgpool-II watchdog and expects an external system to inform the watchdog about the health of the local node and all remote nodes participating in the watchdog cluster.
Consistency of configuration parameters on all Pgpool-II nodes At startup watchdog verifies the Pgpool-II configuration of the local node for the consistency with the configurations on the master watchdog node and warns the user of any differences.
Changing active/standby state when certain fault is detected When a fault of Pgpool-II is detected, watchdog notifies the other watchdogs of it. If this is the active Pgpool-II, watchdogs decide the new active Pgpool-II by voting and change active/standby state.
Automatic virtual IP switching When a standby Pgpool-II server promotes to active, the new active server brings up virtual IP interface. Meanwhile, the previous active server brings down the virtual IP interface.
Automatic registration of a server as a standby in recovery When the broken server recovers or new server is attached, the watchdog process notifies this to the other watchdogs in the cluster along with the information of the new server, and the watchdog process receives information on the active server and other servers. Then, the attached server is registered as a standby.
https://www.pgpool.net/docs/pgpool-II-4.1.1/en/html/tutorial-watchdog-restrictions.html
The watchdog process starts and stops automatically as sub-processes of the Pgpool-II, therefore there is no dedicated command to start and stop watchdog.
Watchdog controls the virtual IP interface, and the commands executed by the watchdog to bring the VIP up and down require root privileges. Pgpool-II therefore requires the user running it to have root privileges when the watchdog is enabled along with a virtual IP. However, running Pgpool-II as the root user is not good security practice; the preferred alternative is to run Pgpool-II as a normal user and either configure custom if_up_cmd, if_down_cmd, and arping_cmd commands that use sudo, or use setuid ("set user ID upon execution") on the if_* commands.
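A sketch of the sudo variant, assuming Pgpool-II runs as the postgres user and the usual ip/arping locations (adjust paths for your distribution; edit with visudo):
# /etc/sudoers.d/pgpool
postgres ALL=(root) NOPASSWD: /sbin/ip, /usr/sbin/arping
With that in place, if_up_cmd, if_down_cmd and arping_cmd can be prefixed with /usr/bin/sudo, as shown in the watchdog configuration later in these notes.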
The lifecheck process is a sub-component of watchdog; its job is to monitor the health of the Pgpool-II nodes participating in the watchdog cluster. The lifecheck process is started automatically when the watchdog is configured to use the built-in life checking, after the watchdog main process has finished initializing. However, the lifecheck process only kicks in once all configured watchdog nodes have joined the cluster and become active. If some remote node fails before the lifecheck becomes active, that failure will not be caught by the lifecheck.
https://www.pgpool.net/docs/pgpool-II-4.1.1/en/html/tutorial-watchdog-integrating-external-lifecheck.html
https://www.pgpool.net/docs/pgpool-II-4.1.1/en/html/runtime-config.html
Setting Parameters
Connections and Authentication
Running mode
Backend Settings
Connection Pooling
Error Reporting and Logging
Load Balancing
Health Check
Failover and Failback
Online Recovery
Streaming Replication Check
In Memory Query Cache
Secure Socket Layer (SSL)
Watchdog
Misc Configuration Parameters
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/client-authentication.html
The pool_hba.conf File
Authentication Methods
Using different methods for frontend and backend authentication
Using AES256 encrypted passwords in pool_passwd
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/performance.html
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/example-configs.html
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/example-basic.html https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/example-watchdog.html
$ ./configure
$ make
$ make install
The configure script collects your system information and uses it for the compilation procedure. You can pass command line arguments to the configure script to change the default behavior, such as the installation directory. By default, Pgpool-II is installed to the /usr/local directory (when installed from RPM, the default configuration path is /etc/pgpool-II/).
make command compiles the source code, and make install will install the executables. You must have write permission on the installation directory.
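For example, a hedged configure invocation that changes the installation prefix and points at a specific PostgreSQL build (both paths are assumptions):
$ ./configure --prefix=/usr/local/pgpool-II --with-pgsql=/usr/pgsql-12
$ make
$ make install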
//Configuration Files
$ cp /usr/local/etc/pgpool.conf.sample /usr/local/etc/pgpool.conf
listen_addresses = '*'
port = 9999
//Configuring PCP Commands
$ /usr/local/bin/pg_md5 postgres
e8a48653851e28c69d0506508fb27fc5
$ cp /usr/local/etc/pcp.conf.sample /usr/local/etc/pcp.conf
postgres:e8a48653851e28c69d0506508fb27fc5
//Preparing Database Nodes
$ vim /etc/pgpool-II/pgpool.conf
pcp_port = 9898
backend_hostname0 = 'localhost'
backend_port0 = 5432
backend_weight0 = 1
backend_hostname1 = 'localhost'
backend_port1 = 5433
backend_weight1 = 1
backend_hostname2 = 'localhost'
backend_port2 = 5434
backend_weight2 = 1
$ pgpool
The above command, however, prints no log messages because Pgpool-II detaches from the terminal. If you want to see Pgpool-II log messages, pass the -n option to the pgpool command so that Pgpool-II runs as a non-daemon process and the terminal is not detached.
$ pgpool -n &
The log messages are printed on the terminal, so it is recommended to use the following options.
$ pgpool -n -d > /tmp/pgpool.log 2>&1 &
to rotate log files:
$ pgpool -n 2>&1 | /usr/local/apache2/bin/rotatelogs \
-l -f /var/log/pgpool/pgpool.log.%A 86400 &
To delete old log files before rotation, you could use cron:
55 23 * * * /usr/bin/find /var/log/pgpool -type f -mtime +5 -exec /bin/rm -f '{}' \;
Please note that rotatelogs may exist as /usr/sbin/rotatelogs2 in some distributions. The -f option generates a log file as soon as rotatelogs starts and is available in Apache 2.2.9 or later. cronolog can also be used.
$ pgpool -n 2>&1 | /usr/sbin/cronolog \
--hardlink=/var/log/pgsql/pgpool.log \
'/var/log/pgsql/%Y-%m-%d-pgpool.log' &
$ pgpool stop
If any client is still connected, Pgpool-II waits for it to disconnect, and then terminates itself. Run the following command instead if you want to shutdown Pgpool-II forcibly.
$ pgpool -m fast stop
we’ll use three database nodes,
$ vim /usr/local/etc/pgpool.conf
//When replication_mode is on, Pgpool-II will send a copy of a received query to all the database nodes.
replication_mode = true
//when load_balance_mode is set to true, Pgpool-II will distribute SELECT queries among the database nodes.
load_balance_mode = true
//restart pgpool
//create a database to be replicated,
//Use the createdb commands through Pgpool-II, and the database will be created on all the nodes.
$ createdb -p 9999 bench_replication
//initializes the database with pre-defined tables and data
$ pgbench -i -p 9999 bench_replication
//If, on all the nodes, the tables and data are created, replication is working correctly.
$ for port in 5432 5433 5434; do
> echo $port
> for table_name in pgbench_branches pgbench_tellers pgbench_accounts pgbench_history; do
> echo $table_name
> psql -c "SELECT count(*) FROM $table_name" -p $port bench_replication
> done
> done
You need two Linux boxes on which Pgpool-II is installed, and a PostgreSQL server either on one of those machines or on another one. A single backend node is enough. You can use watchdog with Pgpool-II in any mode: replication mode, master/slave mode and raw mode. This example uses "osspc16" as the active node and "osspc20" as the standby node. "Someserver" means one of them.
Common Config for all Pgpool-II
$ vim /usr/local/etc/pgpool.conf
//Enabling watchdog
use_watchdog = on
//Configure Up stream servers
trusted_servers = ''
# trusted server list which are used
# to confirm network connection
# (hostA,hostB,hostC,...)
//Watchdog Communication
wd_port = 9000
//Specify the IP address to be used as a virtual IP address in the delegate_IP
delegate_IP = '133.137.177.143'
Individual Configurations for each Pgpool-II
//FOR Active (osspc16) Server configurations
other_pgpool_hostname0 = 'osspc20'
# Host name or IP address to connect to for other pgpool 0
other_pgpool_port0 = 9999
# Port number for other pgpool 0
other_wd_port0 = 9000
# Port number for other watchdog 0
//FOR Standby (osspc20) Server configurations
other_pgpool_hostname0 = 'osspc16'
# Host name or IP address to connect to for other pgpool 0
other_pgpool_port0 = 9999
# Port number for other pgpool 0
other_wd_port0 = 9000
# Port number for other watchdog 0
First start the Pgpool-II on Active server.
[user@osspc16]$ su -
[root@osspc16]# {installed_dir}/bin/pgpool -n -f {installed_dir}/etc/pgpool.conf > pgpool.log 2>&1
Log messages will show that Pgpool-II has the virtual IP address and starts watchdog process:
LOG: I am announcing my self as master/coordinator watchdog node
LOG: I am the cluster leader node
DETAIL: our declare coordinator message is accepted by all nodes
LOG: I am the cluster leader node. Starting escalation process
LOG: escalation process started with PID:59449
LOG: watchdog process is initialized
LOG: watchdog: escalation started
LOG: I am the master watchdog node
DETAIL: using the local backend node status
Starting pgpool in Standby server (osspc20)
[user@osspc20]$ su -
[root@osspc20]# {installed_dir}/bin/pgpool -n -f {installed_dir}/etc/pgpool.conf > pgpool.log 2>&1
Log messages will show that Pgpool-II has joined the watchdog cluster as a standby watchdog.
LOG: watchdog cluster configured with 1 remote nodes
LOG: watchdog remote node:0 on Linux_osspc16_9000:9000
LOG: interface monitoring is disabled in watchdog
LOG: IPC socket path: "/tmp/.s.PGPOOLWD_CMD.9000"
LOG: watchdog node state changed from [DEAD] to [LOADING]
LOG: new outbound connection to Linux_osspc16_9000:9000
LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
LOG: successfully joined the watchdog cluster as standby node
DETAIL: our join coordinator request is accepted by cluster leader node "Linux_osspc16_9000"
LOG: watchdog process is initialized
Try it out
//Confirm to ping to the virtual IP address.
[user@someserver]$ ping 133.137.177.142
//Confirm if the Active server which started at first has the virtual IP address.
[root@osspc16]# ifconfig
//Confirm if the Standby server which started not at first doesn't have the virtual IP address.
[root@osspc20]# ifconfig
//Try to connect PostgreSQL by "psql -h delegate_IP -p port".
[user@someserver]$ psql -h 133.137.177.142 -p 9999 -l
Switching virtual IP
//Confirm how the Standby server works when the Active server can't provide its service. Stop Pgpool-II on the Active server.
[root@osspc16]# {installed_dir}/bin/pgpool stop
Then the standby server starts to use the virtual IP address.
More: Lifecheck
These are the parameters for watchdog monitoring. Specify the check interval in wd_interval and the type of lifecheck in wd_lifecheck_method. For the heartbeat method, specify the time needed to detect a fault in wd_heartbeat_deadtime, the port number to receive signals on in wd_heartbeat_port, the send interval in wd_heartbeat_keepalive, the IP address or hostname of the destination in heartbeat_destination0, and finally the destination port number in heartbeat_destination_port0.
wd_lifecheck_method = 'heartbeat'
# Method of watchdog lifecheck ('heartbeat' or 'query' or 'external')
# (change requires restart)
wd_interval = 10
# lifecheck interval (sec) > 0
wd_heartbeat_port = 9694
# Port number for receiving heartbeat signal
# (change requires restart)
wd_heartbeat_keepalive = 2
# Interval time of sending heartbeat signal (sec)
# (change requires restart)
wd_heartbeat_deadtime = 30
# Deadtime interval for heartbeat signal (sec)
# (change requires restart)
heartbeat_destination0 = 'host0_ip1'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
More: Switching virtual IP address
These are the parameters for switching the virtual IP address. Specify the switching commands if_up_cmd and if_down_cmd, the path to them in if_cmd_path, the command executed after switching to send an ARP request in arping_cmd, and the path to it in arping_path.
if_cmd_path = '/sbin'
# path to the directory where if_up/down_cmd exists
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
# startup delegate IP command
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
# shutdown delegate IP command
arping_path = '/usr/sbin' # arping command path
arping_cmd = 'arping -U $_IP_$ -w 1'
+ Method 2:
In this example, we use 3 Pgpool-II servers to manage PostgreSQL servers to create a robust cluster system and avoid the single point of failure or split brain.
![](/docs/docs_image/software/postgresql/pgpool-ii+watchdog_cluster_example.gif)
**REQUIREMENT:**
We assume that all the Pgpool-II servers and the PostgreSQL servers are in the same subnet.
We use 3 servers with CentOS 7.4. Let these servers be server1, server2, server3. We install PostgreSQL and Pgpool-II on each server.
#### 2.2.1 Set up PostgreSQL servers (data nodes)
##### 2.2.1.1 Install PostgreSQL by using PostgreSQL YUM repository.
mkdir -p /lyhistory/workspace/postgres
chown -R postgres:postgres /lyhistory/workspace/postgres
chmod 750 /lyhistory/workspace/postgres
vi /lib/systemd/system/postgresql-12.service
Environment=PGDATA=/lyhistory/workspace/postgres/data (default: /var/lib/pgsql/12/data/)
cd /lyhistory/workspace/postgres
/usr/pgsql-12/bin/postgresql-12-setup initdb
systemctl enable postgresql-12
vi /lyhistory/workspace/postgres/data/pg_hba.conf
local all all peer
host all all 127.0.0.1/32 ident
host all all ::1/128 ident
local replication all peer
host replication all 127.0.0.1/32 ident
host replication all ::1/128 ident
host all all 172.16.0.0/16 md5
host all all 10.36.100.0/24 md5
host all all 192.168.0.0/16 md5
vi data/postgresql.conf
listen_addresses = '*'
systemctl start postgresql-12
vi data/log/postgresql-Fri.log
su - postgres
##### 2.2.1.2 Set up PostgreSQL streaming replication on the primary server.
//First, we create the directory /var/lib/pgsql/archivedir to store WAL segments on all servers. In this example, only the primary node archives WAL locally.
[all servers]# su - postgres
[all servers]$ mkdir /var/lib/pgsql/archivedir
//Then we edit the configuration file $PGDATA/postgresql.conf as follows (the original example edits it on server1, the primary; these notes apply the same settings on all servers). Enable wal_log_hints to use pg_rewind. Since the primary may become a standby later, we set hot_standby = on.
listen_addresses = '*'
archive_mode = on
archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"'
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
wal_log_hints = on
We use the online recovery functionality of Pgpool-II to setup standby server after the primary server is started.
##### 2.2.1.4 create users
For security reasons, we create a user repl used solely for replication, and a user pgpool for streaming replication delay checks and health checks by Pgpool-II.
[server1]# psql -U postgres -p 5432
postgres=# SET password_encryption = 'scram-sha-256';
postgres=# CREATE ROLE pgpool WITH LOGIN;
postgres=# CREATE ROLE repl WITH REPLICATION LOGIN;
postgres=# \password pgpool
postgres=# \password repl
postgres=# \password postgres
//If you want to show the "replication_state" and "replication_sync_state" columns in the SHOW POOL NODES command result, the role pgpool needs to be a PostgreSQL superuser or in the pg_monitor group (Pgpool-II 4.1 or later). Grant pg_monitor to pgpool:
GRANT pg_monitor TO pgpool;
##### 2.2.1.5 enable scram-sha-256 authentication method
Assuming that all the Pgpool-II servers and the PostgreSQL servers are in the same subnet, edit pg_hba.conf to enable the scram-sha-256 authentication method.
vi /lyhistory/workspace/postgres/data/pg_hba.conf
host all all samenet scram-sha-256
host replication all samenet scram-sha-256
##### 2.2.1.6 passwordless SSH
To use the automated failover and online recovery of Pgpool-II, passwordless SSH must be set up from the Pgpool-II execution user (root by default) to the postgres user on all backend servers, and between the postgres users themselves. Execute the following commands on all servers to set up passwordless SSH. The generated key file name is id_rsa_pgpool.
[all servers]# cd ~/.ssh
[all servers]# ssh-keygen -t rsa -f id_rsa_pgpool
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
[all servers]# su - postgres
[all servers]$ cd ~/.ssh
[all servers]$ ssh-keygen -t rsa -f id_rsa_pgpool
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
test: ssh postgres@serverX -i ~/.ssh/id_rsa_pgpool
##### 2.2.1.7 allow repl user without specifying password for streaming replication and online recovery
[all servers]# su - postgres
[all servers]$ vi /var/lib/pgsql/.pgpass
server1:5432:replication:repl:
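A fuller .pgpass sketch covering all three backends (the passwords are placeholders; libpq ignores the file unless it is mode 600 and owned by postgres):
server1:5432:replication:repl:<repl password>
server2:5432:replication:repl:<repl password>
server3:5432:replication:repl:<repl password>
[all servers]$ chmod 600 /var/lib/pgsql/.pgpass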
##### 2.2.1.8 firewall for postgres
[all servers]# firewall-cmd --permanent --zone=public --add-service=postgresql
[all servers]# firewall-cmd --reload
##### 2.2.1.9 Summary
| **Item** | **Value** | **Detail** |
| -------------------- | --------------------------- | ------------------------- |
| PostgreSQL Version | 12.7 | - |
| port | 5432 | - |
| $PGDATA | /lyhistory/workspace/postgres/data | - |
| Archive mode | on | /var/lib/pgsql/archivedir |
| Replication Slots | Enable ? | - |
| Start automatically | Disable | - |
#### 2.2.2 Setup Pgpool-II+Watchdog
##### 2.2.2.1 Install Pgpool-II
##### 2.2.2.2 Pgpool-II Configuration - Common Settings
$ cp -p /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
listen_addresses = '*'
#Specify replication delay check user and password
sr_check_user = 'pgpool'
sr_check_password = ''
#Enable health check so that Pgpool-II performs failover
health_check_period = 5
health_check_timeout = 30
health_check_user = 'pgpool'
health_check_password = ''
health_check_max_retries = 3
#Specify the PostgreSQL backend information
backend_hostname0 = 'server1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/11/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'server2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/11/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = 'server3'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/11/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
//To show the "replication_state" and "replication_sync_state" columns in the SHOW POOL NODES command result, the backend_application_name parameter is required.
backend_application_name0 = 'server1'
...
backend_application_name1 = 'server2'
...
backend_application_name2 = 'server3'
##### 2.2.2.3 Pgpool-II Configuration - Failover configuration
failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %M %H %P %r %R'
$ cp /etc/pgpool-II/failover.sh.sample /etc/pgpool-II/failover.sh
$ cp /etc/pgpool-II/follow_master.sh.sample /etc/pgpool-II/follow_master.sh
$ chmod +x /etc/pgpool-II/{failover.sh,follow_master.sh}
Note: modify path in the scripts if needed: PGHOME=/usr/pgsql-12
##### 2.2.2.4 Pgpool-II Configuration - Online Recovery Configurations
$ vim /etc/pgpool-II/pgpool.conf
recovery_user = 'postgres'
recovery_password = ''
recovery_1st_stage_command = 'recovery_1st_stage'
[server1]# su - postgres
[server1]$ vi /var/lib/pgsql/11/data/recovery_1st_stage
[server1]$ vi /var/lib/pgsql/11/data/pgpool_remote_start
[server1]$ chmod +x /var/lib/pgsql/11/data/{recovery_1st_stage,pgpool_remote_start}
In order to use the online recovery functionality, the functions pgpool_recovery, pgpool_remote_start and pgpool_switch_xlog are required, so we need to install the pgpool_recovery extension on template1 of the PostgreSQL server server1.
[server1]# su - postgres
[server1]$ psql template1 -c "CREATE EXTENSION pgpool_recovery"
##### 2.2.2.5 Pgpool-II Configuration - Client Authentication Configuration
Because we already set the PostgreSQL authentication method to scram-sha-256 in the section Before Starting, it is necessary to configure client authentication in Pgpool-II so that it can connect to the backend nodes.
enable_pool_hba = on (in pgpool.conf)
vim /etc/pgpool-II/pool_hba.conf
host all pgpool 0.0.0.0/0 scram-sha-256
host all postgres 0.0.0.0/0 scram-sha-256
//The default password file name for authentication is pool_passwd. To use scram-sha-256 authentication, a decryption key to decrypt the passwords is required. We create the .pgpoolkey file in the root user's home directory.
[all servers]# echo 'some string' > ~/.pgpoolkey
[all servers]# chmod 600 ~/.pgpoolkey
//Execute the command pg_enc -m -k /path/to/.pgpoolkey -u username -p to register a user name and AES-encrypted password in the file pool_passwd.
[all servers]# pg_enc -m -k /root/.pgpoolkey -u pgpool -p
db password: [pgpool user's password]
[all servers]# pg_enc -m -k /root/.pgpoolkey -u postgres -p
db password: [postgres user's password]
pgpool:AESheq2ZMZjynddMWk5sKP/Rw== postgres:AESHs/pWL5rtXy2IwuzroHfqg==
##### 2.2.2.6 Pgpool-II Configuration - Watchdog Configuration
//Enable the watchdog functionality on server1, server2, server3.
use_watchdog = on
//Specify the virtual IP address that accepts connections from clients on server1, server2, server3.
delegate_IP = '192.168.137.150'
//To bring up/down the virtual IP and send the ARP requests, we set if_up_cmd, if_down_cmd and arping_cmd. The network interface used in this example is "enp0s8". Since root privilege is required to execute the if_up/down_cmd or arping_cmd command, either use setuid on these commands or allow the Pgpool-II startup user, the postgres user (Pgpool-II 4.1 or later), to run sudo without a password. If installed from RPM, the postgres user has been configured to run ip/arping via sudo without a password.
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev enp0s8 label enp0s8:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev enp0s8'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I enp0s8'
//Set if_cmd_path and arping_path according to the command path. If if_up/down_cmd or arping_cmd starts with "/", these parameters will be ignored.
if_cmd_path = '/sbin'
arping_path = '/usr/sbin'
//Specify the hostname and port number of each Pgpool-II server.
server1
wd_hostname = 'server1'
wd_port = 9000
server2
wd_hostname = 'server2'
wd_port = 9000
server3
wd_hostname = 'server3'
wd_port = 9000
//Specify the hostname, Pgpool-II port number, and watchdog port number of monitored Pgpool-II servers on each Pgpool-II server.
server1
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'server2'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'server3'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
server2
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'server1'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'server3'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
server3
# - Other pgpool Connection Settings -
other_pgpool_hostname0 = 'server1'
# Host name or IP address to connect to for other pgpool 0
# (change requires restart)
other_pgpool_port0 = 9999
# Port number for other pgpool 0
# (change requires restart)
other_wd_port0 = 9000
# Port number for other watchdog 0
# (change requires restart)
other_pgpool_hostname1 = 'server2'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
//Specify the hostname and port number of the destination for sending the heartbeat signal on server1, server2, server3.
server1
heartbeat_destination0 = 'server2'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = ''
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
heartbeat_destination1 = 'server3'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
server2
heartbeat_destination0 = 'server1'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = ''
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
heartbeat_destination1 = 'server3'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
server3
heartbeat_destination0 = 'server1'
# Host name or IP address of destination 0
# for sending heartbeat signal.
# (change requires restart)
heartbeat_destination_port0 = 9694
# Port number of destination 0 for sending
# heartbeat signal. Usually this is the
# same as wd_heartbeat_port.
# (change requires restart)
heartbeat_device0 = ''
# Name of NIC device (such like 'eth0')
# used for sending/receiving heartbeat
# signal to/from destination 0.
# This works only when this is not empty
# and pgpool has root privilege.
# (change requires restart)
heartbeat_destination1 = 'server2'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
##### 2.2.2.7 Pgpool-II Configuration - /etc/sysconfig/pgpool Configuration
//If you want to ignore the pgpool_status file at startup of Pgpool-II, add "-D" to the start option OPTS in /etc/sysconfig/pgpool.
[all servers]# vi /etc/sysconfig/pgpool
...
OPTS=" -D -n"
##### 2.2.2.8 Pgpool-II Configuration - Logging
log_destination = 'syslog'
# Where to log
# Valid values are combinations of stderr and syslog. Default to stderr.
syslog_facility = 'LOCAL1'
# Syslog local facility. Default to LOCAL0
[all servers]# mkdir /var/log/pgpool-II
[all servers]# touch /var/log/pgpool-II/pgpool.log
[all servers]# vi /etc/rsyslog.conf
...
*.info;mail.none;authpriv.none;cron.none;LOCAL1.none    /var/log/messages
LOCAL1.*                                                /var/log/pgpool-II/pgpool.log
[all servers]# vi /etc/logrotate.d/syslog
...
/var/log/messages /var/log/pgpool-II/pgpool.log /var/log/secure
[all servers]# systemctl restart rsyslog
##### 2.2.2.9 Pgpool-II Configuration - PCP Command Configuration
//Since user authentication is required to use the PCP command, specify a user name and an md5-encrypted password in pcp.conf. Here we create the encrypted password for the pgpool user, and add "username:encrypted password" to /etc/pgpool-II/pcp.conf (replace "PCP password" with the actual PCP password).
[all servers]# echo 'pgpool:'`pg_md5 PCP password` >> /etc/pgpool-II/pcp.conf
##### 2.2.2.10 Pgpool-II Configuration - .pcppass
//Since the follow_master_command script has to execute PCP commands without entering a password, we create .pcppass in the home directory of the Pgpool-II startup user (the root user). If you don't have root access, repeat the same step under the postgres user's home directory (/var/lib/pgsql), and make sure the permission is 600.
[all servers]# echo 'localhost:9898:pgpool:pgpool' > ~/.pcppass
[all servers]# chmod 600 ~/.pcppass
##### 2.2.2.11 firewall for Pgpool-II
[all servers]# firewall-cmd --permanent --zone=public --add-port=9999/tcp --add-port=9898/tcp --add-port=9000/tcp --add-port=9694/udp
[all servers]# firewall-cmd --reload
##### 2.2.2.12 Summary
| **Item** | **Value** | **Detail** |
| --------------------- | --------------------------------------------------- | ---------------------------------------------------------- |
| Pgpool-II Version | 4.1.1 | - |
| port | 9999 | Pgpool-II accepts connections |
| | 9898 | PCP process accepts connections |
| | 9000 | watchdog accepts connections |
| | 9694 | UDP port for receiving Watchdog's heartbeat signal |
| Config file | /etc/pgpool-II/pgpool.conf | Pgpool-II config file |
| Pgpool-II start user | postgres (Pgpool-II 4.1 or later) | Pgpool-II 4.0 or before, the default startup user is root |
| Running mode | streaming replication mode | - |
| Watchdog | on | Life check method: heartbeat |
| Start automatically | Disable | - |
#### 2.2.3 Assign Permission and set env path
chown -R postgres:postgres /var/lib/pgsql/12
vim /var/lib/pgsql/.bash_profile
PATH=$PATH:/usr/pgsql-12/bin
export PATH
#### 2.2.4 Starting/Stopping Pgpool-II
Before starting Pgpool-II, please start the PostgreSQL servers first. Also, when stopping PostgreSQL, it is necessary to stop Pgpool-II first.
//let's start Pgpool-II on server1, server2, server3
# systemctl start pgpool.service
#### 2.2.5 Set up PostgreSQL standby server
First, we set up the PostgreSQL standby servers by using the Pgpool-II online recovery functionality. Ensure that the recovery_1st_stage and pgpool_remote_start scripts used by the pcp_recovery_node command are in the database cluster directory of the PostgreSQL primary server (server1).
# pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 1
Password:
pcp_recovery_node -- Command Successful
# pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 2
Password:
pcp_recovery_node -- Command Successful
//After executing the pcp_recovery_node command, verify that server2 and server3 are started as PostgreSQL standby servers.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | server1 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2019-08-06 11:13:17
1 | server2 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2019-08-06 11:13:25
2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:14:20
(3 rows)
#### 2.2.6 Switching active/standby watchdog
//Confirm the watchdog status by using pcp_watchdog_info. The Pgpool-II server which is started first runs as MASTER.
# pcp_watchdog_info -h 192.168.137.150 -p 9898 -U pgpool
Password:
3 YES server1:9999 Linux server1 server1
server1:9999 Linux server1 server1 9999 9000 4 MASTER #The Pgpool-II server started first becomes "MASTER".
server2:9999 Linux server2 server2 9999 9000 7 STANDBY #run as standby
server3:9999 Linux server3 server3 9999 9000 7 STANDBY #run as standby
//Stop active server server1, then server2 or server3 will be promoted to active server. To stop server1, we can stop Pgpool-II service or shutdown the whole system. Here, we stop Pgpool-II service.
[server1]# systemctl stop pgpool.service
# pcp_watchdog_info -p 9898 -h 192.168.137.150 -U pgpool
Password:
3 YES server2:9999 Linux server2 server2
server2:9999 Linux server2 server2 9999 9000 4 MASTER #server2 is promoted to MASTER
server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN #server1 is stopped
server3:9999 Linux server3 server3 9999 9000 7 STANDBY #server3 runs as STANDBY
//Start Pgpool-II (server1) which we have stopped again, and verify that server1 runs as a standby.
[server1]# systemctl start pgpool.service
[server1]# pcp_watchdog_info -p 9898 -h 192.168.137.150 -U pgpool
Password:
3 YES server2:9999 Linux server2 server2
server2:9999 Linux server2 server2 9999 9000 4 MASTER
server1:9999 Linux server1 server1 9999 9000 7 STANDBY
server3:9999 Linux server3 server3 9999 9000 7 STANDBY
#### 2.2.7 test Failover
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/tutorial-testing-failover.html
//First, use psql to connect to PostgreSQL via the virtual IP, and verify the backend information.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | server1 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2019-08-06 11:13:17
1 | server2 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2019-08-06 11:13:25
2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:14:20
(3 rows)
//Next, stop primary PostgreSQL server server1, and verify automatic failover.
[server1]$ pg_ctl -D /var/lib/pgsql/11/data -m immediate stop
//After stopping PostgreSQL on server1, failover occurs and PostgreSQL on server2 becomes the new primary DB.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | server1 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2019-08-06 11:36:03
1 | server2 | 5432 | up | 0.333333 | primary | 0 | true | 0 | | | 2019-08-06 11:36:03
2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:36:15
(3 rows)
//server3 is running as standby of new primary server2.
[server3]# psql -h server3 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()"
pg_is_in_recovery
-------------------
t
[server2]# psql -h server2 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()"
pg_is_in_recovery
-------------------
f
[server2]# psql -h server2 -p 5432 -U pgpool postgres -c "select * from pg_stat_replication" -x
-[ RECORD 1 ]----+------------------------------
pid | 11059
usesysid | 16392
usename | repl
application_name | server3
client_addr | 192.168.137.103
client_hostname |
client_port | 48694
backend_start | 2019-08-06 11:36:07.479161+09
backend_xmin |
state | streaming
sent_lsn | 0/75000148
write_lsn | 0/75000148
flush_lsn | 0/75000148
replay_lsn | 0/75000148
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time       | 2019-08-06 11:42:59.823961+09
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/tutorial-testing-online-recovery.html
//Here, we use Pgpool-II online recovery functionality to restore server1 (old primary server) as a standby. Before restoring the old primary server, please ensure that recovery_1st_stage and pgpool_remote_start scripts exist in database cluster directory of current primary server server2.
# pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 0
Password:
pcp_recovery_node -- Command Successful
//Then verify that server1 is started as a standby.
# psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
0 | server1 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2019-08-06 11:48:05
1 | server2 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2019-08-06 11:36:03
2 | server3 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2019-08-06 11:36:15
(3 rows)
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/tutorial-testing-replication.html
$ pgbench -i -p 11000 test
https://www.pgpool.net/docs/pgpool-II-4.1.0/en/html/tutorial-testing-load-balance.html
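A quick hedged check of load balancing, assuming the virtual IP 192.168.137.150, Pgpool-II listening on port 9999, and a pgbench database named test that has already been initialized: run a read-only pgbench and compare select_cnt per node afterwards.
$ pgbench -S -c 10 -t 100 -h 192.168.137.150 -p 9999 -U postgres test
$ psql -h 192.168.137.150 -p 9999 -U pgpool postgres -c "show pool_nodes"
If load balancing is working, the select_cnt column increases on more than one backend.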
0. backup db
pg_dumpall
1. Stop all the postgres
su - postgres
pg_ctl -D /lyhistory/workspace/postgres/data -m immediate stop
2. Stop all the pgpool services
CHANGE TO ROOT USER:
systemctl stop pgpool.service
3. clean up replication servers
[replication servers only] rename or delete the postgres data directory (/lyhistory/workspace/postgres/data)
4. start primary postgres server and clean up replication slots on primary server
[primary server only] start postgres: pg_ctl -D /lyhistory/workspace/postgres/data start
remove all the replication slots
su - postgres
psql
select * from pg_replication_slots;
select pg_drop_replication_slot('hostname');  -- replace 'hostname' with the slot name of the replication server
select * from pg_stat_activity;  -- check whether there are active connections, and terminate any that remain
drop database <dbname>;  -- replace <dbname> with the database to be re-imported
5. init db on replication servers
[replication servers only] init db, and start postgres
# as root
/usr/pgsql-12/bin/postgresql-12-setup initdb
6. bring up replication postgres(can skip)
systemctl start postgresql-12.service
or
su - postgres
pg_ctl -D /lyhistory/workspace/postgres/data start
Testing shows that no matter whether PostgreSQL is started with systemctl or pg_ctl, it ends up being managed by pg_ctl, because the next step, pcp_recovery_node, calls pg_ctl to start PostgreSQL on the replication servers, so this step appears to be redundant.
7. start pgpool
[all servers] start pgpool
systemctl start pgpool.service
recover nodes from 1 → 5
pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n 1
8. import db:
psql# create database "dbname"
gzip -dc /tmp/testdb.sql.gz | psql -U postgres -d "dbname"
Pgpool cluster (master/standby): passwordless PCP access to localhost:9898 via /var/lib/pgsql/.pcppass
pcp_watchdog_info -p 9898 -h <Virtual IP> -U pgpool
pcp_recovery_node -h 192.168.137.150 -p 9898 -U pgpool -n <node id>
psql -h <VirtualIP> -p 9999 -U pgpool postgres -c "show pool_nodes"
pg_ctl -D /var/lib/pgsql/12/data -m immediate stop
psql -h server3 -p 5432 -U pgpool postgres -c "select pg_is_in_recovery()"
psql -h server2 -p 5432 -U pgpool postgres -c "select * from pg_stat_replication" -x
pcp_recovery_node
ERROR: executing recovery, execution of command failed at "1st stage"
DETAIL: command:"recovery_1st_stage"
vi /lyhistory/workspace/postgres/data/log/postgresql-Mon.log
2022-06-27 12:00:21.267 +08 [9557] LOG: database system is ready to accept connections
sh: /lyhistory/workspace/postgres/data/recovery_1st_stage: /bin/bash^M: bad interpreter: No such file or directory
2022-06-27 12:02:53.154 +08 [9825] ERROR: pgpool_recovery failed
2022-06-27 12:02:53.154 +08 [9825] STATEMENT: SELECT pgpool_recovery('recovery_1st_stage', 'vm_v02', '/lyhistory/workspace/postgres/data', '5432', 1, '5432')
sh: /lyhistory/workspace/postgres/data/recovery_1st_stage: /bin/bash^M: bad interpreter: No such file or directory
dos2unix recovery_1st_stage
hostname invalid: replication slot names may only contain underscores, lower-case letters, and digits
ERROR: replication slot name “vm_v02.novalocal” contains invalid character HINT: Replication slot names may only contain lower case letters, numbers, and the underscore character.
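One possible workaround (an assumption, not part of the shipped sample script) is to sanitize the host name before using it as the replication slot name inside recovery_1st_stage:
REPL_SLOT_NAME=$(echo "${DEST_NODE_HOST,,}" | tr -- '-.' '_')   # lower-case and map '-' and '.' to '_'
# then use '${REPL_SLOT_NAME}' for primary_slot_name and for pg_drop_replication_slot()
Alternatively, give the machines host names that contain only lower-case letters, digits and underscores.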
'Password:' is printed repeatedly
Investigation:
SELECT pgpool_recovery('recovery_1st_stage', 'vm_v02', '/lyhistory/workspace/postgres/data', '5432', 1, '5432')
PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"                # vm_v02
DEST_NODE_PGDATA="$3"              # /lyhistory/workspace/postgres/data
PRIMARY_NODE_PORT="$4"             # 5432
DEST_NODE_ID="$5"                  # 1
DEST_NODE_PORT="$6"                # 5432
PRIMARY_NODE_HOST=$(hostname)      # vm_v01
PGHOME=/usr/pgsql-11
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl
..............................
## Execute pg_basebackup to recovery Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
set -o errexit
rm -rf $DEST_NODE_PGDATA
rm -rf $ARCHIVEDIR/*
${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream
if [ ${PGVERSION} -ge 12 ]; then
sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
-e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
fi
cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${DEST_NODE_HOST}'
EOT
if [ ${PGVERSION} -ge 12 ]; then
touch ${DEST_NODE_PGDATA}/standby.signal
else
echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
fi
sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
"
if [ $? -ne 0 ]; then
${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_drop_replication_slot('${DEST_NODE_HOST}');
EOQ
logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
exit 1
fi
Narrowed it down to the line ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream. Following the parameters in the log, run it manually on node 1 (the second machine, the target of pcp_recovery_node -n 1):
su - postgres
/usr/pgsql-12/bin/pg_basebackup -h vm_v01.novalocal -U repl -p 5432 -D /lyhistory/workspace/postgres/data -X stream
It prompts for a password even though /var/lib/pgsql/.pgpass is configured; the reason is that the hostname in .pgpass is vm_v01, not vm_v01.novalocal.
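A hedged fix for that mismatch: add a .pgpass entry for the FQDN, or use a wildcard host field (passwords are placeholders); libpq matches the host field literally against the host name given to pg_basebackup.
vm_v01:5432:replication:repl:<repl password>
vm_v01.novalocal:5432:replication:repl:<repl password>
# or match any host:
*:5432:replication:repl:<repl password>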
The recovery_1st_stage script was copied in from outside; probably because of the line endings, some of its sed commands did not recognize their arguments. Solution: use /etc/pgpool-II/recovery_1st_stage.sample instead.
Strangely, it succeeds when tried again; just retry one more time.
vm1 is the master and vm2 & vm3 are slaves. After killing PostgreSQL on vm1, vm2 becomes the master, but vm3 also goes down (port 5432 is still listening, yet systemctl status reports it as dead).
https://stackoverflow.com/questions/72259918/pgpool-failover