master redis cluster

4 minute read

I prefer to list down all the commands for later reference, most of us are not good at memorize such boring cmds, it’s always good to prepare cheatsheet

redis-server conf/redis.conf
redis-cli -c -h HOSTIP -p PORT
	cluster info
	cluster nodes
	replicate NODEID
	cluster failover (on slave node)
    keys * (Check empty)
    flushdb/flushall (on master nodes)
    cluster reset(soft/hard, chang slave to an empty ‘standalone’ master)
    Slaveof no one (change slave to master, cannot execute in cluster mode)
	Slave host port (change slave to replicate another master, cannot execute in cluster mode)
redis-cli --cluster check
redis-cli --cluster fix
redis-cli -h HOSTIP -p PORT cluster meet TARGETHOSTIP TARGETPORT
redis-cli cluster forget NODEID (cannot perform on itself or it’s slave)
redis-cli --cluster del-node HOSTIP:PORT NODEID
#add-node默认是empty master,也可以加参数指定为slave ()
redis-cli --cluster add-node --cluster-slave --cluster-master-id NODEID

redis-cli -p 7002 shutdown
redis-cli -p 7002 debug segfault

redis-cli --cluster reshard ANYHOSTIP:ANYPORT --cluster-yes
redis-cli --cluster rebalance --cluster-threshold 1 ANYHOSTIP:ANYPORT

I’ve been using redis for quite sometime, but only limited on redis client using off the shell library, never touched the server side, here is the story to start: some days ago my collegue asked me how to flush one of the nodes, so I told him ‘flushall then cluster reset’, and that resulted in a isolated master node which forget all the other nodes, but from the other nodes still have a record of this reset node, it turns out that I actually did the same thing before creating cluster, so nothing influnced on my machine while my colleague already created the cluster, this ‘accident’ drive me to look through the redis document line by line to have a better understanding of how this behavior happen and how to resolve it by re-joining it to the cluster, here is some points I would like to point out for maintaining the cluster:

  1. TCP PORTS beaware of the TCP PORTS connection between nodes: Every Redis Cluster node requires two TCP connections open. The normal Redis TCP port used to serve clients, for example 6379, plus the port obtained by adding 10000 to the data port, so 16379 in the example, 16379 is served for the gossip protocol between nodes; make sure the the firewall not blocking both ports

  2. Data Sharding and Rebalance redis use bit map to indicate whether one slot has been used, for reducing the data size when exchange message between nodes,the bit map is fixed to 2048 bytes: 2048 bytes = 2^11 * 8 bit= 2^14 bit= 16384, so 2048 bytes can represent 16384 slots crc16 algorithm will calculate hash out of redis key:hash_slot = CRC16(key) % 16384; it results in different keys allocated into ‘random’ slots, slots allocated accross cluster, thus the reason why it’s not allowed to perform fuzz/pattern search on cluster. but there is a way to do mutlti key operation, trick is to use Hash tag, for example: this{foo}key and another{foo}key are guaranteed to be in the same hash slot, and can be used together in a command with multiple keys as arguments

  3. Consistency Guarantee / CAP Theory: Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client. Tradeoff concern between Synchronous write and Performance

  4. Replicas migration how it works is decribed very clear in the config file:
    # Cluster replicas are able to migrate to orphaned masters, that are masters
    # that are left without working replicas. This improves the cluster ability
    # to resist to failures as otherwise an orphaned master can't be failed over
    # in case of failure if it has no working replicas.
    # Replicas migrate to orphaned masters only if there are still at least a
    # given number of other working replicas for their old master. This number
    # is the "migration barrier". A migration barrier of 1 means that a replica
    # will migrate only if there is at least 1 other working replica for its master
    # and so forth. It usually reflects the number of replicas you want for every
    # master in your cluster.
    # Default is 1 (replicas migrate only if their masters remain with at least
    # one replica). To disable migration just set it to a very large value.
    # A value of 0 can be set but is useful only for debugging and dangerous
    # in production.
    # cluster-migration-barrier 1
  5. Master Node whenever make any changes on master node, be careful about the slots allocated to it, you can manually reshard to another master node; or you can make a manaul failover to elect the slave to be the new master;

  6. cluster reset, hard or soft the differnece is hard reset will change the node id, so the other nodes won’t recognize the new node id, it will be fresh new node; while soft reset won’t change the node id, only make the node forget all the other nodes, but all the other nodes still remember the reseted node, in this case you can never add it back, but only can join back by meet with other nodes again; small tips: if the soft reset node was previously a master node, it’s slave node can never forget its master! but if it was a slave node, after soft reset, you can make other nodes forget it, so it can really disconnect with all other nodes;