CentOS 7 HA (heartbeat): corosync, pcs, pacemaker

CHOMAN 2018. 5. 15. 13:30

centos7 HA (heartbeat)

CentOS 7 no longer seems to provide the Heartbeat package I had been using, so I configured HA as described below.

Official site

https://clusterlabs.org/

Reference sites

https://blog.boxcorea.com/wp/archives/1784
https://www.server-world.info/en/note?os=CentOS_8&p=pacemaker&f=1
https://serverfault.com/questions/783689/pacemaker-how-to-configure-default-gateway-for-virtual-ip
https://serverfault.com/questions/996750/how-to-make-clone-resource-start-after-a-specific-resource-starts-in-pacemaker
https://yoanp.github.io/blog/pcs%EB%A5%BC-%EC%9D%B4%EC%9A%A9%ED%95%9C-ha%EA%B5%AC%EC%84%B1/

cat /etc/hosts

123.456.789.140 WEB1
123.456.789.141 WEB2
123.456.789.144 WEBM

Installation (CentOS 7)

yum install -y pacemaker corosync pcs psmisc policycoreutils-python

Start the daemon

systemctl start pcsd.service
systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.

Installation (CentOS 8)

dnf --enablerepo=HighAvailability -y install pacemaker pcs
systemctl enable --now pcsd

If the repository id is ha instead (newer CentOS 8 releases), use:

dnf --enablerepo=ha -y install pacemaker pcs

Set the hacluster account password (on each node)

[root@localhost ~]# passwd hacluster
Changing password for user hacluster.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Firewall settings

firewall-cmd --permanent --zone=public --add-port=2224/tcp
firewall-cmd --permanent --zone=public --add-port=5405/udp
firewall-cmd --reload

Firewall settings (2), using the high-availability service

firewall-cmd --add-service=high-availability --permanent
firewall-cmd --reload

corosync setup

[root@localhost ~]# pcs cluster auth WEB1 WEB2
Username: hacluster
Password: 
web2: Authorized
web1: Authorized

corosync setup (CentOS 8)

[root@image1 ~]# pcs host auth image1 image2
Username: hacluster
Password:
image1: Authorized
image2: Authorized

Host names are case-sensitive.

corosync synchronization

[root@localhost ~]# pcs cluster setup --name web_cluster web1 web2
Destroying cluster on nodes: web1, web2...
web1: Stopping Cluster (pacemaker)...
web2: Stopping Cluster (pacemaker)...
web1: Successfully destroyed cluster
web2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'web1', 'web2'
web1: successful distribution of the file 'pacemaker_remote authkey'
web2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
web1: Succeeded
web2: Succeeded

Synchronizing pcsd certificates on nodes web1, web2...
web2: Success
web1: Success
Restarting pcsd on the nodes in order to reload the certificates...
web2: Success
web1: Success

corosync synchronization (CentOS 8)

[root@image1 ~]# pcs cluster setup image_cluster image1 image2
No addresses specified for host 'image1', using 'image1'
No addresses specified for host 'image2', using 'image2'
Destroying cluster on hosts: 'image1', 'image2'...
image1: Successfully destroyed cluster
image2: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'image1', 'image2'
image1: successful removal of the file 'pcsd settings'
image2: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'image1', 'image2'
image1: successful distribution of the file 'corosync authkey'
image1: successful distribution of the file 'pacemaker authkey'
image2: successful distribution of the file 'corosync authkey'
image2: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'image1', 'image2'
image1: successful distribution of the file 'corosync.conf'
image2: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.

Sync the configuration

[root@localhost corosync]# pcs cluster sync
web1: Succeeded
web2: Succeeded

Start the cluster and check its status

[root@localhost ~]# pcs cluster start --all
web2: Starting Cluster...
web1: Starting Cluster...

Check cluster communication

[root@localhost ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
    id    = 123.456.789.140
    status    = ring 0 active with no faults

[root@localhost ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 2
RING ID 0
    id    = 123.456.789.141
    status    = ring 0 active with no faults
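The ring check above can also be scripted. The following is only a sketch that assumes the CentOS 7 output format shown above (a "status = ..." line per ring); the function name rings_healthy is made up for illustration and is not part of corosync.

```shell
# Hypothetical helper: succeed only if every ring status line reports no faults.
# Reads `corosync-cfgtool -s` output on stdin.
rings_healthy() {
    awk '
        # A status line looks like: "status = ring 0 active with no faults"
        $1 == "status" && $2 == "=" {
            seen = 1
            if ($0 !~ /active with no faults/) bad = 1
        }
        END {
            if (seen && !bad) exit 0
            exit 1
        }
    '
}

# Usage on a cluster node (not run here):
#   corosync-cfgtool -s | rings_healthy && echo "rings OK"
```

The function fails when no status line is found at all, so an empty or unexpected output is not mistaken for a healthy ring.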

Check membership and quorum

[root@localhost ~]# corosync-cmapctl | egrep -i members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(123.456.789.140) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(123.456.789.141) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
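To check the number of joined members from a script, the status lines above can be counted. This is a rough sketch that assumes the line format shown above; count_joined_members is a hypothetical name.

```shell
# Hypothetical helper: count members whose status is "joined".
# Reads `corosync-cmapctl` output on stdin.
count_joined_members() {
    # In a basic regular expression the parentheses match literally.
    grep -c 'members\.[0-9]*\.status (str) = joined'
}

# Usage on a cluster node (not run here):
#   corosync-cmapctl | count_joined_members
```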

pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 web1 (local)
         2          1 web2

[root@localhost ~]# crm_verify -L -V

   error: unpack_resources:    Resource start-up disabled since no STONITH resources have been defined
   error: unpack_resources:    Either configure some or disable STONITH with the stonith-enabled option
   error: unpack_resources:    NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid

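A STONITH-related error occurs. STONITH (fencing) forcibly isolates a failed node so that it cannot corrupt shared data, which is why Pacemaker refuses to start resources until STONITH is either configured or disabled.

Disable STONITH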

[root@localhost ~]# pcs property set stonith-enabled=false
[root@localhost ~]# crm_verify -L -V

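Create a virtual IP resource

pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=<IP address> cidr_netmask=24 op monitor interval=30s

Create a virtual IP resource (specifying the NIC)

pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=<IP address> nic=eth0:0 cidr_netmask=24 op monitor interval=30s

Use the NIC name of your own server, for example eth0:0 or em1:0. The NIC is specified because an error sometimes occurs when the IP address belongs to a different network.

Change the virtual IP resource

pcs resource update VirtualIP ip=<new virtual IP>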

Delete a resource

pcs resource delete <id>

Force the virtual IP up

pcs resource debug-start VirtualIP
(once the situation is resolved)
pcs resource debug-stop VirtualIP

This seems useful for bringing a resource up in an emergency when it will not start.

Configure a route as a resource

# Create the virtual IP
pcs resource create VirtualIP ocf:heartbeat:IPaddr2 ip=<VIP> nic=em1:0 cidr_netmask=24 op monitor interval=30s

# Route settings (default gateway)
pcs resource create Defaultgw ocf:heartbeat:Route destination="default" device="em1" gateway="<gateway IP>" family="ip4"

# Start VirtualIP first, then the gateway resource
pcs constraint order VirtualIP then Defaultgw

# Defaultgw must run on the same node as VirtualIP
pcs constraint colocation add Defaultgw with VirtualIP INFINITY

A constraint is a rule that the cluster is forced to follow.

Configure a script as a resource

Write the script in init-script form and copy it into the /etc/init.d/ directory,
or link the script there with a symbolic link.

pcs resource create uagent lsb:uagent op monitor interval=30s
pcs constraint order Defaultgw then uagent
pcs constraint colocation add uagent with Defaultgw INFINITY

* uagent is the script file

Check resource constraints

pcs constraint --full
Location Constraints:
Ordering Constraints:
  start VirtualIP then start Defaultgw (kind:Mandatory) (id:order-VirtualIP-Defaultgw-mandatory)
Colocation Constraints:
  Defaultgw with VirtualIP (score:INFINITY) (id:colocation-Defaultgw-VirtualIP-INFINITY)
Ticket Constraints:

Delete a constraint

pcs constraint remove <ID>
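The <ID> to pass to pcs constraint remove is the value inside the (id:...) part of the --full listing above. A small sketch for pulling those ids out of the listing, assuming that output format; constraint_ids is a hypothetical helper name, and grep -o is assumed to be available (it is in GNU grep).

```shell
# Hypothetical helper: print the constraint ids found in
# `pcs constraint --full` output read from stdin, one per line.
constraint_ids() {
    # Keep only the "(id:...)" parts, then strip the "(id:" and ")" wrapper.
    grep -o '(id:[^)]*)' | sed -e 's/^(id://' -e 's/)$//'
}

# Usage on a cluster node (not run here):
#   pcs constraint --full | constraint_ids
```

Each printed id can then be given to pcs constraint remove one at a time.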

ocf:heartbeat:IPaddr2
ocf : resource standard
heartbeat : resource provider
IPaddr2 : resource script (agent) name
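The three fields can be separated in a script by splitting on the colon. This is only an illustrative sketch; split_agent is a made-up function name, and it fits the three-field ocf form only (a name such as lsb:uagent has no provider field).

```shell
# Hypothetical helper: split a resource agent name into its three fields.
split_agent() {
    # IFS=: applies only to this read; the fields are standard, provider, agent.
    IFS=: read -r standard provider agent <<EOF
$1
EOF
    printf 'standard=%s provider=%s agent=%s\n' "$standard" "$provider" "$agent"
}

split_agent ocf:heartbeat:IPaddr2
# prints: standard=ocf provider=heartbeat agent=IPaddr2
```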

List the resource standards (first field)

[root@localhost ~]# pcs resource standards
lsb : scripts located in /etc/init.d/
ocf : scripts provided by default (OCF resource agents), under /lib/ocf/resource.d/heartbeat
service
systemd

Meaning

ocf – Open Cluster Framework
lsb – Linux Standard Base (usually init scripts)
service – Based on Linux “service” command.
systemd – systemd based service Management
stonith – Fencing Resource standard.

LSB (Linux Standard Base) specification

http://refspecs.linux-foundation.org/LSB_3.2.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

List the resource providers (second field)

[root@localhost ~]# pcs resource providers
heartbeat
openstack
pacemaker

List the resource scripts (third field)

[root@localhost ~]# pcs resource agents ocf:heartbeat

Testing

Stop the cluster daemon

pcs cluster stop <node name>

Put a node into standby

pcs cluster standby <node name>
pcs cluster unstandby <node name>

Check the configuration

/var/lib/pacemaker/cib/cib.xml
/etc/corosync/corosync.conf
pcs config show

Do not edit these files directly; make changes with pcs commands!!

Check the logs

/var/log/cluster/corosync.log
/var/log/pcsd/pcsd.log
/var/log/pacemaker.log

journalctl -u pacemaker.service

Monitoring

crm_mon gives a continuously updating view, similar to running watch pcs status

pcs status
pcs cluster status

Error: cluster is not currently running on this node (the cluster is stopped on this server's own node)

Quorum

What is Quorum?
A cluster has quorum when more than half of the nodes are online. Pacemaker’s default behavior is to stop all resources if the cluster does not have quorum. However, this does not make sense in a two-node cluster; the cluster will lose quorum if one node fails.
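The "more than half" rule quoted above is simple arithmetic, and working it through shows why a two-node cluster is a special case. The sketch below only illustrates that plain majority rule; has_quorum is a hypothetical helper, not a pcs or corosync command.

```shell
# Hypothetical helper: print "yes" when more than half of the nodes are online.
# Arguments: total number of nodes, number of nodes currently online.
has_quorum() {
    total=$1
    online=$2
    # "More than half" means online * 2 > total (avoids fractions).
    if [ $((online * 2)) -gt "$total" ]; then
        echo yes
    else
        echo no
    fi
}

has_quorum 3 2   # three-node cluster, one node down: prints yes
has_quorum 2 1   # two-node cluster, one node down: prints no
```

With two nodes, a single failure leaves exactly half of the nodes online, which is not more than half, so quorum is lost.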

Error: Stopping the node(s) will cause a loss of the quorum, use --force to override <- if you see this message, quorum is enabled

Disable quorum

pcs property set no-quorum-policy=ignore
# equivalent command when crmsh is used instead of pcs
crm configure property no-quorum-policy=ignore

This setting is required when the cluster has only two nodes!!!

Add / remove a node

pcs cluster node add node
pcs cluster node remove node

When an existing node is reinstalled, remove it and then add it again.
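The remove-then-add sequence can be wrapped in a small function. This is a sketch under assumptions: readd_node and the PCS variable are names invented here, and PCS exists only so the commands can be printed as a dry run instead of executed.

```shell
# Hypothetical wrapper: remove a node from the cluster, then add it back.
# PCS defaults to the real pcs command; set PCS="echo pcs" for a dry run.
readd_node() {
    node=$1
    # Add the node back only if the removal succeeded.
    ${PCS:-pcs} cluster node remove "$node" &&
    ${PCS:-pcs} cluster node add "$node"
}

# Dry run (prints the two commands without touching the cluster):
#   PCS="echo pcs"; readd_node web2
```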
