
Monday, May 16, 2022

Stop Cluster, Unmount and Remount Oracle Shared Storage, Start Cluster

===========
General
===========
In short:
1. Stop the cluster
2. Unmount the Oracle shared storage
3. Do the DBA work
4. Remount the Oracle shared storage
5. Start the cluster

How to know whether this is a Veritas or a Pacemaker cluster?
Run getaclu - if the output shows VCS, it is Veritas.
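
If getaclu is not available, a rough fallback check (my own, assuming the standard tools are in PATH) is to look for the cluster tooling itself:
#> command -v hastatus >/dev/null && echo "Veritas (VCS)"
#> command -v pcs >/dev/null && echo "Pacemaker"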

Starhome Technote
http://10.135.10.64/portal/projects/howto/mount-cl-fs-locally.html

=====================
Pacemaker
=====================

==============
Stop the cluster
==============

#> pcs cluster stop --all
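
Before touching the storage, it is worth confirming the cluster is fully down on all nodes; pcs status should report that the cluster is not running:
#> pcs status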

Tag the Oracle volume groups and activate them locally:
#> for i in $(vgscan |grep Ora | cut -d '"' -f 2) ;do vgchange --addtag sometag $i ; vgchange -ay --config 'activation{volume_list=["@sometag"]}' $i ; done
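
To confirm the tag was added and the volumes are active (in lv_attr the fifth character should be 'a'), a quick sanity check:
#> vgs -o vg_name,vg_tags | grep Ora
#> lvs -o lv_name,vg_name,lv_attr | grep Ora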

Mount locally
#> mount -t xfs /dev/OraVg1/db1 /oracle_db/db1
#> mount -t xfs /dev/OraVg2/Ora_Exp /backup/ora_exp
#> mount -t xfs /dev/OraVg2/Ora_Online /backup/ora_online
#> mount -t xfs /dev/OraVg3/db2 /oracle_db/db2
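
A quick check that all the filesystems are mounted where expected:
#> df -hP | grep -E '/oracle_db|/backup'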

==============
Start the cluster 
==============
Unmount it all locally:
#> umount -f /mnt/oratmp /oracle_db/db1 /backup/ora_exp /backup/ora_online /oracle_db/db2
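
If an unmount fails with a "target is busy" error, one way to see which processes still hold the filesystem (fuser is part of psmisc; run per mount point):
#> fuser -vm /oracle_db/db1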

Delete the tag and deactivate the volume groups:

#> for i in $(vgscan |grep Ora | cut -d '"' -f 2) ;do vgchange -an $i ; vgchange --deltag sometag $i ; done
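
Confirm the tag is gone and the Oracle volume groups are inactive before handing control back to the cluster:
#> vgs -o vg_name,vg_tags | grep Ora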

Restart the cluster 
#> pcs cluster start --all
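
Then wait for the resources to come back online, e.g. by polling until the ora_igt_rg resources show Started (watch is optional, repeated pcs status works too):
#> watch -n 10 pcs status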


=====================
VCS
=====================
Take a screenshot of the current status:
df -hP | grep ora
/dev/vx/dsk/OraDg1/db1        79G   26G   54G  33% /oracle_db/db1
/dev/vx/dsk/OraDg2/Ora_Online 159G   16G  143G 10% /backup/ora_online
/dev/vx/dsk/OraDg2/Ora_Exp    100G  7.0G   93G  8% /backup/ora_exp
/dev/vx/dsk/OraDg3/db2        199G  662M  197G  1% /oracle_db/db2

Stop Cluster
hastop -all 
or
hastop -all -force

Periodically check the cluster status until it becomes unavailable and you get a message like the following:
hastatus -summary
VCS ERROR V-16-1-10600 Cannot connect to VCS engine
VCS WARNING V-16-1-11046 Local system not available
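
One simple way to poll is to rerun the command every few seconds and stop once the errors above appear:
watch -n 10 hastatus -summary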

Or, instead of stopping the cluster, freeze just the Oracle service group:
hagrp -freeze ora_igt_sg
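
If the freeze route is used instead of a full stop, remember to unfreeze the group once the DBA work is done:
hagrp -unfreeze ora_igt_sg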

Mount the Oracle mount points locally
vxdg import OraDg1
vxdg import OraDg2
vxdg import OraDg3

vxvol -g OraDg1 startall
vxvol -g OraDg2 startall
vxvol -g OraDg3 startall

mount -t vxfs /dev/vx/dsk/OraDg1/db1        /oracle_db/db1
mount -t vxfs /dev/vx/dsk/OraDg2/Ora_Online /backup/ora_online
mount -t vxfs /dev/vx/dsk/OraDg2/Ora_Exp    /backup/ora_exp
mount -t vxfs /dev/vx/dsk/OraDg3/db2        /oracle_db/db2

df -hP | grep ora

Do the Oracle stuff

Unmount the Oracle mount points locally

umount /oracle_db/db1 
umount /backup/ora_online
umount /backup/ora_exp
umount /oracle_db/db2

vxvol -g OraDg1 stopall
vxvol -g OraDg2 stopall
vxvol -g OraDg3 stopall

vxdg deport OraDg1
vxdg deport OraDg2
vxdg deport OraDg3

Start the cluster
Run the following command on ALL cluster nodes:
node a
hastart

node b
hastart
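
Confirm that both nodes rejoin and the Oracle service group comes online:
hastatus -summary
hagrp -state ora_igt_sg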


===========
Additional Commands
===========
Move the service to another node
As root, first check and clean up the resource group:
pcs status
pcs resource cleanup ora_igt_rg
pcs resource disable ora_igt_rg
pcs resource enable ora_igt_rg
pcs resource show ora_igt_rg
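
The move itself (the target node name below is the one used elsewhere in this note; replace it with the node you want the group to run on):
pcs resource move ora_igt_rg PIPNVHED901G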

To create a service
As root, check the Oracle service definition in:
/etc/sysconfig/env.oracledb

For the Oracle service itself, check this file:
/etc/systemd/system/dbora.service
This defines the oracle service; ORACLE_HOME and ORACLE_SID are taken from:
/etc/sysconfig/env.oracledb
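
A minimal sketch of what such a unit file might look like (the paths, user, and group below are assumptions; take the real values from /etc/sysconfig/env.oracledb, and use Oracle's standard dbstart/dbshut scripts from $ORACLE_HOME/bin):

[Unit]
Description=Oracle Database service
After=network.target

[Service]
# Assumed values - adjust ORACLE_HOME path, user and group to this site's env.oracledb
Type=forking
RemainAfterExit=yes
User=oracle
Group=dba
EnvironmentFile=/etc/sysconfig/env.oracledb
ExecStart=/oracle/19/dbhome_1/bin/dbstart $ORACLE_HOME
ExecStop=/oracle/19/dbhome_1/bin/dbshut $ORACLE_HOME

[Install]
WantedBy=multi-user.target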

Run the following (daemon-reload is needed after creating or changing the unit file):
systemctl daemon-reload
systemctl status dbora.service
systemctl start dbora.service
pcs resource create dbora_igt_ap systemd:dbora op stop interval=0 timeout=120s on-fail="block" monitor interval=30s timeout=600s start interval=0 timeout=120s --group ora_igt_rg

pacemaker cluster commands info
pcs status - this will show cluster status
pcs resource show - this will show the current resource status
pcs resource show dbora_igt_ap - this will show the oracle service resource configuration
 Resource: dbora_igt_ap (class=systemd type=dbora)
  Operations: monitor interval=30s timeout=600s (dbora_igt_ap-monitor-interval-30s)
              start interval=0 timeout=120s (dbora_igt_ap-start-interval-0)
              stop interval=0 on-fail=block timeout=120s (dbora_igt_ap-stop-interval-0)

pacemaker cluster commands oracle
pcs resource move ora_igt_rg PIPNVHED901G
pcs resource cleanup ora_igt_rg 
pcs resource disable ora_igt_rg
pcs resource enable ora_igt_rg

pacemaker cluster commands restart oracle service
pcs status
pcs resource cleanup ora_igt_rg
pcs resource disable ora_igt_rg
pcs resource enable ora_igt_rg
pcs resource restart ora_igt_rg PIPNVHED901G
pcs status

pacemaker cluster commands other

--pcs resource disable oracle
--pcs resource enable oracle/19/dbhome_1/rdbms/log/startup



Check corosync in short
root>% vi /etc/corosync/corosync.conf 
root>% pcs cluster sync
server901G: Succeeded
server902G: Succeeded
root>% pcs cluster reload corosync
Corosync reloaded
root>% corosync-cmapctl | grep totem.token
runtime.config.totem.token (u32) = 5000
runtime.config.totem.token_retransmit (u32) = 1190
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
totem.token (u32) = 5000

Change corosync timeout
1. Edit /etc/corosync/corosync.conf on one of the cluster nodes
Add the line if it does not exist, or update the value if it does.
The value is in milliseconds: e.g. 5000 for a 5-second timeout (the sample below uses 15000, i.e. 15 seconds).

totem {
   version: 2
   secauth: off
   cluster_name: rhel7-cluster
   transport: udpu
   rrp_mode: passive
   token: 15000      <--- if this line is missing, add it; otherwise update the value
}

2. Propagate the updated corosync.conf to the rest of the nodes as follows:
pcs cluster sync

3. Reload corosync.
    This command can be run from one node to reload corosync on all nodes and does not require a downtime.
pcs cluster reload corosync

4. Confirm changes
corosync-cmapctl | grep totem.token

For example:
corosync-cmapctl | grep totem.token
runtime.config.totem.token (u32) = 5000
runtime.config.totem.token_retransmit (u32) = 1190
runtime.config.totem.token_retransmits_before_loss_const (u32) = 4
