Pages

Monday, April 25, 2022

Oracle Autonomous Health Framework (AHF) by Example

==========================================
Autonomous Health Framework (AHF) and tfactl 
==========================================
Autonomous Health Framework (AHF) can be installed as the "root" user on the server, 
which provides the most functionality and allows it to run in a proactive manner as a daemon. 
In this example installation is done as the root user.
AHP installs these utilities:
orachk -> /opt/oracle.ahf/orachk/orachk
oerr -> /opt/oracle.ahf/orachk/lib/oerr.sh
tfactl

tfacl can be run with various flags, to gather different info about OS, oracle installations, filesystem, etc.

Step 1 - Installation

Unzip the software and run the ahf_setup command. 
Answer the questions when prompted. 
The following must be run as root

as root:
mkdir /opt/ahf_data


root@my_server:/software/oracle/oracle/scripts/AHF>% ./ahf_setup

AHF Installer for Platform Linux Architecture x86_64

AHF Installation Log : /tmp/ahf_install_211400_23652_2021_07_08-15_23_08.log

Starting Autonomous Health Framework (AHF) Installation

AHF Version: 21.1.4 Build Date: 202106281226

Default AHF Location : /opt/oracle.ahf

Do you want to install AHF at [/opt/oracle.ahf] ? [Y]|N : y

AHF Location : /opt/oracle.ahf

AHF Data Directory stores diagnostic collections and metadata.
AHF Data Directory requires at least 5GB (Recommended 10GB) of free space.

Please Enter AHF Data Directory : /opt/ahf_data
Do you want to add AHF Notification Email IDs ? [Y]|N : n

Extracting AHF to /opt/oracle.ahf

Configuring TFA Services

Discovering Nodes and Oracle Resources

Successfully generated certificates.

Starting TFA Services
Created symlink from /etc/systemd/system/multi-user.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.
Created symlink from /etc/systemd/system/graphical.target.wants/oracle-tfa.service to /etc/systemd/system/oracle-tfa.service.

.-----------------------------------------------------------------------------------.
| Host           | Status of TFA | PID  | Port  | Version    | Build ID             |
+----------------+---------------+------+-------+------------+----------------------+
| my_server      | RUNNING       | 6388 | 11599 | 21.1.4.0.0 | 21140020210628122659 |
'----------------+---------------+------+-------+------------+----------------------'

Running TFA Inventory...

Adding default users to TFA Access list...

.-------------------------------------------------------------------.
|                     Summary of AHF Configuration                    |
+---------------+---------------------------------------------------+
| Parameter       | Value                                           |
+-----------------+-------------------------------------------------+
| AHF Location    | /opt/oracle.ahf                                 |
| TFA Location    | /opt/oracle.ahf/tfa                             |
| Orachk Location | /opt/oracle.ahf/orachk                          |
| Data Directory  | /opt/ahf_data/oracle.ahf/data                   |
| Repository      | /opt/ahf_data/oracle.ahf/data/repository        |
| Diag Directory  | /opt/ahf_data/oracle.ahf/data/qanfv-1-dbs-1b/diag
'-----------------+-------------------------------------------------'


Starting orachk scheduler from AHF ...

AHF binaries are available in /opt/oracle.ahf/bin

AHF is successfully installed

Do you want AHF to store your My Oracle Support Credentials for Automatic Upload ? Y|[N] : N

Moving /tmp/ahf_install_211400_3967_2021_07_08-15_28_16.log to /opt/ahf_data/oracle.ahf/data/qanfv-1-dbs-1b/diag/ahf/


This installs /opt/oracle.ahf/

root@my_server:/software/oracle/oracle/scripts/oracle_support>% cd /opt/oracle.ahf/bin/
root@my_server:/opt/oracle.ahf/bin>% ls -ltr
total 4
lrwxrwxrwx 1 root root   29 Mar 17 23:19 orachk -> /opt/oracle.ahf/orachk/orachk
lrwxrwxrwx 1 root root   34 Mar 17 23:19 oerr -> /opt/oracle.ahf/orachk/lib/oerr.sh
-rwxr-xr-x 1 root root 3818 Mar 17 23:19 tfactl


./orachk
Collections and audit checks log file is
/opt/oracle.ahf/data/qanfv-1-dbs-1a/orachk/user_root/output/orachk_qanfv-1-dbs-1a_igt_031621_152257/log/orachk.log





Step 2 - Execution - a general run

cd /opt/oracle.ahf/bin/
./orachk

============================================================
         Node name - qanfv-1-dbs-1a
============================================================
. . . . . .

 Collecting - Database Parameters for igt database
 Collecting - Database Undocumented Parameters for igt database
 Collecting - List of active logon and logoff triggers for igt database
 Collecting - CPU Information
 Collecting - Disk I/O Scheduler on Linux
 Collecting - DiskMount Information
 Collecting - Kernel parameters
 Collecting - Maximum number of semaphore sets on system
 Collecting - Maximum number of semaphores on system
 Collecting - Maximum number of semaphores per semaphore set
 Collecting - Memory Information
 Collecting - OS Packages
 Collecting - Operating system release information and kernel version
 Collecting - Patches for RDBMS Home
 Collecting - Table of file system defaults
 Collecting - number of semaphore operations per semop system call
 Collecting - Disk Information
 Collecting - ORAchk Daemon/Scheduler configuration
 Collecting - Root user limits
 Collecting - Verify TCP Selective Acknowledgement is enabled
 Collecting - Verify no database server kernel out of memory errors
 Collecting - Verify the vm.min_free_kbytes configuration

Data collections completed. Checking best practices on qanfv-1-dbs-1a.
------------------------------------------------------------


 WARNING =>  Linux swap configuration does not meet recommendation
 WARNING =>  Non-AWR Space consumption is greater than or equal to 50% of total SYSAUX space. for igt
 WARNING =>  There are some application objects with STALE statistics for igt
 INFO =>     Most recent ADR incidents for /software/oracle/122
 INFO =>     Oracle GoldenGate failure prevention best practices
 CRITICAL => The vm.min_free_kbytes configuration is not set as recommended
 INFO =>     Oracle GoldenGate Health-Checks and Diagnostics Reports for igt
 INFO =>     user_dump_dest has trace files older than 30 days for igt
 WARNING =>  ORA-00600 errors found in alert log for igt
 INFO =>     Alert log file is too big and should be rolled over periodically for igt
 INFO =>     At some times checkpoints are not being completed for igt
 WARNING =>  One or more redo log groups are not multiplexed for igt
 WARNING =>  Primary database is not protected with Data Guard (standby database) for real-time data protection and availability for igt
 FAIL =>     numa_balancing kernel parameter is not configured to 0
 INFO =>     Important Storage Minimum Requirements for Grid & Database Homes
 WARNING =>  OSWatcher is not running as is recommended.
 FAIL =>     Database parameter DB_LOST_WRITE_PROTECT is not set to recommended value on igt instance
 WARNING =>  Database parameter DB_BLOCK_CHECKING on primary is not set to the recommended value. for igt
 WARNING =>  Consider setting the value of the parameter _cursor_obsolete_threshold  to 1024 for Non-Multitenant environment which is the appropriate recommended value for igt
 INFO =>     Operational Best Practices
 INFO =>     Database Consolidation Best Practices
 INFO =>     Computer failure prevention best practices
 INFO =>     Data corruption prevention best practices
 INFO =>     Logical corruption prevention best practices
 INFO =>     Database/Cluster/Site failure prevention best practices
 INFO =>     Client failover operational best practices
 WARNING =>  Oracle patch 30712670 is not applied on RDBMS_HOME /software/oracle/122
 WARNING =>  Oracle patch 29867728 is not applied on RDBMS_HOME /software/oracle/122
 WARNING =>  Oracle patch 31142749 is not applied on RDBMS_HOME /software/oracle/122
 WARNING =>  Oracle patch 26749785 is not applied on RDBMS_HOME /software/oracle/122
 WARNING =>  Oracle patch 29302565 is not applied on RDBMS_HOME /software/oracle/122
 WARNING =>  Oracle patch 29259068 is not applied on RDBMS_HOME /software/oracle/122
 FAIL =>     RECYCLEBIN on PRIMARY should be set to the recommended value on igt instance
 WARNING =>  Oracle clusterware is not being used
 WARNING =>  RAC Application Cluster is not being used for database high availability on igt instance
 FAIL =>     Table AUD$[FGA_LOG$] should use Automatic Segment Space Management for igt
 WARNING =>  Flashback on PRIMARY is not configured for igt
 INFO =>     Database failure prevention best practices
 WARNING =>  fast_start_mttr_target has NOT been changed from default on igt instance
 FAIL =>     Active Data Guard is not configured for igt
 INFO =>     Parallel Execution Health-Checks and Diagnostics Reports for igt
 CRITICAL => The data files should be recoverable for igt
 WARNING =>  The UTL_SPADV package should be installed in the database. for igt
 WARNING =>  The Streams pool is not currently set or not sized appropriately for this database instance. for igt
 INFO =>     Oracle recovery manager(rman) best practices
 INFO =>     Database feature usage statistics for igt
 WARNING =>  Consider investigating changes to the schema objects such as DDLs or new object creation for igt
 WARNING =>  Consider adding more redo log groups or increase the size of redo logs for igt
Best Practice checking completed. Checking recommended patches on qanfv-1-dbs-1a
--------------------------------------------------------------------------------
Collecting patch inventory on ORACLE_HOME /software/oracle/122
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
1 Recommended RDBMS patches for 122010 from /software/oracle/122 on qanfv-1-dbs-1a
--------------------------------------------------------------------------------
Patch#   RDBMS    ASM     type                Patch-Description
--------------------------------------------------------------------------------
31741641 no               merge                Database Oct 2020 Release Update 12.2.0.1.201020
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
              RDBMS homes patches summary report
--------------------------------------------------------------------------------
Total patches  Applied on RDBMS Applied on ASM ORACLE_HOME
--------------------------------------------------------------------------------
1              1              0                /software/oracle/122
--------------------------------------------------------------------------------
------------------------------------------------------------
Detailed report (html) -  /opt/oracle.ahf/data/qanfv-1-dbs-1a/orachk/user_root/output/orachk_qanfv-1-dbs-1a_igt_031621_152257/orachk_qanfv-1-dbs-1a_igt_031621_152257.html




UPLOAD [if required] - /opt/oracle.ahf/data/qanfv-1-dbs-1a/orachk/user_root/output/orachk_qanfv-1-dbs-1a_igt_031621_152257.zip

Step 2 - Execution - gather info about corruption on disk
This MUST be run as oracle user
Need first to get database name

SELECT NAME FROM V$DATABASE;
NAME
---------
IGT

cd /opt/oracle.ahf/bin/
./tfactl diagcollect -srdc dbcorrupt 

oracle@my_server:~>% cd /opt/oracle.ahf/bin
oracle@my_server:/opt/oracle.ahf/bin>% ./tfactl diagcollect -srdc dbcorrupt
Enter the Database Name [Required for this SRDC] : IGT
Scripts to be run by this srdc: ipspack get_db_opatch_info srdc_corruption_1578_info.sql hcheck.sql hcheck_pdb.sql
Components included in this srdc: OS DATABASE ASM CHMOS
Collecting data for local node(s)

Collection Id : 20210316154324qanfv-1-dbs-1a

Detailed Logging at : /opt/oracle.ahf/data/repository/srdc_dbcorrupt_collection_Tue_Mar_16_15_43_25_GMT_2021_node_local/diagcollect_20210316154324_qanfv-1-dbs-1a.log
2021/03/16 15:43:30 GMT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2021/03/16 15:43:30 GMT : Collection Name : tfa_srdc_dbcorrupt_Tue_Mar_16_15_43_25_GMT_2021.zip
2021/03/16 15:43:30 GMT : Collecting additional diagnostic information...
2021/03/16 15:43:30 GMT : Scanning of files for Collection in progress...
2021/03/16 15:43:40 GMT : Getting list of files satisfying time range [03/16/2021 14:43:30 GMT, 03/16/2021 15:43:40 GMT]
2021/03/16 15:43:54 GMT : Collecting ADR incident files...
2021/03/16 15:43:57 GMT : Completed collection of additional diagnostic information...
2021/03/16 15:44:00 GMT : Completed Local Collection
.------------------------------------------.
|            Collection Summary            |
+----------------+-----------+------+------+
| Host           | Status    | Size | Time |
+----------------+-----------+------+------+
| qanfv-1-dbs-1a | Completed | 8MB  |  30s |
'----------------+-----------+------+------'

Logs are being collected to: /opt/oracle.ahf/data/repository/srdc_dbcorrupt_collection_Tue_Mar_16_15_43_25_GMT_2021_node_local
/opt/oracle.ahf/data/repository/srdc_dbcorrupt_collection_Tue_Mar_16_15_43_25_GMT_2021_node_local/qanfv-1-dbs-1a.tfa_srdc_dbcorrupt_Tue_Mar_16_15_43_25_GMT_2021.zip


The Output Files:
are under 
qanfv-1-dbs-1a.tfa_srdc_dbcorrupt_Tue_Mar_16_15_43_25_GMT_2021.zip.txt
qanfv-1-dbs-1a.tfa_srdc_dbcorrupt_Tue_Mar_16_15_43_25_GMT_2021.zip
diagcollect_console_20210316154324_qanfv-1-dbs-1a.log
diagcollect_20210316154324_qanfv-1-dbs-1a.log

qanfv-1-dbs-1a.tfa_srdc_dbcorrupt_Tue_Mar_16_15_43_25_GMT_2021.zip - if the actual output file


Example B - gather info about corrupt UNDO tablespace
As oracle user run:

oracle@my_server:/opt/oracle.ahf/bin>% ./tfactl diagcollect -srdc DBAUM

SRDC diagnostic collections must be run as an oracle privileged user - not root
root@my_server:/opt/oracle.ahf/bin>% su - oracle
Last login: Thu Jul  8 15:36:43 GMT 2021
oracle@my_server:~>% oraigt
oracle@my_server:~>% cd /opt/oracle.ahf/bin/
oracle@my_server:/opt/oracle.ahf/bin>% ./tfactl diagcollect -srdc DBAUM
Enter the Database Name [Required for this SRDC] : IGT
Selected ORACLE_HOME /software/oracle/122
Enter the time of the issue [YYYY-MM-DD HH24:MI:SS,<RETURN>=ALL] :
Is there any ORA error happened on undo?[Y|N] [Required for this SRDC]: Y
Please input the ORA error number[number only] [Required for this SRDC]: 30013
Can you reproduce the issue? [Required for this SRDC]: y
Enter the full path of the SQL file which would reproduce the issue now:  [Required for this SRDC]: /software/oracle/oracle/scripts/AHF/for_oracle_support/drop_undotbs1.sql
Scripts to be run by this srdc: srdc_undo_recommendation.sql srdc_undo.sql srdc_get_errorstack_trace cp_reproduce_sql_file
Components included in this srdc: DATABASE NOCHMOS OS

EXIT; -- Exit from the custom script
Collecting data for local node(s).

Collection Id : 20210708154541qanfv-1-dbs-1b

Detailed Logging at : /opt/ahf_data/oracle.ahf/data/repository/srdc_dbaum_collection_Thu_Jul_08_15_45_42_GMT_2021_node_local/diagcollect_20210708154541_qanfv-1-dbs-1b.log
2021/07/08 15:45:47 GMT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2021/07/08 15:45:47 GMT : Collection Name : tfa_srdc_dbaum_Thu_Jul_08_15_45_42_GMT_2021.zip
2021/07/08 15:45:47 GMT : Getting list of files satisfying time range [07/08/2021 08:45:47 GMT, 07/08/2021 15:45:47 GMT]
2021/07/08 15:45:47 GMT : Collecting additional diagnostic information...
2021/07/08 15:45:58 GMT : Collecting ADR incident files...
2021/07/08 15:46:04 GMT : Completed collection of additional diagnostic information...
2021/07/08 15:46:08 GMT : Completed Local Collection
.-------------------------------------------.
|             Collection Summary            |
+----------------+-----------+-------+------+
| Host           | Status    | Size  | Time |
+----------------+-----------+-------+------+
| qanfv-1-dbs-1b | Completed | 413kB |  21s |
'----------------+-----------+-------+------'

Logs are being collected to: /opt/ahf_data/oracle.ahf/data/repository/srdc_dbaum_collection_Thu_Jul_08_15_45_42_GMT_2021_node_local
/opt/ahf_data/oracle.ahf/data/repository/srdc_dbaum_collection_Thu_Jul_08_15_45_42_GMT_2021_node_local/qanfv-1-dbs-1b.tfa_srdc_dbaum_Thu_Jul_08_15_45_42_GMT_2021.zip


Example C - gather info about Golden Gate
As oracle user run:
cd /opt/oracle.ahf/bin
./tfactl diagcollect -srdc gg_abend

oracle@my_host:~>% cd /opt/oracle.ahf/bin
oracle@my_host:/opt/oracle.ahf/bin>% ./tfactl diagcollect -srdc gg_abend
Enter the Database Name [Required for this SRDC] : IGT
Use of uninitialized value in split at /opt/oracle.ahf/tfa/bin/common/dbutil.pm line 1140.
Database Name IGT was specificed however this database has a Database Unique Name of igt.
 Database Unique Name igt set for IGT.

Enter the GoldenGate Home [Required for this SRDC]: /software/oracle/1910
Enter the failed GoldenGate component name [Required for this SRDC]: EXT_P_01
Is this a Microservices Install type? [Y|N]  [Required for this SRDC]: N

Components included in this collection: OS DATABASE

Preparing to execute support diagnostic scripts.
Use of uninitialized value in split at /opt/oracle.ahf/tfa/bin/common/dbutil.pm line 1140.
Use of uninitialized value in split at /opt/oracle.ahf/tfa/bin/common/dbutil.pm line 1140.
Use of uninitialized value in split at /opt/oracle.ahf/tfa/bin/common/dbutil.pm line 1140.
Use of uninitialized value in split at /opt/oracle.ahf/tfa/bin/common/dbutil.pm line 1140.
 Executing DB Script ogg_12102.sql on igt with timeout of 300 seconds...
 Executing OS Script cp_files_classic with timeout of 120 seconds...
  Script Execution Failed.  Review the collection SRDC log for details...
 Executing OS Script get_gghome_opatch_info with timeout of 500 seconds...

Collecting data for the last 1 hours for this component ...

Collecting data for local node(s).

TFA is using system timezone for collection, All times shown in GMT.

Collection Id : 20220425121900phliptst-1-aps01

Detailed Logging at : /opt/ahf_data/oracle.ahf/data/repository/srdc_gg_abend_collection_Mon_Apr_25_12_19_04_GMT_2022_node_local/diagcollect_20220425121900_phliptst-1-aps01.log
2022/04/25 12:19:09 GMT : NOTE : Any file or directory name containing the string .com will be renamed to replace .com with dotcom
2022/04/25 12:19:09 GMT : Collection Name : tfa_srdc_gg_abend_Mon_Apr_25_12_19_02_GMT_2022.zip
2022/04/25 12:19:09 GMT : Getting list of files satisfying time range [04/25/2022 11:19:09 GMT, 04/25/2022 12:19:09 GMT]
2022/04/25 12:19:09 GMT : Collecting additional diagnostic information...
2022/04/25 12:19:21 GMT : Collecting ADR incident files...
2022/04/25 12:19:41 GMT : Completed collection of additional diagnostic information...
2022/04/25 12:19:41 GMT : Completed Local Collection

.--------------------------------------------.
|             Collection Summary             |
+------------------+-----------+------+------+
| Host             | Status    | Size | Time |
+------------------+-----------+------+------+
| my_server        | Completed | 1MB  |  32s |
'------------------+-----------+------+------'

Logs are being collected to: /opt/ahf_data/oracle.ahf/data/repository/srdc_gg_abend_collection_Mon_Apr_25_12_19_04_GMT_2022_node_local
/opt/ahf_data/oracle.ahf/data/repository/srdc_gg_abend_collection_Mon_Apr_25_12_19_04_GMT_2022_node_local/phliptst-1-aps01.tfa_srdc_gg_abend_Mon_Apr_25_12_19_02_GMT_2022.zip
oracle@my_server:/opt/oracle.ahf/bin>%

No comments:

Post a Comment