Logging

/var/log/syslog and /var/log/messages store general system activity, including startup messages. Debian-based systems like Ubuntu use /var/log/syslog, while Red Hat-based systems like RHEL or CentOS use /var/log/messages.

/var/log/auth.log and /var/log/secure store security-related events such as logins, root user actions, and output from pluggable authentication modules (PAM). Ubuntu and Debian use /var/log/auth.log, while Red Hat and CentOS use /var/log/secure.
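Since the path depends on the distro, a script that reads these logs can simply check which file exists. The following is a minimal sketch, not a definitive implementation; the helper name first_existing and its root parameter are made up for illustration, and only the four paths come from the text above.

```python
from pathlib import Path

# Candidate paths from the notes above; which one exists depends on the distro.
SYSTEM_LOGS = ("/var/log/syslog", "/var/log/messages")
AUTH_LOGS = ("/var/log/auth.log", "/var/log/secure")


def first_existing(candidates, root="/"):
    """Return the first candidate that exists as a file under root, or None.

    `root` is a hypothetical convenience parameter so the lookup can be
    pointed at a mounted image or a test directory instead of the live system.
    """
    for candidate in candidates:
        path = Path(root) / candidate.lstrip("/")
        if path.is_file():
            return path
    return None


def locate_logs(root="/"):
    """Return the system and auth log paths found under root (None if absent)."""
    return {
        "system": first_existing(SYSTEM_LOGS, root),
        "auth": first_existing(AUTH_LOGS, root),
    }
```

Calling locate_logs() on a host returns whichever of the two files is present for each category. Reading the files usually requires root or membership in a log-reading group, which this sketch does not handle.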

/var/log/kern.log stores kernel events, errors, and warning logs, which are particularly helpful for troubleshooting custom kernels.

/var/log/cron stores information about scheduled tasks (cron jobs). Use this data to verify that your cron jobs are running successfully.
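One way to check whether a job actually fired is to pull the user and command out of each cron log line. The sketch below assumes the common "(user) CMD (command)" layout that cron daemons write; the exact format on your hosts may differ, and the function names are hypothetical.

```python
import re

# Assumed layout of a cron execution entry: "... (user) CMD (command)".
# Adjust the pattern if your cron daemon logs differently.
CRON_CMD = re.compile(r"\((?P<user>[^)]+)\) CMD \((?P<cmd>.*)\)\s*$")


def parse_cron_cmd(line):
    """Return (user, command) for a cron execution line, or None otherwise."""
    match = CRON_CMD.search(line)
    if match is None:
        return None
    return match.group("user"), match.group("cmd")


def runs_of(lines, needle):
    """Return the lines whose executed command contains `needle`."""
    hits = []
    for line in lines:
        parsed = parse_cron_cmd(line)
        if parsed is not None and needle in parsed[1]:
            hits.append(line.rstrip("\n"))
    return hits
```

For example, passing the open log file and a script name to runs_of lists every time cron started that script. It only shows that the job was launched; whether the job succeeded has to come from the job's own output or exit status.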

To look at BMC logs from the OS, use ipmitool sel elist. Whenever there is anything hardware related, this is the first place to check. For instance, on 8/4/2023 eic-gt-gpu6 mysteriously went offline and was in a hung state. We had to pull the power cords to get it to reboot. The BMC logs revealed that the CPU was not getting power and its core voltage had dropped to zero:

c0f | 08/04/2023 | 16:13:15 | Voltage P0_VDDCR_SOC | Lower Non-critical going low  | Asserted | Reading 0 < Threshold 0.45 Volts
c10 | 08/04/2023 | 16:13:15 | Voltage P0_VDDCR_SOC | Lower Critical going low  | Asserted | Reading 0 < Threshold 0.40 Volts
c11 | 08/04/2023 | 16:13:15 | Voltage P0_VDDCR_CPU | Lower Non-critical going low  | Asserted | Reading 0 < Threshold 0.45 Volts
c12 | 08/04/2023 | 16:13:15 | Voltage P0_VDDCR_CPU | Lower Critical going low  | Asserted | Reading 0 < Threshold 0.40 Volts
c13 | 08/04/2023 | 16:13:15 | Voltage P0_VDD_18 | Lower Non-critical going low  | Asserted | Reading 0 < Threshold 1.63 Volts
c14 | 08/04/2023 | 16:13:15 | Voltage P0_VDD_18 | Lower Critical going low  | Asserted | Reading 0 < Threshold 1.55 Volts
c15 | 08/04/2023 | 16:13:15 | Voltage P1_VDDCR_CPU | Lower Non-critical going low  | Asserted | Reading 0 < Threshold 0.45 Volts
c16 | 08/04/2023 | 16:13:15 | Voltage P1_VDDCR_CPU | Lower Critical going low  | Asserted | Reading 0 < Threshold 0.40 Volts
c17 | 08/04/2023 | 16:13:15 | Voltage P1_VDDCR_SOC | Lower Non-critical going low  | Asserted | Reading 0 < Threshold 0.45 Volts
c18 | 08/04/2023 | 16:13:15 | Voltage P1_VDDCR_SOC | Lower Critical going low  | Asserted | Reading 0 < Threshold 0.40 Volts
c19 | 08/04/2023 | 16:13:15 | Voltage P1_VDD_18 | Lower Non-critical going low  | Asserted | Reading 0.01 < Threshold 1.63 Volts
c1a | 08/04/2023 | 16:13:15 | Voltage P1_VDD_18 | Lower Critical going low  | Asserted | Reading 0.01 < Threshold 1.55 Volts
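
With a long event log it can help to filter for the critical entries and group them by timestamp, so that a burst like the one above stands out. This is a rough sketch that assumes the pipe-separated layout shown in the excerpt; the function names are hypothetical, and read_sel assumes ipmitool is installed and run with sufficient privileges.

```python
import subprocess
from collections import defaultdict


def read_sel():
    """Run `ipmitool sel elist` and return its output lines (needs ipmitool on the host)."""
    result = subprocess.run(
        ["ipmitool", "sel", "elist"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()


def parse_sel_line(line):
    """Split one pipe-separated SEL line into named fields; None if it does not fit."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) < 6:
        return None
    return {
        "id": fields[0],
        "timestamp": f"{fields[1]} {fields[2]}",
        "sensor": fields[3],
        "event": fields[4],
        "state": fields[5],
        "detail": fields[6] if len(fields) > 6 else "",
    }


def critical_sensors(lines):
    """Map each timestamp to the sorted sensors with an asserted critical event."""
    hits = defaultdict(set)
    for line in lines:
        record = parse_sel_line(line)
        if record is None or record["state"] != "Asserted":
            continue
        event = record["event"].lower()
        # "Lower Non-critical ..." entries are warnings; keep only critical ones.
        if "critical" in event and "non-critical" not in event:
            hits[record["timestamp"]].add(record["sensor"])
    return {stamp: sorted(sensors) for stamp, sensors in hits.items()}
```

Feeding the output of read_sel() into critical_sensors() gives one entry per timestamp. Several voltage rails going critical in the same second, as in the eic-gt-gpu6 incident, points at a power delivery problem rather than a single bad sensor.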