Red Hat: Kernel panic due to divide error in intel_pstate_timer_func() function

Symptom:

The Red Hat system rebooted on its own. A vmcore file was generated before the reboot, and analyzing it shows the following error:

[64146063.970892] divide error: 0000 [#1] SMP
[64146063.971188] Modules linked in: sr_mod cdrom usb_storage ipmi_watchdog ipmi_poweroff binfmt_misc coretemp iTCO_wdt iTCO_vendor_support intel_rapl kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd i2c_i801 wmi pcspkr ipmi_devintf mei_me sb_edac mei lpc_ich ioatdma edac_core shpchp mfd_core ipmi_si ipmi_msghandler acpi_power_meter dm_mirror dm_region_hash dm_log dm_mod xfs libcrc32c sd_mod crc_t10dif crct10dif_common ast syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm ixgbe mdio i2c_core ptp megaraid_sas pps_core dca
[64146063.972454] CPU: 12 PID: 0 Comm: swapper/12 Not tainted 3.10.0-229.14.1.el7.x86_64 #1
[64146063.972766] Hardware name: Inspur SA5112M4/YZMB-00370-107, BIOS 4.0.6 07/13/2015
[64146063.973094] task: ffff881fd313e660 ti: ffff881fd3174000 task.ti: ffff881fd3174000
[64146063.973437] RIP: 0010:[<ffffffff814a9809>] [<ffffffff814a9809>] intel_pstate_timer_func+0x179/0x3d0
[64146063.973813] RSP: 0018:ffff88407fc83db8 EFLAGS: 00010206
[64146063.974190] RAX: 0000000027100000 RBX: ffff881fd1ff0c00 RCX: 0000000000000000
[64146063.974588] RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e5363df
[64146063.974993] RBP: ffff88407fc83e20 R08: 00e3c180b8292dc0 R09: ffff883fd2864001
[64146063.975411] R10: 0000000000000002 R11: 0000000000000005 R12: 0000000000005207
[64146063.975844] R13: 0000000000271000 R14: 0000000000005207 R15: 0000000000000246
[64146063.976279] FS: 0000000000000000(0000) GS:ffff88407fc80000(0000) knlGS:0000000000000000
[64146063.976732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[64146063.977197] CR2: 00007f7172dd27b0 CR3: 0000001fd17b9000 CR4: 00000000001407e0
[64146063.977676] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[64146063.978165] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[64146063.978656] Stack:
[64146063.979148] ffff88407fc83e58 ffffffff81099de1 0000000000000086 000000000000000c
[64146063.979678] ffff88407fc93680 000000000000000c 8eb024ee172d723d ffffffff810b6a8d
[64146063.980227] ffff883fd2864000 ffff881fd1ff0c08 0000000000000100 ffffffff814a9690
[64146063.980782] Call Trace:
[64146063.981335] <IRQ>
[64146063.981344]
[64146063.981905] [<ffffffff81099de1>] ? run_posix_cpu_timers+0x51/0x840
[64146063.982484] [<ffffffff810b6a8d>] ? trigger_load_balance+0x5d/0x200
[64146063.983073] [<ffffffff814a9690>] ? pid_param_set+0x130/0x130
[64146063.983674] [<ffffffff8107df66>] call_timer_fn+0x36/0x110
[64146063.984279] [<ffffffff814a9690>] ? pid_param_set+0x130/0x130
[64146063.984894] [<ffffffff8107fddf>] run_timer_softirq+0x21f/0x320
[64146063.985511] [<ffffffff81077b3f>] __do_softirq+0xef/0x280
[64146063.986134] [<ffffffff81615c9c>] call_softirq+0x1c/0x30
[64146063.986763] [<ffffffff81015d95>] do_softirq+0x65/0xa0
[64146063.987395] [<ffffffff81077ed5>] irq_exit+0x115/0x120
[64146063.988033] [<ffffffff81616915>] smp_apic_timer_interrupt+0x45/0x60
[64146063.988691] [<ffffffff81614fdd>] apic_timer_interrupt+0x6d/0x80
[64146063.989348] <EOI>
[64146063.989356]
[64146063.990013] [<ffffffff8109b838>] ? hrtimer_start+0x18/0x20
[64146063.990690] [<ffffffff81052de6>] ? native_safe_halt+0x6/0x10
[64146063.991378] [<ffffffff8101c85f>] default_idle+0x1f/0xc0
[64146063.992072] [<ffffffff8101d166>] arch_cpu_idle+0x26/0x30
[64146063.992778] [<ffffffff810c6921>] cpu_startup_entry+0xf1/0x290
[64146063.993492] [<ffffffff8104228a>] start_secondary+0x1ba/0x230
[64146063.994205] Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 <48> f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
[64146063.995759] RIP [<ffffffff814a9809>] intel_pstate_timer_func+0x179/0x3d0
[64146063.996497] RSP <ffff88407fc83db8>

Affected environment:

  • Red Hat Enterprise Linux 7.1, observed with the following kernel revisions (other revisions may be affected as well):
    • kernel-3.10.0-229.1.2.el7.x86_64
    • kernel-3.10.0-229.14.1.el7.x86_64 (the kernel running in the crash above)
  • Red Hat Enterprise Linux 6, minor versions earlier than 6.8

Solution:

Red Hat's official article: https://access.redhat.com/solutions/1471663

Red Hat Enterprise Linux 7

  • Red Hat Enterprise Linux 7.1.z: Upgrade to kernel-3.10.0-229.20.1.el7 from Errata: RHSA-2015-1978 or later.
  • Red Hat Enterprise Linux 7.2: Upgrade to kernel-3.10.0-327.el7 from Errata: RHSA-2015-2152 or later.
  • Workaround for RHEL 7: boot the system with kernel version “3.10.0-123.20.1.el7” or older.

Red Hat Enterprise Linux 6

  • Red Hat Enterprise Linux 6: Upgrade to kernel-2.6.32-642.el6 from Errata: RHSA-2016-0855 or later. This fix is already included in RHEL 6.8 GA and later.
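To check whether a host still runs a kernel that this note says must be upgraded, the running kernel release (as printed by uname -r) can be compared against the fixed versions listed above. The following is a minimal sketch under stated assumptions: the function names are hypothetical, it only knows the thresholds from this note, it compares the numeric release fields as Python tuples instead of using rpm's full version comparison, and it rejects releases the note does not cover.

```python
import platform


def parse_kernel_release(uname_r):
    """Split e.g. '3.10.0-229.14.1.el7.x86_64' into ('3.10.0', (229, 14, 1), 'el7')."""
    version, release = uname_r.split("-", 1)
    nums, dist = [], ""
    for part in release.split("."):
        if not part.isdigit():
            dist = part  # first non-numeric field, e.g. 'el7'
            break
        nums.append(int(part))
    return version, tuple(nums), dist


def needs_update(uname_r):
    """Hypothetical check: True if uname_r falls in a range this note says must be upgraded.

    Simplified numeric tuple comparison, not rpm's rpmvercmp.
    """
    version, rel, dist = parse_kernel_release(uname_r)
    if version == "3.10.0" and dist.startswith("el7"):
        if rel <= (123, 20, 1):  # the workaround kernel or older
            return False
        if rel[0] == 229:  # RHEL 7.1.z: fixed in 3.10.0-229.20.1.el7
            return rel < (229, 20, 1)
        if rel >= (327,):  # RHEL 7.2 GA (3.10.0-327.el7) and later
            return False
    elif version == "2.6.32" and dist.startswith("el6"):
        return rel < (642,)  # RHEL 6: fixed in 2.6.32-642.el6 (RHEL 6.8 GA)
    raise ValueError(f"release not covered by this note: {uname_r}")


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r` on Linux
    try:
        print(running, "-> upgrade needed" if needs_update(running) else "-> OK")
    except ValueError as err:
        print(err)
```

For the kernel in the log above, needs_update("3.10.0-229.14.1.el7.x86_64") returns True, while 3.10.0-229.20.1.el7 and 3.10.0-327.el7 return False.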