Hi,
we have been seeing intermittent stalls in disk write IO a few times per day and would like to get some help.
Issue: our metrics show zero disk write IO for a period of time.
Other important symptoms:
1. Before write IO drops to zero, there is always a sudden increase in system load and a decrease in CPU utilization.
2. There are gaps in the kernel logs, atop logs, etc. while the issue is happening.
3. Right before the issue happens, there is always an untar operation on a 1.2GB tar file.
4. memsome and memfull in the PSI section of the atop output become non-zero around the time of the issue. After the issue is gone, memsome and memfull return to zero and the disk becomes busy.
5. iosome and iofull are always zero during the issue.
6. We keep seeing the kernel error messages below around the time of the issue, which makes me think it is related to jbd2.
Thanks in advance for any pointers and advice!
Code:
Nov 14 15:55:53 phx51-a84 kernel: [2090478.140196] INFO: task jbd2/sdb1-8:2510 blocked for more than 120 seconds.
Nov 14 15:55:53 phx51-a84 kernel: [2090478.145095] Tainted: G E 6.1.53-1 #1
Nov 14 15:55:53 phx51-a84 kernel: [2090478.146486] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 14 15:55:53 phx51-a84 kernel: [2090478.148155] task:jbd2/sdb1-8 state:D stack:0 pid:2510 ppid:2 flags:0x00004000
Nov 14 15:55:53 phx51-a84 kernel: [2090478.149944] Call Trace:
Nov 14 15:55:53 phx51-a84 kernel: [2090478.150878] <TASK>
Nov 14 15:55:53 phx51-a84 kernel: [2090478.151737] __schedule+0x2ec/0x950
Nov 14 15:55:53 phx51-a84 kernel: [2090478.152896] schedule+0x53/0xc0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.153841] jbd2_journal_wait_updates+0x6f/0xd0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.155087] ? sched_energy_aware_handler+0xb0/0xb0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.156414] jbd2_journal_commit_transaction+0x217/0x19d0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.157769] ? kvm_sched_clock_read+0xd/0x20
Nov 14 15:55:53 phx51-a84 kernel: [2090478.158868] ? psi_group_change+0x11a/0x2e0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.159914] ? finish_task_switch.isra.0+0x83/0x270
Nov 14 15:55:53 phx51-a84 kernel: [2090478.161252] ? lock_timer_base+0x61/0x80
Nov 14 15:55:53 phx51-a84 kernel: [2090478.162312] kjournald2+0xa5/0x240 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.163353] ? sched_energy_aware_handler+0xb0/0xb0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.164778] ? jbd2_fc_wait_bufs+0x90/0x90 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.165894] kthread+0xb9/0xe0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.166747] ? kthread_complete_and_exit+0x20/0x20
Nov 14 15:55:53 phx51-a84 kernel: [2090478.167829] ret_from_fork+0x1f/0x30
Nov 14 15:55:53 phx51-a84 kernel: [2090478.168890] </TASK>
Nov 14 15:55:53 phx51-a84 kernel: [2090478.169601] INFO: task rs:main Q:Reg:2532 blocked for more than 120 seconds.
Nov 14 15:55:53 phx51-a84 kernel: [2090478.170958] Tainted: G E 6.1.53-1 #1
Nov 14 15:55:53 phx51-a84 kernel: [2090478.172163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 14 15:55:53 phx51-a84 kernel: [2090478.173621] task:rs:main Q:Reg state:D stack:0 pid:2532 ppid:1 flags:0x00000000
Nov 14 15:55:53 phx51-a84 kernel: [2090478.175057] Call Trace:
Nov 14 15:55:53 phx51-a84 kernel: [2090478.175763] <TASK>
Nov 14 15:55:53 phx51-a84 kernel: [2090478.176537] __schedule+0x2ec/0x950
Nov 14 15:55:53 phx51-a84 kernel: [2090478.177393] schedule+0x53/0xc0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.178276] wait_transaction_locked+0x8a/0xd0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.179287] ? sched_energy_aware_handler+0xb0/0xb0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.180402] add_transaction_credits+0xd9/0x2a0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.181448] start_this_handle+0x100/0x570 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.182387] ? kmem_cache_alloc+0x13f/0x280
Nov 14 15:55:53 phx51-a84 kernel: [2090478.183317] jbd2__journal_start+0xf7/0x1e0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.184444] __ext4_journal_start_sb+0xf0/0x100 [ext4]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.185493] ext4_dirty_inode+0x35/0x80 [ext4]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.186445] __mark_inode_dirty+0x53/0x310
Nov 14 15:55:53 phx51-a84 kernel: [2090478.187385] generic_update_time+0x77/0xc0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.188367] file_modified_flags+0xd7/0xf0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.189346] ext4_buffered_write_iter+0x54/0x130 [ext4]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.190392] vfs_write+0x272/0x380
Nov 14 15:55:53 phx51-a84 kernel: [2090478.191213] ksys_write+0x5f/0xe0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.191933] do_syscall_64+0x56/0x80
Nov 14 15:55:53 phx51-a84 kernel: [2090478.192781] ? do_syscall_64+0x63/0x80
Nov 14 15:55:53 phx51-a84 kernel: [2090478.193500] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.194400] RIP: 0033:0x7fa2b9963fef
Nov 14 15:55:53 phx51-a84 kernel: [2090478.195111] RSP: 002b:00007fa2b8b20860 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
Nov 14 15:55:53 phx51-a84 kernel: [2090478.196473] RAX: ffffffffffffffda RBX: 00007fa2b0011ab0 RCX: 00007fa2b9963fef
Nov 14 15:55:53 phx51-a84 kernel: [2090478.197646] RDX: 0000000000000247 RSI: 00007fa2b0008d40 RDI: 0000000000000007
Nov 14 15:55:53 phx51-a84 kernel: [2090478.198823] RBP: 0000000000000247 R08: 0000000000000000 R09: 0000000000000000
Nov 14 15:55:53 phx51-a84 kernel: [2090478.199987] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fa2b0008d40
Nov 14 15:55:53 phx51-a84 kernel: [2090478.201235] R13: 0000000000000000 R14: 0000000000000247 R15: 00007fa2b0011ab0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.202323] </TASK>
Nov 14 15:55:53 phx51-a84 kernel: [2090478.238330] INFO: task containerd:7533 blocked for more than 120 seconds.
Nov 14 15:55:53 phx51-a84 kernel: [2090478.239419] Tainted: G E 6.1.53-1 #1
Nov 14 15:55:53 phx51-a84 kernel: [2090478.240450] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 14 15:55:53 phx51-a84 kernel: [2090478.241648] task:containerd state:D stack:0 pid:7533 ppid:1 flags:0x00000000
Nov 14 15:55:53 phx51-a84 kernel: [2090478.243027] Call Trace:
Nov 14 15:55:53 phx51-a84 kernel: [2090478.243572] <TASK>
Nov 14 15:55:53 phx51-a84 kernel: [2090478.244179] __schedule+0x2ec/0x950
Nov 14 15:55:53 phx51-a84 kernel: [2090478.244946] schedule+0x53/0xc0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.245563] wait_transaction_locked+0x8a/0xd0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.246498] ? sched_energy_aware_handler+0xb0/0xb0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.247374] add_transaction_credits+0xd9/0x2a0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.248393] start_this_handle+0x100/0x570 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.249249] ? kmem_cache_alloc+0x13f/0x280
Nov 14 15:55:53 phx51-a84 kernel: [2090478.250005] jbd2__journal_start+0xf7/0x1e0 [jbd2]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.250839] __ext4_journal_start_sb+0xf0/0x100 [ext4]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.251712] ext4_dirty_inode+0x35/0x80 [ext4]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.252609] __mark_inode_dirty+0x53/0x310
Nov 14 15:55:53 phx51-a84 kernel: [2090478.253368] generic_update_time+0x77/0xc0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.254163] file_modified_flags+0xd7/0xf0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.254965] ext4_buffered_write_iter+0x54/0x130 [ext4]
Nov 14 15:55:53 phx51-a84 kernel: [2090478.255917] vfs_write+0x272/0x380
Nov 14 15:55:53 phx51-a84 kernel: [2090478.256716] ksys_write+0x5f/0xe0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.257358] do_syscall_64+0x56/0x80
Nov 14 15:55:53 phx51-a84 kernel: [2090478.258068] ? do_syscall_64+0x63/0x80
Nov 14 15:55:53 phx51-a84 kernel: [2090478.258758] ? syscall_exit_to_user_mode+0x30/0x50
Nov 14 15:55:53 phx51-a84 kernel: [2090478.259569] ? do_syscall_64+0x63/0x80
Nov 14 15:55:53 phx51-a84 kernel: [2090478.260371] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Nov 14 15:55:53 phx51-a84 kernel: [2090478.261283] RIP: 0033:0x557ba3db828e
Nov 14 15:55:53 phx51-a84 kernel: [2090478.262045] RSP: 002b:000000c00584e7e0 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
Nov 14 15:55:53 phx51-a84 kernel: [2090478.263262] RAX: ffffffffffffffda RBX: 000000000000001e RCX: 0000557ba3db828e
Nov 14 15:55:53 phx51-a84 kernel: [2090478.264622] RDX: 0000000000000905 RSI: 000000c001914000 RDI: 000000000000001e
Nov 14 15:55:53 phx51-a84 kernel: [2090478.265860] RBP: 000000c00584e820 R08: 0000000000000000 R09: 0000000000000000
Nov 14 15:55:53 phx51-a84 kernel: [2090478.267036] R10: 0000000000000000 R11: 0000000000000206 R12: 000000c00584e960
Nov 14 15:55:53 phx51-a84 kernel: [2090478.268276] R13: 0000000000000000 R14: 000000c00209b040 R15: 0000000000000040
Nov 14 15:55:53 phx51-a84 kernel: [2090478.269442] </TASK>
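
In case it helps anyone check the jbd2 suspicion across many occurrences, here is a minimal Python sketch that summarizes hung-task reports like the ones above. It is only an illustration, not something we run in production: the function names `summarize_hung_tasks` and `wait_site` are made up for this post, and it assumes the log has one kernel message per line in the syslog layout shown above.

```python
import re

# Matches the hung-task header, e.g.
# "INFO: task rs:main Q:Reg:2532 blocked for more than 120 seconds."
# The task name itself may contain ':' and spaces, so it is matched greedily.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds\."
)

# Matches a stack frame after the "[timestamp]" field, e.g.
# "] ? kmem_cache_alloc+0x13f/0x280" or "] kjournald2+0xa5/0x240 [jbd2]".
FRAME_RE = re.compile(
    r"\]\s+(?P<unreliable>\? )?(?P<func>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?\s*$"
)


def summarize_hung_tasks(lines):
    """Return one dict per hung-task report with its reliable stack frames.

    Frames prefixed with '?' are skipped, since the kernel marks those
    as unreliable leftovers on the stack.
    """
    reports = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = {
                "task": header["name"],
                "pid": int(header["pid"]),
                "secs": int(header["secs"]),
                "frames": [],
            }
            reports.append(current)
            continue
        if current is None:
            continue
        if "</TASK>" in line:
            current = None  # end of this report's call trace
            continue
        frame = FRAME_RE.search(line)
        if frame and not frame["unreliable"]:
            current["frames"].append((frame["func"], frame["module"]))
    return reports


def wait_site(report):
    """Return the first (function, module) frame below the scheduler, or None."""
    for func, module in report["frames"]:
        if func not in ("__schedule", "schedule"):
            return func, module
    return None
```

Feeding it the lines of a saved kernel log and printing `wait_site(r)` for each report gives a quick view of where every blocked task is waiting. For the three reports above, that would be one task in `jbd2_journal_wait_updates` and two in `wait_transaction_locked`, all in the jbd2 module.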
Statistics: Posted by kamida — 2024-11-26 20:40 — Replies 0 — Views 39