Bug #83147

Use cam to get temperature for SCSI disks

Added by Caleb St. John over 2 years ago. Updated over 2 years ago.

Status:
Done
Priority:
No priority
Assignee:
William Grzybowski
Category:
Middleware
Target version:
Severity:
High
Reason for Closing:
Reason for Blocked:
Needs QA:
No
Needs Doc:
No
Needs Merging:
No
Needs Automation:
No
Support Suite Ticket:
n/a
Hardware Configuration:
ChangeLog Required:
No

Description

On an M50 with 150 disks, running TrueNAS-11.2-MASTER-201903200659, CPU idle drops to around 65% while the box is sitting there with no I/O coming in.

Upon investigation, collectd is destroying the system by running

smartctl -a -n standby /dev/da**

My guess is that it's running this command on all disks in parallel, which causes unnecessary I/O to the disks as well as unnecessary CPU time. Below is the output of commands taken while the box is in this state.

last pid: 57756;  load averages:  6.37,  5.82,  5.62                                                                                                                                                                                                       up 0+18:28:51  08:40:54
3673 processes:47 running, 3347 sleeping, 279 waiting
CPU:  0.4% user,  0.0% nice, 28.9% system,  0.0% interrupt, 70.7% idle
Mem: 687M Active, 1069M Inact, 8259M Wired, 239G Free
ARC: 2664M Total, 528M MFU, 1995M MRU, 726K Anon, 25M Header, 115M Other
     781M Compressed, 1819M Uncompressed, 2.33:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
13723 root        35    0   124M 63956K umtxn   0  45:24  86.58% collectd{writer#3}
13723 root        86    0   124M 63956K CPU6    6   0:07  86.47% collectd{collectd}
96006 root        52    0   301M   226M uwait  34   8:57  84.27% python3.6{python3.6}
13723 root        52    0   124M 63956K uwait  22   0:07  76.73% collectd{collectd}
13723 root        52    0   124M 63956K uwait  29   0:08  75.67% collectd{collectd}
13723 root        52    0   124M 63956K uwait  38   0:07  58.29% collectd{collectd}
13723 root        52    0   124M 63956K uwait  19   0:07  57.27% collectd{collectd}
13723 root        52    0   124M 63956K uwait  36   0:07  54.12% collectd{collectd}
13723 root        52    0   124M 63956K uwait  33   0:08  53.54% collectd{collectd}
13723 root        52    0   124M 63956K RUN    21   0:07  53.53% collectd{collectd}
13723 root        52    0   124M 63956K uwait  33   0:07  53.33% collectd{collectd}
13723 root        52    0   124M 63956K uwait  23   0:07  52.82% collectd{collectd}
96006 root        52    0   301M   226M usem   23   0:00  47.07% python3.6{python3.6}
13723 root        40    0   124M 63956K CPU30  30  45:10  47.02% collectd{writer#1}
96006 root        52    0   301M   226M uwait   1   0:00  46.97% python3.6{python3.6}
96006 root        85    0   301M   226M CPU24  24   0:00  46.58% python3.6{python3.6}
13723 root        31    0   124M 63956K umtxn   0  44:45  46.20% collectd{writer#2}
13723 root        35    0   124M 63956K umtxn  21  45:16  29.50% collectd{writer#4}
13723 root        41    0   124M 63956K usem   34  10:59  24.89% collectd{reader#2}
13723 root        52    0   124M 63956K nanslp  8  24:21  21.59% collectd{collectd}
96006 root        52    0   301M   226M usem   17   0:00  20.37% python3.6{python3.6}
13723 root        26    0   124M 63956K vm map 23  45:55  16.86% collectd{writer#0}
   25 root       -16    -     0K    48K psleep  8  11:01  14.64% pagedaemon{dom0}
 5390 root        20    0   426M   362M umtxn   6   4:04   3.40% uwsgi-3.6{uwsgi-3.6}
13723 root        25    0   124M 63956K uwait  39   9:51   3.34% collectd{reader#3}
96006 root        52    0   301M   226M usem   22   0:00   1.86% python3.6{python3.6}
57750 root        20    0 18180K 11564K CPU9    9   0:00   1.25% top
11525 nobody      20    0  6928K  3476K select 15   1:01   0.22% mdnsd
 5063 root        20    0 40128K 18464K select 25   0:16   0.21% rrdcached{rrdcached}
   31 root        16    -     0K    16K syncer 30   0:05   0.21% syncer
 3956 nobody      20    0  6928K  3496K select 37   1:51   0.12% mdnsd
   12 root       -88    -     0K  4464K WAIT   37   0:19   0.08% intr{irq493: mpr1}
   17 root       -16    -     0K    16K -       6   0:31   0.03% rand_harvestq
   12 root       -60    -     0K  4464K WAIT    0   7:26   0.03% intr{swi4: clock (0)}
83659 www         20    0 14344K  8588K kqread 34   0:10   0.03% nginx
    4 root       -16    -     0K   128K -       4   0:42   0.03% cam{doneq2}
root@tn11b:~ # procstat -akk | grep collectd
13723 101245 collectd            -                   _mtx_lock_spin_cookie+0xc1 smp_targeted_tlb_shootdown+0x3ae smp_masked_invltlb+0x3d pmap_invalidate_all+0x232 pmap_protect+0x6e6 vmspace_fork+0xa55 fork1+0x47a sys_fork+0x39 amd64_syscall+0xa38 fast_syscall_common+0x101 
13723 101519 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102287 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102512 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102698 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102702 collectd            writer#0            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102704 collectd            writer#1            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102705 collectd            writer#2            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102706 collectd            writer#3            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102717 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102752 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102817 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102862 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102863 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102872 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102903 collectd            writer#4            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 102985 collectd            reader#0            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101 
13723 103478 collectd            reader#1            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101 
13723 103545 collectd            reader#2            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
13723 103723 collectd            reader#3            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101 
13723 105056 collectd            reader#4            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101
root       13723  408.1  0.0 127260  63956  -  Ss   18:25    1029:43.19 |-- /usr/local/sbin/collectd
root       71741   44.4  0.0   8664   4884  -  RL   08:45       0:01.11 | |-- /usr/local/sbin/smartctl -a -n standby /dev/da33
flame.svg (420 KB) - Caleb St. John, 03/21/2019 07:04 AM
old_version_sata_dom.svg (420 KB) - Caleb St. John, 03/26/2019 06:45 AM
new_version_sata_dom.svg (176 KB) - Caleb St. John, 03/26/2019 06:45 AM
sata_ssd.svg (333 KB) - Caleb St. John, 03/26/2019 06:45 AM

Related issues

Related to FreeNAS - Bug #83371: Add py-cam (Done)
Related to FreeNAS - Bug #83676: Rewrite fenced to scale on system with many disks and avoid many subprocess calls (Closed)
Copied to FreeNAS - Bug #83875: TrueNAS-11.2-MASTER-201903200659 is destroying the CPU at idle times (Closed)

Associated revisions

Revision 3cbdbe9f (diff)
Added by William Grzybowski over 2 years ago

feat(collectd): try to use `cam` to get temperature for SCSI disks Ticket: #83147

Revision 693f8cda (diff)
Added by William Grzybowski over 2 years ago

feat(collectd): try to use `cam` to get temperature for SCSI disks Ticket: #83147

Revision 9041f0cc (diff)
Added by William Grzybowski over 2 years ago

feat(collectd): try to use `cam` to get temperature for SCSI disks Ticket: #83147

Revision 1a0d35b7 (diff)
Added by William Grzybowski over 2 years ago

feat(collectd): try to use `cam` to get temperature for SCSI disks Ticket: #83147

Revision 113ed8a2 (diff)
Added by Alexander Motin over 2 years ago

Do not map small IOCTL buffers to KVA, but copy.

CAM IOCTL interfaces traditionally mapped user-space data buffers to KVA. It was nice originally, but now it takes too much to handle respective TLB shootdowns, while small kernel memory allocations up to 64KB backed by UMA and accompanied by copyin()/copyout() can be much cheaper. For large buffers mapping still may have sense, and unmapped I/O would be even better, but the last unfortunately is more tricky, since unmapped I/O API is too specific to struct bio now.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Ticket: #83147
(cherry picked from commit 4ef45ba40d370005ccef60735b3c1b9fa3a67a9d)

Revision 88088810 (diff)
Added by Alexander Motin over 2 years ago

Remove remnants of XPT_MMC_IO, slipped in by mistake. Ticket: #83147

Revision e4f2e526 (diff)
Added by Alexander Motin over 2 years ago

Do not map small IOCTL buffers to KVA, but copy.

CAM IOCTL interfaces traditionally mapped user-space data buffers to KVA. It was nice originally, but now it takes too much to handle respective TLB shootdowns, while small kernel memory allocations up to 64KB backed by UMA and accompanied by copyin()/copyout() can be much cheaper. For large buffers mapping still may have sense, and unmapped I/O would be even better, but the last unfortunately is more tricky, since unmapped I/O API is too specific to struct bio now.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Ticket: #83147
(cherry picked from commit 4ef45ba40d370005ccef60735b3c1b9fa3a67a9d)
(cherry picked from commit 113ed8a22cbae3a7191ac6ca1e6c2fa477d52c7d)

Revision 306dac97 (diff)
Added by Alexander Motin over 2 years ago

Remove remnants of XPT_MMC_IO, slipped in by mistake. Ticket: #83147 (cherry picked from commit 88088810be1012614791e5603ab1697fd0b3ad82)

History

#1 Updated by Caleb St. John over 2 years ago

This doesn't do us any favors either.

root        7134    0.0  0.0   6488   2512  -  Is   15:24       0:01.15 |-- /usr/sbin/cron -s
root       47387    0.0  0.0   6488   2524  -  I    10:00       0:00.00 | `-- cron: running job (cron)
root       47391    0.0  0.0   7028   3264  -  Is   10:00       0:00.02 |   `-- /bin/sh /usr/local/sbin/save_rrds.sh
root       47550   99.9  0.0  22692  14540  -  R    10:00       0:43.19 |     `-- tar jcf rrd_dir.tar.bz2.47391 collectd (bsdtar)
last pid: 48657;  load averages:  1.54,  1.60,  2.17                                                                                                                                                                                                              up 0+18:42:50  10:01:08
3252 processes:46 running, 2926 sleeping, 1 zombie, 279 waiting
CPU:  2.9% user,  0.0% nice,  2.5% system,  0.0% interrupt, 94.7% idle
Mem: 885M Active, 1175M Inact, 5720M Wired, 241G Free
ARC: 2500M Total, 412M MFU, 1947M MRU, 2075K Anon, 24M Header, 114M Other
     749M Compressed, 1674M Uncompressed, 2.23:1 Ratio
Swap: 16G Total, 16G Free

  PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
47550 root       103    0 22692K 14540K CPU26  26   1:07 100.00% bsdtar

#2 Updated by Caleb St. John over 2 years ago

I'm attaching a flamegraph of the system taken while collectd is hammering the CPU. It looks like we have an issue in the VM subsystem.

#3 Updated by Dru Lavigne over 2 years ago

  • Assignee changed from Release Council to Alexander Motin
  • Private changed from No to Yes

#4 Updated by Alexander Motin over 2 years ago

  • Assignee changed from Alexander Motin to William Grzybowski

After discussing it with Caleb, this looks like the result of collectd starting a bunch of smartctl processes to read disk temperatures, added by the middleware team. Pushing it to them. Since the problem is most pronounced on large TrueNAS installations, where we use only SCSI disks, we may think about using a less expensive reading method first, such as `sg_logs -t daX`. It would be ideal to avoid forks completely, if we could run SCSI commands directly, since for SCSI disks reading the temperature is practically one, or maybe two, LOG SENSE SCSI commands to read the "Temperature" and/or "Informational exceptions" log pages.
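
For illustration, here is a minimal sketch of what such a direct LOG SENSE read could look like from Python, assuming the raw response bytes are obtained through CAM (for example via the py-cam bindings tracked in #83371). The send_scsi_cdb() helper and the device path are hypothetical, not part of any existing API; only the CDB layout and the Temperature log page (0x0D) offsets follow the SCSI spec.

# Hypothetical sketch: fetch a drive temperature with a single LOG SENSE
# command instead of forking smartctl per disk. How the CDB is submitted
# (py-cam, camcontrol, pass(4) ioctl) is assumed, not shown here.
import struct

TEMPERATURE_LOG_PAGE = 0x0D

def build_log_sense_cdb(page_code, alloc_len=512):
    """Build a 10-byte LOG SENSE CDB (opcode 0x4D), PC=01b (current cumulative values)."""
    cdb = bytearray(10)
    cdb[0] = 0x4D                                 # LOG SENSE opcode
    cdb[2] = (0x01 << 6) | (page_code & 0x3F)     # PC in bits 7-6, page code in bits 5-0
    cdb[7:9] = struct.pack(">H", alloc_len)       # allocation length, big-endian
    return bytes(cdb)

def parse_temperature(page):
    """Return the temperature in degrees Celsius from a Temperature log page, or None."""
    page_len = struct.unpack(">H", page[2:4])[0]
    offset = 4                                    # skip the 4-byte page header
    while offset < 4 + page_len:
        param_code = struct.unpack(">H", page[offset:offset + 2])[0]
        param_len = page[offset + 3]
        if param_code == 0x0000:                  # the "Temperature" parameter
            temp = page[offset + 5]               # 0xFF means "not available"
            return None if temp == 0xFF else temp
        offset += 4 + param_len
    return None

# Usage, assuming a CAM-backed send_scsi_cdb(dev, cdb, datalen) helper exists:
#   data = send_scsi_cdb("/dev/da33", build_log_sense_cdb(TEMPERATURE_LOG_PAGE), 512)
#   print(parse_temperature(data))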

#5 Updated by Alexander Motin over 2 years ago

  • Category changed from OS to Middleware

#6 Updated by William Grzybowski over 2 years ago

  • Status changed from Unscreened to Screened
  • Target version changed from Backlog to 11.2-U4

#7 Updated by William Grzybowski over 2 years ago

#10 Updated by Bug Clerk over 2 years ago

  • Status changed from Screened to In Progress

#12 Updated by Bug Clerk over 2 years ago

  • Target version changed from 11.2-U4 to 11.3-BETA1

#13 Updated by Bug Clerk over 2 years ago

  • Copied to Bug #83875: TrueNAS-11.2-MASTER-201903200659 is destroying the CPU at idle times added

#14 Updated by William Grzybowski over 2 years ago

  • Target version changed from 11.3-BETA1 to TrueNAS 11.2-U4

#15 Updated by Alexander Motin over 2 years ago

#16 Updated by William Grzybowski over 2 years ago

  • Related to Bug #83676: Rewrite fenced to scale on system with many disks and avoid many subprocess calls added

#17 Updated by William Grzybowski over 2 years ago

  • Status changed from In Progress to Ready for Testing

#18 Updated by Ryan McKenzie over 2 years ago

  • Status changed from Ready for Testing to Passed Testing

I've been testing the March 30 nightly build of TrueNAS 11.2. No signs of the CPU utilization spikes or long-waiting collectd processes, and no sign of any smartctl processes. The system was upgraded 48 hours ago and has had a good mix of reboots, idle time, and heavy I/O testing.

Took a 5 minute CPU profile during idle time and had Alexander verify that the issue seen in this ticket is resolved.

#19 Updated by Dru Lavigne over 2 years ago

  • Subject changed from TrueNAS-11.2-MASTER-201903200659 is destroying the CPU at idle times to Use cam to get temperature for SCSI disks
  • Status changed from Passed Testing to Done
  • Target version changed from TrueNAS 11.2-U4 to 11.2-U4
  • Private changed from Yes to No
  • Needs QA changed from Yes to No
  • Needs Merging changed from Yes to No
