Monday, September 10, 2018

Solaris 10/11 - Initial login to the server takes a long time

If your server is responding slowly on initial login, check the DNS cache and the sshd lookup settings.

1. Check which SSH version you are running.
# ssh -V
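
Note: if ssh -V reports OpenSSH rather than Sun_SSH, the LookupClientHostnames directive used in step 2 may not be recognized; the standard OpenSSH option for this is UseDNS. A sketch of the same change for an OpenSSH-based sshd:
# grep -i UseDNS /etc/ssh/sshd_config
# echo "UseDNS no" >> /etc/ssh/sshd_config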

2. Check the LookupClientHostnames entry in sshd_config
# grep LookupClientHostnames /etc/ssh/sshd_config
If no value is returned, add the entry to the config file.
# echo "LookupClientHostnames no" >> /etc/ssh/sshd_config

Also check the GSSAPIAuthentication entry in the sshd config file.
# grep -i GSSAPIAuthentication /etc/ssh/sshd_config

If no value is returned, add the entry:
# echo "GSSAPIAuthentication no" >> /etc/ssh/sshd_config
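
To confirm that reverse DNS lookups are really the bottleneck, time a reverse lookup of a connecting client's address from the server (192.0.2.10 is only a placeholder; use a real client IP):
# ptime getent hosts 192.0.2.10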

3. Now, restart the ssh service
# svcs -a | grep ssh
# svcadm restart ssh
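
After the restart, confirm the service is healthy and the new directives are in place:
# svcs -x ssh
# egrep -i 'lookupclienthostnames|gssapiauthentication' /etc/ssh/sshd_config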

or

If this still gives you trouble, restart the DNS cache service:
# svcs -a | grep cache
online         Jul_23   svc:/system/name-service-cache:default
# svcadm restart svc:/system/name-service-cache:default
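
Before or after the restart, you can also check whether nscd is actually caching host lookups by printing its configuration and per-database statistics (assuming the stock nscd behind name-service-cache):
# nscd -g | more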



Thursday, September 6, 2018

Solaris 10 - cron job didn't run at the scheduled time.


Checking the cron log shows that the queue max run limit was reached:

# cat /var/cron/log
......
The problem: the queue max run limit was reached.
Solution: restart the cron service.

1. Find and kill the cron process
# ps -ef | grep cron
# ptree <PID>
# kill -9 <PID>

2. Restart the cron service
# svcadm restart svc:/system/cron
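
Verify that cron came back online and watch the log to make sure jobs start running again:
# svcs -p svc:/system/cron
# tail -f /var/cron/log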

3. Raise the limit in /etc/cron.d/queuedefs
# vi /etc/cron.d/queuedefs

If you still have issues, review the queuedefs man page.

# more /etc/cron.d/queuedefs
a.4j1n
b.2j2n90w

# man queuedefs

       This file specifies that the a queue, for at jobs, can  have  up  to  4
       jobs  running  simultaneously; those jobs will be run with a nice value
       of 1.  As no nwait value was given, if a job cannot be run because  too
       many  other  jobs  are  running cron will wait 60 seconds before trying
       again to run it.

       The b queue, for batch(1) jobs, can have up to 2 jobs running  simulta-
       neously;  those  jobs  will be run with a nice(1) value of 2.  If a job
       cannot be run because too many other jobs are  running,  cron(1M)  will
       wait  90  seconds  before  trying again to run it. All other queues can
       have up to 100 jobs running simultaneously; they will  be  run  with  a
       nice value of 2, and if a job cannot be run because too many other jobs
       are running cron will wait 60 seconds before trying again to run it.
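
Regular crontab jobs run in the c queue, which is not listed in the file above and therefore gets the default limit of 100 simultaneous jobs. To raise it, append a c line and restart cron; 200 below is only an example value, so pick one that fits your workload:
# echo "c.200j2n" >> /etc/cron.d/queuedefs
# svcadm restart svc:/system/cron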

Solaris 11 - zfs - Resolving a removed LUN from a Solaris 11 system



The SAN team accidentally removed an actively used LUN and the pool went into a suspended state. The same LUN has to be presented back to the same host in order to fix the issue.

Once the SAN team presented the same LUN back to the host, verify that the device is visible again:

root@srdf-mg-p1:~# echo | format | grep -i 0419
      34. c3t6000D310006D64000000000000000419d0 <COMPELNT-Compellent Vol-0607-350.00GB>
          /scsi_vhci/ssd@g6000d310006d64000000000000000419
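
Optionally, if MPxIO multipathing is in use, confirm the LUN shows up again with the expected number of operational paths before touching the pool:
# mpathadm list lu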


root@srdf-mg-p1:~# zpool export srdfmgp05-datacopy
cannot export 'srdfmgp05-datacopy': pool I/O is currently suspended
root@srdf-mg-p1:~# zpool status srdfmgp05-datacopy
  pool: srdfmgp05-datacopy
 state: SUSPENDED
status: One or more devices are unavailable in response to IO failures.
        The pool is suspended.
action: Make sure the affected devices are connected, then run 'zpool clear' or
        'fmadm repaired'.
        Run 'zpool status -v' to see device specific details.
   see: http://support.oracle.com/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        srdfmgp05-datacopy                       SUSPENDED     2     0     0
          c3t6000D310006D64000000000000000419d0  ONLINE       0     0     0


root@srdf-mg-p1:~# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 28 11:54:35 8728f18f-ec4f-4e65-bd0e-cd8dab33617d  ZFS-8000-8A    Critical

Problem Status    : open
Diag Engine       : zfs-diagnosis / 1.0
System
    Manufacturer  : Oracle Corporation
    Name          : SPARC T7-1
    Part_Number   : 34863714+1+1
    Serial_Number : AK00397012
    Host_ID       : 86c0ac2a
    Server_Name           : srdf-mg-p1

----------------------------------------
Suspect 1 of 1 :
   Problem class : fault.fs.zfs.object.corrupt_data
   Certainty   : 100%
   Affects     : zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy
   Status      : faulted and taken out of service

   FRU
     Status           : faulty
     FMRI             : "zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy"

Description : A file or directory in pool 'srdfmgp05-datacopy' could not be
              read due to corrupt data.

Response    : No automated response will occur.

Impact      : The file or directory is unavailable.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Run 'zpool status -xv' and examine the list of damaged files to
              determine what has been affected. Please refer to the associated
              reference document at http://support.oracle.com/msg/ZFS-8000-8A
              for the latest service procedures and policies regarding this
              diagnosis.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 28 11:54:35 3cd0778a-78dc-499d-a5f1-98c3dac9c7a4  ZFS-8000-HC    Major

Problem Status    : open
Diag Engine       : zfs-diagnosis / 1.0
System
    Manufacturer  : Oracle Corporation
    Name          : SPARC T7-1
    Part_Number   : 34863714+1+1
    Serial_Number : AK00397012
    Host_ID       : 86c0ac2a
    Server_Name           : srdf-mg-p1

----------------------------------------
Suspect 1 of 1 :
   Problem class : fault.fs.zfs.io_failure_wait
   Certainty   : 100%
   Affects     : zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy
   Status      : faulted and taken out of service

   FRU
     Status           : faulty
     FMRI             : "zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy"

Description : ZFS pool 'srdfmgp05-datacopy' has experienced currently
              unrecoverable I/O failures.

Response    : No automated response will occur.

Impact      : Read and write I/Os cannot be serviced.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Make sure the affected devices are connected, then run 'zpool
              clear'. Please refer to the associated reference document at
              http://support.oracle.com/msg/ZFS-8000-HC for the latest service
              procedures and policies regarding this diagnosis.



root@srdf-mg-p1:~# zpool list
NAME                 SIZE  ALLOC   FREE  CAP  DEDUP     HEALTH  ALTROOT
rpool                556G   159G   397G  28%  1.00x     ONLINE  -
srdfmgp01-csf       99.5G  23.9G  75.6G  24%  1.00x     ONLINE  -
srdfmgp05-data       199G   165G  34.4G  82%  1.00x     ONLINE  -
srdfmgp05-datacopy   348G  75.1G   273G  21%  1.00x  SUSPENDED  -
root@srdf-mg-p1:~# zpool clear srdfmgp05-datacopy
root@srdf-mg-p1:~# zpool list
NAME                 SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool                556G   159G   397G  28%  1.00x  ONLINE  -
srdfmgp01-csf       99.5G  23.9G  75.6G  24%  1.00x  ONLINE  -
srdfmgp05-data       199G   165G  34.4G  82%  1.00x  ONLINE  -
srdfmgp05-datacopy   348G  75.1G   273G  21%  1.00x  ONLINE  -
root@srdf-mg-p1:~#
root@srdf-mg-p1:/datacopy# df -h /datacopy
Filesystem             Size   Used  Available Capacity  Mounted on
srdf-mg-p1-datacopy/FS_DATACOPY           343G    75G       267G    22%    /datacopy

root@srdf-mg-p1:~# cd /datacopy/
root@srdf-mg-p1:/datacopy# ls
12.1.0_clnp.tar  PROD
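
With the pool back online, it may also be worth marking the FMA faults repaired and scrubbing the pool to confirm nothing was damaged while I/O was suspended; a sketch using the pool FMRI reported by fmadm faulty above:
# fmadm repaired "zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy"
# zpool scrub srdfmgp05-datacopy
# zpool status -v srdfmgp05-datacopy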