Monday, September 10, 2018

Solaris 10/11 - Initial login to the server takes a long time

If your server is responding slowly on initial login, check the DNS cache and the sshd lookup settings.

1. Check which SSH version you are running.
# ssh -V
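
Note: if ssh -V reports OpenSSH rather than Sun_SSH, the LookupClientHostnames directive used in step 2 may not be recognized; the standard OpenSSH option for this is UseDNS. A sketch of the same change for an OpenSSH-based sshd:
# grep -i UseDNS /etc/ssh/sshd_config
# echo "UseDNS no" >> /etc/ssh/sshd_config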

2. Check the LookupClientHostnames entry in sshd_config
# grep LookupClientHostnames /etc/ssh/sshd_config
If no value is returned, add the entry to the config file.
# echo "LookupClientHostnames no" >> /etc/ssh/sshd_config

Also check the GSSAPIAuthentication entry in the sshd config file.
# grep -i GSSAPIAuthentication /etc/ssh/sshd_config

If no value is returned, add the entry:
# echo "GSSAPIAuthentication no" >> /etc/ssh/sshd_config
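
To confirm that reverse DNS lookups are really the bottleneck, time a reverse lookup of a connecting client's address from the server (192.0.2.10 is only a placeholder; use a real client IP):
# ptime getent hosts 192.0.2.10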

3. Now, restart the ssh service
# svcs -a | grep ssh
# svcadm restart ssh
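
After the restart, confirm the service is healthy and the new directives are in place:
# svcs -x ssh
# egrep -i 'lookupclienthostnames|gssapiauthentication' /etc/ssh/sshd_config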

or

If this still gives you trouble, restart the DNS cache service:
# svcs -a | grep cache
online         Jul_23   svc:/system/name-service-cache:default
# svcadm restart svc:/system/name-service-cache:default
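
Before or after the restart, you can also check whether nscd is actually caching host lookups by printing its configuration and per-database statistics (assuming the stock nscd behind name-service-cache):
# nscd -g | more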



Thursday, September 6, 2018

Solaris 10 - cron job didn't run at the scheduled time.


Checking the cron log shows that the queue max run limit was reached:

# cat /var/cron/log
......
The problem: the queue max run limit was reached.
Solution: restart the cron service.

1. Find and kill the cron process
# ps -ef | grep cron
# ptree <PID>
# kill -9 <PID>

2. Restart the cron service
# svcadm restart svc:/system/cron
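
Verify that cron came back online and watch the log to make sure jobs start running again:
# svcs -p svc:/system/cron
# tail -f /var/cron/log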

3. Raise the limit in /etc/cron.d/queuedefs
# vi /etc/cron.d/queuedefs

If you still have issues, review the queuedefs man page.

# more /etc/cron.d/queuedefs
a.4j1n
b.2j2n90w

# man queuedefs

       This file specifies that the a queue, for at jobs, can  have  up  to  4
       jobs  running  simultaneously; those jobs will be run with a nice value
       of 1.  As no nwait value was given, if a job cannot be run because  too
       many  other  jobs  are  running cron will wait 60 seconds before trying
       again to run it.

       The b queue, for batch(1) jobs, can have up to 2 jobs running  simulta-
       neously;  those  jobs  will be run with a nice(1) value of 2.  If a job
       cannot be run because too many other jobs are  running,  cron(1M)  will
       wait  90  seconds  before  trying again to run it. All other queues can
       have up to 100 jobs running simultaneously; they will  be  run  with  a
       nice value of 2, and if a job cannot be run because too many other jobs
       are running cron will wait 60 seconds before trying again to run it.
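
Regular crontab jobs run in the c queue, which is not listed in the file above and therefore gets the default limit of 100 simultaneous jobs. To raise it, append a c line and restart cron; 200 below is only an example value, so pick one that fits your workload:
# echo "c.200j2n" >> /etc/cron.d/queuedefs
# svcadm restart svc:/system/cron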

Solaris 11 - zfs - Resolving a removed LUN from a Solaris 11 system



The SAN team accidentally removed an actively used LUN and the pool went into a suspended state. The same LUN has to be presented back to the same host in order to fix the issue.

Once the SAN team presented the same LUN back to the host, verify that the device is visible again:

root@srdf-mg-p1:~# echo | format | grep -i 0419
      34. c3t6000D310006D64000000000000000419d0 <COMPELNT-Compellent Vol-0607-350.00GB>
          /scsi_vhci/ssd@g6000d310006d64000000000000000419
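
Optionally, if MPxIO multipathing is in use, confirm the LUN shows up again with the expected number of operational paths before touching the pool:
# mpathadm list lu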


root@srdf-mg-p1:~# zpool export srdfmgp05-datacopy
cannot export 'srdfmgp05-datacopy': pool I/O is currently suspended
root@srdf-mg-p1:~# zpool status srdfmgp05-datacopy
  pool: srdfmgp05-datacopy
 state: SUSPENDED
status: One or more devices are unavailable in response to IO failures.
        The pool is suspended.
action: Make sure the affected devices are connected, then run 'zpool clear' or
        'fmadm repaired'.
        Run 'zpool status -v' to see device specific details.
   see: http://support.oracle.com/msg/ZFS-8000-HC
  scan: none requested
config:

        NAME                                     STATE     READ WRITE CKSUM
        srdfmgp05-datacopy                       SUSPENDED     2     0     0
          c3t6000D310006D64000000000000000419d0  ONLINE       0     0     0


root@srdf-mg-p1:~# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 28 11:54:35 8728f18f-ec4f-4e65-bd0e-cd8dab33617d  ZFS-8000-8A    Critical

Problem Status    : open
Diag Engine       : zfs-diagnosis / 1.0
System
    Manufacturer  : Oracle Corporation
    Name          : SPARC T7-1
    Part_Number   : 34863714+1+1
    Serial_Number : AK00397012
    Host_ID       : 86c0ac2a
    Server_Name           : srdf-mg-p1

----------------------------------------
Suspect 1 of 1 :
   Problem class : fault.fs.zfs.object.corrupt_data
   Certainty   : 100%
   Affects     : zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy
   Status      : faulted and taken out of service

   FRU
     Status           : faulty
     FMRI             : "zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy"

Description : A file or directory in pool 'srdfmgp05-datacopy' could not be
              read due to corrupt data.

Response    : No automated response will occur.

Impact      : The file or directory is unavailable.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Run 'zpool status -xv' and examine the list of damaged files to
              determine what has been affected. Please refer to the associated
              reference document at http://support.oracle.com/msg/ZFS-8000-8A
              for the latest service procedures and policies regarding this
              diagnosis.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 28 11:54:35 3cd0778a-78dc-499d-a5f1-98c3dac9c7a4  ZFS-8000-HC    Major

Problem Status    : open
Diag Engine       : zfs-diagnosis / 1.0
System
    Manufacturer  : Oracle Corporation
    Name          : SPARC T7-1
    Part_Number   : 34863714+1+1
    Serial_Number : AK00397012
    Host_ID       : 86c0ac2a
    Server_Name           : srdf-mg-p1

----------------------------------------
Suspect 1 of 1 :
   Problem class : fault.fs.zfs.io_failure_wait
   Certainty   : 100%
   Affects     : zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy
   Status      : faulted and taken out of service

   FRU
     Status           : faulty
     FMRI             : "zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy"

Description : ZFS pool 'srdfmgp05-datacopy' has experienced currently
              unrecoverable I/O failures.

Response    : No automated response will occur.

Impact      : Read and write I/Os cannot be serviced.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Make sure the affected devices are connected, then run 'zpool
              clear'. Please refer to the associated reference document at
              http://support.oracle.com/msg/ZFS-8000-HC for the latest service
              procedures and policies regarding this diagnosis.



root@srdf-mg-p1:~# zpool list
NAME                 SIZE  ALLOC   FREE  CAP  DEDUP     HEALTH  ALTROOT
rpool                556G   159G   397G  28%  1.00x     ONLINE  -
srdfmgp01-csf       99.5G  23.9G  75.6G  24%  1.00x     ONLINE  -
srdfmgp05-data       199G   165G  34.4G  82%  1.00x     ONLINE  -
srdfmgp05-datacopy   348G  75.1G   273G  21%  1.00x  SUSPENDED  -
root@srdf-mg-p1:~# zpool clear srdfmgp05-datacopy
root@srdf-mg-p1:~# zpool list
NAME                 SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool                556G   159G   397G  28%  1.00x  ONLINE  -
srdfmgp01-csf       99.5G  23.9G  75.6G  24%  1.00x  ONLINE  -
srdfmgp05-data       199G   165G  34.4G  82%  1.00x  ONLINE  -
srdfmgp05-datacopy   348G  75.1G   273G  21%  1.00x  ONLINE  -
root@srdf-mg-p1:~#
root@srdf-mg-p1:/datacopy# df -h /datacopy
Filesystem             Size   Used  Available Capacity  Mounted on
srdf-mg-p1-datacopy/FS_DATACOPY           343G    75G       267G    22%    /datacopy

root@srdf-mg-p1:~# cd /datacopy/
root@srdf-mg-p1:/datacopy# ls
12.1.0_clnp.tar  PROD
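
With the pool back online, it may also be worth marking the FMA faults repaired and scrubbing the pool to confirm nothing was damaged while I/O was suspended; a sketch using the pool FMRI reported by fmadm faulty above:
# fmadm repaired "zfs://pool=d2cd89a0a10856b8/pool_name=srdfmgp05-datacopy"
# zpool scrub srdfmgp05-datacopy
# zpool status -v srdfmgp05-datacopy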