Wednesday, August 15, 2018

Exadata Disk Replacement

******************************************************************************
This document is specific to hard drives in "predictive failure" or "poor performance" state. There are situations where a drive
will be flagged at first as a predictive failure or poor performance but the drive will hard fail (go into "critical" status)
before the rebalance operation has completed.  In such cases, please reference Doc ID 1386147.1 for replacement steps.


WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
----------------------------------------------------------------------------------

It is expected that the Exadata Machine is up and running and the storage cell containing the failed drive is booted and available.

If there are multiple drives to be replaced within an Exadata machine (or between an Exadata interconnected with another Exadata or Expansion Cabinet),
it is critical that only ONE DRIVE BE REPLACED AT A TIME to avoid the risk of data loss.  This is particularly important in the case of disks running
in predictive failure status. Before replacing another disk in Exadata, ensure the rebalance operation has completed from the first replacement.

Before proceeding, confirm the part number of the part in hand (either from logistics or an on-site spare) matches the part number dispatched
for replacement (especially important in cases where the customer has multiple racks of different drive types/sizes).

It is expected that the customer's DBA has completed these steps before the FSE arrives to replace the disk.
The following commands are provided as guidance in case the customer needs assistance checking the status of the system prior to replacement.
If the customer or the FSE requires more assistance prior to the physical replacement of the device, EEST/TSC should be contacted.


1. Confirm the drive needing replacement based on the output provided ("name" or "slotNumber" value) and LED status of drive. 
For a predictive failure, the LED for the failed drive should have the "Service Action Required" amber LED illuminated/flashing.
For example, follow Doc ID 1112995.1 to determine the failed drive.


CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status LIKE ".*failure.*" DETAIL

(Example only)
==============
name: 20:3
deviceId: 19
diskType: HardDisk
enclosureDeviceId: 20
errMediaCount: 0
errOtherCount: 0
foreignState: false
luns: 0_3
makeModel: "SEAGATE ST360057SSUN600G"
physicalFirmware: 0805
physicalInterface: sas
physicalSerial: E07L8E
physicalSize: 558.9109999993816G
slotNumber: 3
status: predictive failure


In the output above, both the "name" value (the part following the ":") and the "slotNumber" value give the slot of the physical device requiring replacement,
identified by a "status" of "predictive failure".  In the above example, the slot is determined to be slot 3.  (slotNumber: 3 AND name: 20:3)
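As a cross-check of the slot, the DETAIL output can be parsed and the two fields compared. The following is a minimal sketch only: the function name parse_detail is hypothetical, and it assumes the output for a single disk has been captured as plain "key: value" text as in the example above.

```python
def parse_detail(text):
    """Parse 'key: value' lines from one CellCLI DETAIL record into a dict."""
    record = {}
    for line in text.splitlines():
        # Split on the first colon only, so a value such as "20:3" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            record[key.strip()] = value.strip().strip('"')
    return record


# Sample taken from the example output in step 1 (abbreviated).
sample = """\
name: 20:3
diskType: HardDisk
luns: 0_3
slotNumber: 3
status: predictive failure
"""

disk = parse_detail(sample)
# The slot is the part of "name" after the colon and should agree with slotNumber.
slot_from_name = disk["name"].split(":")[1]
print(disk["slotNumber"], slot_from_name, disk["status"])
```

If the two slot values disagree, or the status is not "predictive failure", stop and re-confirm which drive is to be replaced.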



2. The Oracle ASM disks associated with the grid disks on the physical drive are automatically dropped, and an Oracle ASM rebalance will relocate the data
   from the predictively failed disk to other disks. This may take some time, depending on the ASM rebalance power parameter setting and how active the database is.
   Ensure the "status" of the disk is "predictive failure" AND that ASM has completed rebalancing before replacing the disk. If the rebalance is still in progress, the disk is still in use for data, and the replacement should wait until the rebalance has completed.
   To verify the disk has been successfully dropped from ASM, do the following:

a. Login to a database node with the username for the owner of Oracle Grid Infrastructure home. Typically this is the 'oracle' user.

dbmexdb01 login: oracle
Password:
Last login: Thu Jul 12 14:43:10 on ttyS0
[oracle@dbmexdb01 ~]$

b. Select the ASM instance for this DB node and connect to SQL Plus:

  [oracle@dbmexdb01 ~]$ . oraenv

  ORACLE_SID = [oracle] ? +ASM1

  The Oracle base has been set to /u01/app/oracle


[oracle@dbmexdb01 ~]$ sqlplus ' / as sysasm'
SQL*Plus: Release 11.2.0.2.0 Production on Thu Jul 12 14:45:20 2012
Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options


SQL>

In the above output, the "1" of "+ASM1" refers to the DB node number. For example, for DB node #3 the value would be +ASM3.


c. Run this ASM query to check if a rebalance is in progress:

SQL> select group_number,name,state from v$asm_diskgroup;

SQL> select * from gv$asm_operation where state='RUN';

If there are active rows returned, then a rebalance is in progress or it failed. Wait and re-run the query until such time as no rows are returned.
If there are no rebalance operations in progress, the result will be:
no rows selected.

If the ASM rebalance failed with an error, check the output of GV$ASM_OPERATION.ERROR. If this returns a value, contact the SR owner for further guidance.
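The "wait and re-run the query" loop above can be sketched as a small polling helper. This is an illustration under stated assumptions, not a supported tool: wait_for_rebalance and its fetch_rows argument are hypothetical, and fetch_rows stands for whatever the site uses to run the gv$asm_operation query from step 2c and return its rows as a list.

```python
import time


def wait_for_rebalance(fetch_rows, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll until fetch_rows() returns no rows.

    fetch_rows is a hypothetical callable that runs
    "select * from gv$asm_operation where state='RUN'" and returns the rows.
    Returns the number of polls that still saw an active operation.
    """
    polls = 0
    while fetch_rows():
        polls += 1
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("rebalance still running after %d polls" % polls)
        sleep(poll_seconds)
    return polls


# Demonstration with canned results instead of a real ASM connection:
# two polls show a running rebalance, the third returns "no rows selected".
canned = iter([[("1", "REBAL", "RUN")], [("1", "REBAL", "RUN")], []])
print(wait_for_rebalance(lambda: next(canned), sleep=lambda seconds: None))
```

The helper only detects that no operation is running. It does not check for a failed rebalance, so the error check described above still has to be done separately.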

3. Verify the grid disks from that physical disk are no longer part of the ASM diskgroups:

a. Login to the cell server and enter the CellCLI interface

dbmexcel01 login: celladmin
Password:

[celladmin@dbmexcel01 ~]$ cellcli
CellCLI: Release 11.2.2.4.2 - Production on Mon Jul 23 16:21:17 EDT 2012

Copyright (c) 2007, 2009, Oracle.  All rights reserved.
Cell Efficiency Ratio: 1,000

CellCLI>



b. Identify the name of the diskgroups used by that disk:

CellCLI> list celldisk where lun=0_3 detail
         name:                   CD_03_dbmexcel01
         comment:
         creationTime:           2012-05-18T11:41:53-04:00
         ...
         status:                 normal

CellCLI> list griddisk where celldisk=CD_03_dbmexcel01
         DATA_Q1_CD_03_dbmexcel01   active
         DBFS_DG_CD_03_dbmexcel01   active
         RECO_Q1_CD_03_dbmexcel01   active

CellCLI>



c. From the DB node, run the following ASM query:

SQL> set linesize 132

SQL> col path format a50

SQL> select group_number,path,header_status,mount_status,mode_status,name from V$ASM_DISK where path like '%CD_03_dbmexcel01';

GROUP_NUMBER PATH                                               HEADER_STATU MOUNT_S MODE_ST NAME
------------ -------------------------------------------------- ------------ ------- ------- ------------------------------
           0 o/192.168.9.9/DBFS_DG_CD_03_dbmexcel01              FORMER       CLOSED  ONLINE
           0 o/192.168.9.9/DATA_Q1_CD_03_dbmexcel01              FORMER       CLOSED  ONLINE
           0 o/192.168.9.9/RECO_Q1_CD_03_dbmexcel01              FORMER       CLOSED  ONLINE

SQL>
The group_number column should be '0', and the name field should be blank (or NULL).

If the output differs, including the name field being populated, then the grid disks are still part of the ASM diskgroup and need to be dropped.

To drop the grid disks:

  SQL> alter diskgroup dbfs_dg drop disk DBFS_DG_CD_03_dbmexcel01 rebalance power 4;

  SQL> alter diskgroup reco_q1 drop disk RECO_Q1_CD_03_dbmexcel01 rebalance power 4;

  SQL> alter diskgroup data_q1 drop disk DATA_Q1_CD_03_dbmexcel01 rebalance power 4;

After dropping the disks, repeat the above steps to check the rebalance status, wait for it to complete, and then re-validate that the disks have been dropped successfully.
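The check in step 3c and the drop statements can be sketched as follows. This is a hedged illustration: still_in_asm and drop_statement are hypothetical helpers, each V$ASM_DISK row is assumed to be available as a dict, and the diskgroup name is assumed to be the part of the grid disk name before "_CD_" (true for the names in this example, but verify it against the actual diskgroup names before running anything).

```python
def still_in_asm(rows):
    """Return the rows that still belong to a diskgroup.

    A disk is considered dropped when group_number is 0 and name is blank/NULL.
    """
    return [row for row in rows if row["group_number"] != 0 or row["name"]]


def drop_statement(disk_name, power=4):
    """Build the 'alter diskgroup ... drop disk' statement for one grid disk."""
    # ASSUMPTION: the diskgroup name is the grid disk prefix before "_CD_".
    diskgroup = disk_name.split("_CD_")[0].lower()
    return "alter diskgroup %s drop disk %s rebalance power %d;" % (
        diskgroup, disk_name, power)


# Rows shaped like the query output above, plus one disk that was not dropped.
rows = [
    {"group_number": 0, "path": "o/192.168.9.9/DBFS_DG_CD_03_dbmexcel01", "name": None},
    {"group_number": 1, "path": "o/192.168.9.9/DATA_Q1_CD_03_dbmexcel01",
     "name": "DATA_Q1_CD_03_dbmexcel01"},
]

for row in still_in_asm(rows):
    print(drop_statement(row["name"]))
```

The generated statements are meant to be reviewed by the DBA and run manually in SQL*Plus, not executed automatically.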



4. The Cell Management Server (MS) daemon monitors for replacement disks and automatically brings the new disk into the configuration. Before replacing the disk, verify that msStatus is "running" on the storage cell, using the cell's CellCLI interface:

CellCLI> list cell attributes cellsrvStatus,msStatus,rsStatus detail

cellsrvStatus: running
msStatus: running
rsStatus: running
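The same check can be scripted against captured output. A minimal sketch, assuming the three status lines have been saved as text exactly as shown above; the function name services_not_running is hypothetical.

```python
def services_not_running(text):
    """Return the names of cell services whose status is not 'running'."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    required = ("cellsrvStatus", "msStatus", "rsStatus")
    # A missing attribute is reported as well, since it cannot be confirmed running.
    return [name for name in required if status.get(name) != "running"]


sample = """\
cellsrvStatus: running
msStatus: running
rsStatus: running
"""

print(services_not_running(sample))
```

An empty list means all three services report "running". Any name in the list should be resolved before the disk is replaced.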

5. If the predictively failed drive is one of the system boot drives (slot 0 or 1), it is a system disk that contains the running OS.
Verify the root volume is in 'clean' state before hot replacing a system disk.
If it is 'active' and the disk is hot removed, the OS may crash, making recovery more difficult.


a. Login as 'root' on the Storage Cell, and use 'df' to determine the md device name for "/" volume.

[root@dbm1cel1 /]# df

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md5              10317752   2906660   6886980  30% /
tmpfs                 12265720         0  12265720   0% /dev/shm
/dev/md7               2063440    569452   1389172  30% /opt/oracle
/dev/md4                118451     37567     74865  34% /boot
/dev/md11              2395452     74228   2199540   4% /var/log/oracle




b. Use 'mdadm' to determine the volume status:

[root@dbm1cel1 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
        Version : 0.90
       Creation Time : Wed May 16 20:37:31 2012
       Raid Level : raid1
       Array Size : 10482304 (10.00 GiB 10.73 GB)
       Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
       Raid Devices : 2
       Total Devices : 3
       Preferred Minor : 5
       Persistence : Superblock is persistent

      Update Time : Mon Jul 23 16:46:37 2012
      State : clean
      Active Devices : 2
      Working Devices : 2
      Failed Devices : 1
     Spare Devices : 0

           UUID : 6de571d0:61eaac33:7050abfe:00bc6417
         Events : 0.348

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1      65      213        1      active sync   /dev/sdad5

       2       8       21        -      faulty spare


[root@dbmexcel03 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
        Version : 0.90
       Creation Time : Thu Mar 17 23:19:42 2011
       Raid Level : raid1
       Array Size : 10482304 (10.00 GiB 10.73 GB)
       Used Dev Size : 10482304 (10.00 GiB 10.73 GB)
       Raid Devices : 2
       Total Devices : 2
       Preferred Minor : 5
       Persistence : Superblock is persistent

       Update Time : Wed Jul 18 11:53:34 2012
       State : clean
       Active Devices : 2
       Working Devices : 2
      Failed Devices : 0
      Spare Devices : 0

           UUID : e75c1b6a:64cce9e4:924527db:b6e45d21
           Events : 0.108

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5



The Devices section may or may not show a device as 'failed', with an extra disk "2" showing as "faulty spare".
This depends on the state of the OS when the device went into predictive failure.
The most important aspect is whether the state is "clean" or "active". A "clean" volume is safe to hot remove;
an "active" volume is actively syncing the disk mirrors, so wait until it is "clean" before hot removing the disk.
If the disk stays in "active" state, follow the steps in MOS Note 1524329.1 to set it to removed before continuing.
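The clean/active decision can be sketched as a small parser over the mdadm output. This is an illustration only: md_state and safe_to_hot_remove are hypothetical names, and the check is deliberately conservative, treating any State value other than exactly "clean" as not safe and leaving it for manual review.

```python
import re


def md_state(mdadm_output):
    """Return the value of the 'State :' line from 'mdadm -Q --detail' output."""
    # Anchored to the start of a line, so the "RaidDevice State" column header
    # of the device table is not matched.
    match = re.search(r"^\s*State\s*:\s*(.+?)\s*$", mdadm_output, re.MULTILINE)
    return match.group(1) if match else None


def safe_to_hot_remove(mdadm_output):
    """True only when the array state is exactly 'clean' (conservative assumption)."""
    return md_state(mdadm_output) == "clean"


# Abbreviated from the first example above.
sample = """\
/dev/md5:
        Version : 0.90
       Raid Level : raid1
      State : clean
      Failed Devices : 1

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
"""

print(md_state(sample), safe_to_hot_remove(sample))
```

A result of False does not by itself mean the volume is syncing. It only means the output should be read by a person before the disk is pulled.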

