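I would like to know how to recover data when the disk header information of ASM-managed disks is lost.
I tried using dd to write back the disk header files I had backed up earlier. The header information on VOL2 and VOL4 is now correct, but ASM still cannot mount the ORCL_DATA1 disk group. Running alter diskgroup ORCL_DATA1 mount fails with:
ORA-15032: not all alterations performed
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "ORCL_DATA1"
The problem appeared yesterday on an Oracle 10g RAC with two nodes. Possibly because of an ASM bug, the database could no longer access the VOL2 and VOL4 partitions, and as a result the ORCL_DATA1 disk group could not be mounted.
The alert log shows: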
Tue Dec 30 18:11:00 2008
Reread from mirror side 'VOL2' returns corrupted data
Reread from mirror side 'VOL4' returns corrupted data
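The kfed tool shows that the disk header information on both partitions is corrupted.
To restore the disk headers, I ran the following on orcl1: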
#dd if=/u01/app/oracle/asmdiskheader/VOL2 of=/dev/oracleasm/disks/VOL2 bs=4096 count=1
#dd if=/u01/app/oracle/asmdiskheader/VOL4 of=/dev/oracleasm/disks/VOL4 bs=4096 count=1
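I assumed that would be enough, then ran $export ORACLE_SID=+ASM1;sqlplus / as sysdba, but mounting the ORCL_DATA1 disk group did not succeed. Rebooting both machines at the same time did not help either. Some SQL queries run after the reboot: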
SQL> select group_number,disk_number,path,STATE,REDUNDANCY,TOTAL_MB,FREE_MB,NAME,FAILGROUP from v$asm_disk;

GROUP_NUMBER DISK_NUMBER PATH       STATE    REDUNDANCY   TOTAL_MB    FREE_MB NAME  FAILGROUP
------------ ----------- ---------- -------- ---------- ---------- ---------- ----- ---------
           0           0 ORCL:VOL2  NORMAL   UNKNOWN       6588415          0
           0           1 ORCL:VOL4  NORMAL   UNKNOWN       6588415          0
           1           0 ORCL:VOL1  NORMAL   UNKNOWN       1021952    1021597 VOL1  VOL1
           1           1 ORCL:VOL3  NORMAL   UNKNOWN       1023999    1023644 VOL3  VOL3

SQL> select group_number , name , state , type , offline_disks from v$asm_diskgroup;

GROUP_NUMBER NAME                 STATE       TYPE   OFFLINE_DISKS
------------ -------------------- ----------- ------ -------------
           1 FLASH_RECOVERY_AREA  MOUNTED     NORMAL             0
           0 ORCL_DATA1           DISMOUNTED                     0
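
The post reports that the VOL2 and VOL4 headers looked correct after the dd restore, yet the mount still failed. As a minimal sketch, assuming the backup file and the target are both readable, the helper below checks only whether the first 4096-byte block of a target matches the saved backup. `header_matches` is a hypothetical name, not an Oracle tool, and the demo runs on stand-in files rather than on `/dev/oracleasm/disks/...`.

```shell
# header_matches is a hypothetical helper, not an Oracle tool.
# It succeeds when the first block of TARGET equals the first block of BACKUP.
header_matches() {
    hm_backup=$1          # saved header file
    hm_target=$2          # device (or stand-in file) the header was written to
    hm_bs=${3:-4096}      # block size, matching the bs=4096 used in the post
    hm_tmp=$(mktemp) || return 2
    dd if="$hm_target" of="$hm_tmp" bs="$hm_bs" count=1 2>/dev/null
    # An empty read means the target could not be read at all.
    if [ ! -s "$hm_tmp" ]; then
        rm -f "$hm_tmp"
        return 2
    fi
    dd if="$hm_backup" bs="$hm_bs" count=1 2>/dev/null | cmp -s - "$hm_tmp"
    hm_rc=$?
    rm -f "$hm_tmp"
    return "$hm_rc"
}

# Demo on stand-in files only; no real ASM disk is read or written.
hm_work=$(mktemp -d)
head -c 8192 /dev/zero | tr '\000' 'H' > "$hm_work/disk.img"
dd if="$hm_work/disk.img" of="$hm_work/header.bak" bs=4096 count=1 2>/dev/null
if header_matches "$hm_work/header.bak" "$hm_work/disk.img"; then
    echo "first block matches the backup"
fi
```

A match only confirms that the dd write landed in block 0. It says nothing about the blocks after it, which is consistent with the mount still failing here.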

Solutions »

  1.   

    The ASM alert log is as follows:
    ---------------------------------------------------
    Wed Dec 31 14:00:00 2008
    Instance terminated by LMON, pid = 30099
    Wed Dec 31 14:10:28 2008
    Starting ORACLE instance (normal)
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Interface type 1 eth1 172.16.0.0 configured from OCR for use as a cluster interconnect
    Interface type 1 eth0 10.134.64.0 configured from OCR for use as  a public interface
    Picked latch-free SCN scheme 1
    Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/10.2.0/db_1/dbs/arch
    Autotune of undo retention is turned off. 
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    ksdpec: called for event 13740 prior to event group initialization
    Starting up ORACLE RDBMS Version: 10.2.0.1.0.
    System parameters with non-default values:
      large_pool_size          = 12582912
      spfile                   = /u02/oradata/orcl/dbs/spfile+ASM.ora
      instance_type            = asm
      cluster_database         = TRUE
      instance_number          = 2
      remote_login_passwordfile= EXCLUSIVE
      background_dump_dest     = /u01/app/oracle/admin/+ASM/bdump
      user_dump_dest           = /u01/app/oracle/admin/+ASM/udump
      core_dump_dest           = /u01/app/oracle/admin/+ASM/cdump
      asm_diskgroups           = ORCL_DATA1, FLASH_RECOVERY_AREA
    Cluster communication is configured to use the following interface(s) for this instance
      172.16.0.2
    Wed Dec 31 14:10:28 2008
    cluster interconnect IPC version:Oracle UDP/IP
    IPC Vendor 1 proto 2
    PMON started with pid=2, OS id=8456
    DIAG started with pid=3, OS id=8458
    PSP0 started with pid=4, OS id=8460
    LMON started with pid=5, OS id=8462
    LMD0 started with pid=6, OS id=8464
    LMS0 started with pid=7, OS id=8470
    MMAN started with pid=8, OS id=8495
    DBW0 started with pid=9, OS id=8497
    LGWR started with pid=10, OS id=8499
    CKPT started with pid=11, OS id=8501
    SMON started with pid=12, OS id=8503
    RBAL started with pid=13, OS id=8505
    GMON started with pid=14, OS id=8507
    Wed Dec 31 14:10:29 2008
    lmon registered with NM - instance id 2 (internal mem no 1)
    Wed Dec 31 14:10:29 2008
    Reconfiguration started (old inc 0, new inc 1)
    ASM instance 
    pseudo shared rm latch used 
    List of nodes:
    1
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    Wed Dec 31 14:10:29 2008
    LMS 0: 0 GCS shadows cancelled, 0 closed
    Set master node info 
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Post SMON to start 1st pass IR
    Wed Dec 31 14:10:29 2008
    LMS 0: 0 GCS shadows traversed, 0 replayed
    Wed Dec 31 14:10:29 2008
    Submitted all GCS remote-cache requests
    Post SMON to start 1st pass IR
    Fix write in gcs resources
    Reconfiguration complete
    LCK0 started with pid=15, OS id=8509
    Wed Dec 31 14:10:30 2008
    SQL> ALTER DISKGROUP ALL MOUNT 
    Wed Dec 31 14:10:30 2008
    NOTE: cache registered group FLASH_RECOVERY_AREA number=1 incarn=0x998878d3
    NOTE: cache registered group ORCL_DATA1 number=2 incarn=0x998878d4
    Wed Dec 31 14:10:30 2008
    Loaded ASM Library - Generic Linux, version 2.0.2 (KABI_V2) library for asmlib interface
    Wed Dec 31 14:10:30 2008
    NOTE: Hbeat: instance first (grp 1)
    ERROR: no PST quorum in group 2: required 2, found 0
    Wed Dec 31 14:10:30 2008
    NOTE: cache dismounting group 2/0x998878D4 (ORCL_DATA1) 
    NOTE: dbwr not being msg'd to dismount
    ERROR: diskgroup ORCL_DATA1 was not mounted
    Wed Dec 31 14:10:32 2008
    Reconfiguration started (old inc 1, new inc 2)
    List of nodes:
    0 1
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    Wed Dec 31 14:10:33 2008
    LMS 0: 0 GCS shadows cancelled, 0 closed
    Set master node info 
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Wed Dec 31 14:10:33 2008
    LMS 0: 0 GCS shadows traversed, 0 replayed
    Wed Dec 31 14:10:33 2008
    Submitted all GCS remote-cache requests
    Fix write in gcs resources
    Reconfiguration complete
    Wed Dec 31 14:10:35 2008
    NOTE: start heartbeating (grp 1)
    NOTE: cache opening disk 0 of grp 1: VOL1 label:VOL1
    Wed Dec 31 14:10:35 2008
    NOTE: F1X0 found on disk 0 fcn 0.0
    NOTE: cache opening disk 1 of grp 1: VOL3 label:VOL3
    NOTE: F1X0 found on disk 1 fcn 0.0
    NOTE: cache mounting (first) group 1/0x998878D3 (FLASH_RECOVERY_AREA)
    * allocate domain 1, invalid = TRUE 
    kjbdomatt send to node 0
    Wed Dec 31 14:10:35 2008
    NOTE: attached to recovery domain 1
    Wed Dec 31 14:10:35 2008
    NOTE: starting recovery of thread=1 ckpt=11.139
    NOTE: starting recovery of thread=2 ckpt=13.41
    NOTE: advancing ckpt for thread=1 ckpt=11.139
    NOTE: advancing ckpt for thread=2 ckpt=13.41
    NOTE: cache recovered group 1 to fcn 0.2579
    Wed Dec 31 14:10:35 2008
    NOTE: opening chunk 1 at fcn 0.2579 ABA 
    NOTE: seq=12 blk=140 
    Wed Dec 31 14:10:35 2008
    NOTE: cache mounting group 1/0x998878D3 (FLASH_RECOVERY_AREA) succeeded
    SUCCESS: diskgroup FLASH_RECOVERY_AREA was mounted
    Wed Dec 31 14:10:44 2008
    NOTE: recovering COD for group 1/0x998878d3 (FLASH_RECOVERY_AREA)
    SUCCESS: completed COD recovery for group 1/0x998878d3 (FLASH_RECOVERY_AREA)
    Wed Dec 31 14:27:34 2008
    SQL> alter diskgroup ORCL_DATA1 mount 
    Wed Dec 31 14:27:34 2008
    NOTE: cache registered group ORCL_DATA1 number=2 incarn=0x9b1878dc
    Wed Dec 31 14:27:34 2008
    ERROR: no PST quorum in group 2: required 2, found 0
    Wed Dec 31 14:27:34 2008
    NOTE: cache dismounting group 2/0x9B1878DC (ORCL_DATA1)
      

  2.   

    Is anyone familiar with how ASM mounts a diskgroup? Let's discuss.
      

  3.   

    select path, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE from v$asm_disk;
    What does this output?
    Run kfed against each path and see what it shows.
      

  4.   

    For example: kfed read /dev/asm1_disk1 aunum=0 blknum=1|more
      

  5.   

    SQL> select path, MOUNT_STATUS, HEADER_STATUS, MODE_STATUS, STATE from v$asm_disk;
     
    PATH       MOUNT_STATUS HEADER_STATUS MODE_STATUS STATE
    ---------- ------------ ------------- ----------- ------
    ORCL:VOL2  CLOSED       MEMBER        ONLINE      NORMAL
    ORCL:VOL4  CLOSED       MEMBER        ONLINE      NORMAL
    ORCL:VOL1  CACHED       MEMBER        ONLINE      NORMAL
    ORCL:VOL3  CACHED       MEMBER        ONLINE      NORMAL
    ==============================================================================================================================
    [oracle@bjljcsev-10 lib]$ kfed read /dev/oracleasm/disks/VOL2  aunum=0 blknum=1|more
    kfbh.endian:                          0 ; 0x000: 0x00
    kfbh.hard:                            0 ; 0x001: 0x00
    kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
    kfbh.datfmt:                          0 ; 0x003: 0x00
    kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
    kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
    kfbh.check:                           0 ; 0x00c: 0x00000000
    kfbh.fcn.base:                        0 ; 0x010: 0x00000000
    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
    kfbh.spare1:                          0 ; 0x018: 0x00000000
    kfbh.spare2:                          0 ; 0x01c: 0x00000000
    [oracle@bjljcsev-10 lib]$ kfed read /dev/oracleasm/disks/VOL4  aunum=0 blknum=1|more
    kfbh.endian:                          0 ; 0x000: 0x00
    kfbh.hard:                            0 ; 0x001: 0x00
    kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
    kfbh.datfmt:                          0 ; 0x003: 0x00
    kfbh.block.blk:                       0 ; 0x004: T=0 NUMB=0x0
    kfbh.block.obj:                       0 ; 0x008: TYPE=0x0 NUMB=0x0
    kfbh.check:                           0 ; 0x00c: 0x00000000
    kfbh.fcn.base:                        0 ; 0x010: 0x00000000
    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
    kfbh.spare1:                          0 ; 0x018: 0x00000000
    kfbh.spare2:                          0 ; 0x01c: 0x00000000
    [oracle@bjljcsev-10 lib]$ kfed read /dev/oracleasm/disks/VOL1  aunum=0 blknum=1|more
    kfbh.endian:                          1 ; 0x000: 0x01
    kfbh.hard:                          130 ; 0x001: 0x82
    kfbh.type:                            2 ; 0x002: KFBTYP_FREESPC
    kfbh.datfmt:                          1 ; 0x003: 0x01
    kfbh.block.blk:                       1 ; 0x004: T=0 NUMB=0x1
    kfbh.block.obj:              2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
    kfbh.check:                  2180809470 ; 0x00c: 0x81fc82fe
    kfbh.fcn.base:                        0 ; 0x010: 0x00000000
    kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
    kfbh.spare1:                          0 ; 0x018: 0x00000000
    kfbh.spare2:                          0 ; 0x01c: 0x00000000
    kfdfsb.aunum:                         0 ; 0x000: 0x00000000
    kfdfsb.max:                         254 ; 0x004: 0x00fe
    kfdfsb.cnt:                         254 ; 0x006: 0x00fe
    kfdfse[0].total:                    448 ; 0x008: 0x01c0
    kfdfse[0].free:                       1 ; 0x00a: 0x01
    kfdfse[0].frag:                       1 ; 0x00b: 0x01
    kfdfse[1].total:                    448 ; 0x00c: 0x01c0
    kfdfse[1].free:                       1 ; 0x00e: 0x01
    kfdfse[1].frag:                       1 ; 0x00f: 0x01
    kfdfse[2].total:                    448 ; 0x010: 0x01c0
    kfdfse[2].free:                       1 ; 0x012: 0x01
    kfdfse[2].frag:                       1 ; 0x013: 0x01
    kfdfse[3].total:                    448 ; 0x014: 0x01c0
    kfdfse[3].free:                       1 ; 0x016: 0x01
    kfdfse[3].frag:                       1 ; 0x017: 0x01
    kfdfse[4].total:                    448 ; 0x018: 0x01c0
    kfdfse[4].free:                       1 ; 0x01a: 0x01
    kfdfse[4].frag:                       1 ; 0x01b: 0x01
    kfdfse[5].total:                    448 ; 0x01c: 0x01c0
    kfdfse[5].free:                       1 ; 0x01e: 0x01
    kfdfse[5].frag:                       1 ; 0x01f: 0x01
    kfdfse[6].total:                    448 ; 0x020: 0x01c0
    kfdfse[6].free:                       1 ; 0x022: 0x01
    kfdfse[6].frag:                       1 ; 0x023: 0x01
    --More--
    ==========================================================
    It looks like restoring the disk header with dd did not accomplish anything. Is there any way to fix this? Thanks!
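
    In the kfed output above, block 1 of VOL2 and VOL4 reads as KFBTYP_INVALID with every header field zero, while the same block on VOL1 is KFBTYP_FREESPC. That suggests, though it does not prove, that more than the first 4096 bytes were wiped. A rough sketch for seeing how far the blank area extends is shown here; `block_is_zero` and `zero_blocks` are hypothetical helpers, and the demo runs on a stand-in image file instead of a real device.

    ```shell
    # block_is_zero and zero_blocks are hypothetical helpers, not Oracle tools.

    # Succeeds when block BLK (size BS, default 4096) of FILE holds only zero bytes.
    # Returns 2 when a full block could not be read, e.g. past the end of the file.
    block_is_zero() {
        bz_file=$1; bz_blk=$2; bz_bs=${3:-4096}
        bz_read=$(dd if="$bz_file" bs="$bz_bs" skip="$bz_blk" count=1 2>/dev/null | wc -c | tr -d ' ')
        [ "$bz_read" -eq "$bz_bs" ] || return 2
        # Delete every NUL byte; whatever remains is non-zero data.
        bz_left=$(dd if="$bz_file" bs="$bz_bs" skip="$bz_blk" count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
        [ "$bz_left" -eq 0 ]
    }

    # Prints the numbers of the all-zero blocks among the first COUNT blocks of FILE.
    zero_blocks() {
        zb_file=$1; zb_count=$2
        zb_i=0
        while [ "$zb_i" -lt "$zb_count" ]; do
            if block_is_zero "$zb_file" "$zb_i"; then
                echo "$zb_i"
            fi
            zb_i=$((zb_i + 1))
        done
    }

    # Demo on a stand-in image: four blocks of 'A' bytes, then blocks 1 and 2 zeroed.
    zb_img=$(mktemp)
    head -c 16384 /dev/zero | tr '\000' 'A' > "$zb_img"
    dd if=/dev/zero of="$zb_img" bs=4096 seek=1 count=2 conv=notrunc 2>/dev/null
    zero_blocks "$zb_img" 4
    ```

    On the stand-in image this prints 1 and 2. Pointing it at a real disk path would need read permission on the device, and an all-zero block is only a hint, since kfed remains the tool that interprets ASM metadata.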
      

  6.   

    Does this mean my dd backup was done wrong? At the time I only ran dd if=/dev/oracleasm/disks/VOL1 of=/u01/app/oracle/asmdiskheader/VOL1 bs=4096 count=1
    How much space needs to be copied at a minimum?
      

  7.   

    It is already corrupted. What does dd if=<device> bs=4096 count=1 |od -x output?
      

  8.   

    If I am right, you will need to rebuild the ASM disk. It takes more than a simple dd, though; dd is only one step in the process.
      

  9.   

    Back up the ASM disk header, for example: dd if=/dev/raw/raw1 of=asmheader bs=4096 count=1. Then, when the header gets corrupted, it can be restored with dd if=asmheader of=/dev/raw/raw1 ......
      

  10.   

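    Argh, that is exactly how I made my backup! But practice shows that backing up only the 4096-byte disk header, as I did, is not enough to recover the ASM disk group.
    A bit more about how the ORCL_DATA1 disk group was created: oracleasm was used to create the logical volumes VOL2 (sdb3=6T) and VOL4 (sdc2=6T), normal redundancy was chosen when the ASM instance was created, and the group consists of VOL2 and VOL4. When the failure occurred, the headers of VOL2 and VOL4 were lost at the same time.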
      

  11.   

    Are you sure you stopped all the ASM instances?
      

  12.   

    Hm? Do the ASM instances need to be stopped when backing up the ASM header?
      

  13.   

    I did not stop the ASM instances during the restore. I tried rebooting after the dd, and it still did not work.
      

  14.   

    First make a backup of the current state, then stop all ASM instances. After that, use dd to zero out the current ASM header and try again.
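
    The "back up the current state first" step can be sketched as follows. This covers only the backup and deliberately does not zero anything. `save_region` is a hypothetical helper, and because the thread never settles how many blocks are enough, the block count is left as a parameter. The demo uses a stand-in file.

    ```shell
    # save_region is a hypothetical helper, not an Oracle tool.
    # It copies the first COUNT blocks of SRC into DIR under a timestamped name
    # and prints the path of the copy.
    save_region() {
        sr_src=$1; sr_dir=$2; sr_count=${3:-1}; sr_bs=${4:-4096}
        sr_out="$sr_dir/$(basename "$sr_src").$(date +%Y%m%d%H%M%S).bak"
        # Never overwrite an earlier copy.
        if [ -e "$sr_out" ]; then
            return 1
        fi
        dd if="$sr_src" of="$sr_out" bs="$sr_bs" count="$sr_count" 2>/dev/null || return 1
        echo "$sr_out"
    }

    # Demo on a stand-in file; on a real system SRC would be a device path
    # such as /dev/oracleasm/disks/VOL2, and reading it would need root.
    sr_work=$(mktemp -d)
    head -c 16384 /dev/zero | tr '\000' 'B' > "$sr_work/VOLX"
    sr_copy=$(save_region "$sr_work/VOLX" "$sr_work" 2)
    echo "saved $(wc -c < "$sr_copy" | tr -d ' ') bytes"
    ```

    Keeping a timestamped copy of whatever is on disk now means a later dd write can be undone, even if the current contents are themselves damaged.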