
2017-05-27 Sat

03:03 Automating Password Rotation for Oracle Databases (23785 Bytes) » Official Pythian Blog

Password rotation is not the most exciting task in the world, and that’s exactly why it’s a perfect candidate for automation. Automating routine tasks like this is good for everyone: DBAs can work on something more exciting, companies save money because less time is spent changing passwords, and there’s no room for human error, either. At Pythian, we typically use Ansible for task automation, and I like it mainly because of its non-intrusive configuration (no agents need to be installed on the target servers) and its scalability (tasks are executed in parallel on the target servers). This post briefly describes how I automated password rotation for Oracle database users using Ansible.


This blog post is not an introduction to Ansible or how to use it. Rather, it’s an example of how a simple task can be automated with Ansible in a way that’s scalable, flexible and easily reusable, and that also lets other tasks pick up the new passwords from a secure password store. These were my requirements:

  • Scalability – I’d like to take advantage of Ansible’s ability to execute tasks on multiple servers at the same time. In a large environment of tens or hundreds of machines, a solution that executes password change tasks serially would not be suitable. Below is an example of such a “serial” task. It’s not a real thing, just an illustration: it hardcodes a few attributes (the environment file, the username and the hostname), so a separate task would be required for every user/database whose password you want to change:
    - hosts: ora-serv01
      remote_user: oracle
      tasks:
      - name: change password for SYS
        shell: | 
          . TEST1.env && \
          sqlplus / as sysdba @change_password.sql SYS \
          \"{{lookup('password','/dev/null length=8')}}\"
  • Flexible – I want to be able to adjust the list of users whose passwords are changed, and the list of servers/databases they are changed on, in a simple way that doesn’t involve changing the main task list.
  • Reusable – this comes together with flexibility. The idea is that the playbook would be so generic that it wouldn’t require any changes when it’s implemented in a completely separate environment (i.e. for another client of Pythian).
  • Secure password store – the new passwords are to be generated by the automated password rotation tool, and a method of storing passwords securely is required so that the new passwords can be picked up by the DBAs, application owners or the next automated task that would reconfigure the application.

The implementation


I chose to do the implementation using Ansible 2.3, because it introduces the passwordstore lookup, which enables interaction with the pass utility (read more about it at Passwordstore.org). pass is very cool. It stores passwords in gpg-encrypted files, and it can also be configured to automatically push changes to a git repository, which relieves us of the headache of password distribution. The passwords can then be retrieved from git on the servers that need access to them.

Ansible 2.3 runs on Python 2.6. Unfortunately, the passwordstore lookup requires Python 2.7, which can be an issue if the control host for Ansible runs on Oracle Linux 6 or RHEL 6, as they don’t provide Python 2.7 in the official yum repositories. Still, there are ways of getting it done, and I’ll write another blog post about it.

So, what we’ll need is:

  • Ansible 2.3
  • jmespath Python library on the Ansible control host (pip install jmespath)
  • jinja2 Python library on the Ansible control host (I had to update it using pip install -U jinja2 in a few cases)
  • Python 2.7 (or Python 3.5)
  • pass utility

The Playbook

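This is the whole list of files that are included in the playbook:

    ./chpwd.yml
    ./inventory/hosts
    ./inventory/orcl1-vagrant-private_key
    ./inventory/orcl2-vagrant-private_key
    ./roles/db_users/files/change_password.sql
    ./roles/db_users/files/exists_user.sql
    ./roles/db_users/defaults/main.yml
    ./roles/db_users/tasks/main.yml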

Let’s take a quick look at all of them:

  • ./chpwd.yml – is the playbook and (in this case) it’s extremely simple as I want to run the password change against all defined hosts:
    $ cat ./chpwd.yml
      - name: password change automation
        hosts: all
        roles:
          - db_users
  • ./inventory/hosts, ./inventory/orcl1-vagrant-private_key, ./inventory/orcl2-vagrant-private_key – these files define the hosts and the connectivity. In this case we have 2 hosts, orcl1 and orcl2, and we’ll connect as the vagrant user using the private keys.
    $ cat ./inventory/hosts
    orcl1 ansible_host= ansible_port=2201 ansible_ssh_private_key_file=inventory/orcl1-vagrant-private_key ansible_user=vagrant
    orcl2 ansible_host= ansible_port=2202 ansible_ssh_private_key_file=inventory/orcl2-vagrant-private_key ansible_user=vagrant
  • ./roles/db_users/files/change_password.sql – An SQL script that I’ll execute on the database to change the passwords. It takes 2 parameters, the username and the password:
    $ cat ./roles/db_users/files/change_password.sql
    set ver off pages 0
    alter user &1 identified by "&2";
  • ./roles/db_users/files/exists_user.sql – A sql script that allows verifying the existence of the users. It takes 1 argument – the username. It outputs “User exists.” when the user is there, and “User {username} does not exist.” – when it’s not.
    $ cat ./roles/db_users/files/exists_user.sql
    set ver off pages 0
    select 'User exists.' from all_users where username=upper('&1')
    union all
    select 'User '||upper('&1')||' does not exist.' from (select upper('&1') from dual minus select username from all_users);
  • ./roles/db_users/defaults/main.yml – is the default file for the db_users role. I use this file to define the users for each host and database for which the passwords need to be changed:
    $ cat ./roles/db_users/defaults/main.yml
      db_users:
        - name: TEST1
          host: orcl1
          env: ". ~/.bash_profile && . ~/TEST1.env > /dev/null"
          pwdstore: "orcl1/TEST1/"
          os_user: oracle
          become_os_user: yes
          users:
            - dbsnmp
            - system
        - name: TEST2
          host: orcl2
          env: ". ~/.bash_profile && . ~/TEST2.env > /dev/null"
          pwdstore: "orcl2/TEST2/"
          os_user: oracle
          become_os_user: yes
          users:
            - sys
            - system
            - ctxsys
        - name: TEST3
          host: orcl2
          env: ". ~/.bash_profile && . ~/TEST3.env > /dev/null"
          pwdstore: "orcl2/TEST3/"
          os_user: oracle
          become_os_user: yes
          users:
            - dbsnmp

    In this data structure, we define everything that needs to be known to connect to the database and change the passwords. Each entry in the list contains the following data:

    • name – just a descriptive name of the entry in this list, normally it would be the name of the database that’s described below.
    • host – the host on which the database resides. It should match one of the hosts defined in ./inventory/hosts.
    • env – how to set the correct environment to be able to connect to the DB (currently it requires sysdba connectivity).
    • pwdstore – the path to the folder in the passwordstore where the new passwords will be stored.
    • os_user and become_os_user – these are used in case sudo to another user on the target host is required. In a typical configuration, I connect to the target host using a dedicated user for Ansible, and then sudo to the DB owner. If Ansible connects to the DB owner directly, then become_os_user should be set to “no”.
    • users – this is the list of all users for which the passwords need to be changed.

    As you can see, this structure greatly enhances flexibility and reusability, because adding new databases, hosts or users is done by a simple change to the “db_users:” structure in this defaults file. In this example, the dbsnmp and system passwords are rotated for TEST1@orcl1, the sys, system and ctxsys passwords are rotated for TEST2@orcl2, and the dbsnmp password is rotated for TEST3@orcl2.

  • ./roles/db_users/tasks/main.yml – this is the task file of the db_users role. The soul of the playbook and the main part that does the password change depending on the contents in the defaults file described above. Instead of pasting the whole at once, I’ll break it up task by task, and will provide some comments about what’s being done.
    • populate host_db_users – This task simply filters the whole db_users data structure that’s defined in the defaults file, and creates host_db_users fact with only the DBs that belong to the host the task is currently run on. Using the ansible “when” conditional would also be possible to filter the list, however in such case there’s a lot of “skipped” entries displayed when the task is executed, so I prefer filtering the list before it’s even passed to the Ansible task.
        - name: populate host_db_users
          set_fact: host_db_users="{{ db_users | selectattr('host','equalto',ansible_hostname) | list }}"
    • create directory for target on db hosts – for each unique combination of os_user and become_os_user on the target host, an “ansible” directory is created. A json_query is used here to pick just the os_user and become_os_user attributes that are needed. It would also work with with_items: "{{ host_db_users }}", but in that case the output becomes cluttered, as all the attributes are displayed during the execution.
        - name: create directory for target on db hosts
          file:
            path: "ansible"
            state: directory
          become_user: "{{ item.os_user }}"
          become: "{{ item.become_os_user }}"
          with_items: "{{ host_db_users | json_query('[*].{os_user: os_user, become_os_user: become_os_user }') | unique | list }}"
    • copy sql scripts to db_hosts – the missing scripts are copied from the Ansible control host to the target “ansible” directories. “with_nested” is one way to create a nested loop in Ansible.
        # dest is inferred from the script paths used by the later tasks
        - name: copy sql scripts to db_hosts
          copy:
            src="{{ item[1] }}"
            dest=ansible/
          become_user: "{{ item[0].os_user }}"
          become: "{{ item[0].become_os_user }}"
          with_nested:
            - "{{ host_db_users | json_query('[*].{os_user: os_user, become_os_user: become_os_user }') | unique | list }}"
            - ['files/change_password.sql','files/exists_user.sql']
    • verify user existence – I’m using a shell module to execute the sql script after setting the environment. The outputs are collected in “exists_output” variable. This task will not fail and will not show as “changed” because of failed_when and changed_when settings of “false”.
        - name: verify user existence
          shell: |
             {{ item[0].env }} && \
             sqlplus -S / as sysdba \
             @ansible/exists_user.sql {{ item[1] }}
          register: exists_output
          become_user: "{{ item[0].os_user }}"
          become: "{{ item[0].become_os_user }}"
          with_subelements:
            - "{{ host_db_users |json_query('[*].{env: env, os_user: os_user, users: users, become_os_user: become_os_user }') }}"
            - users
          failed_when: false
          changed_when: false
    • User existence results – this task fails when any of the users doesn’t exist, and displays which user it was. This is done in a separate task to produce cleaner output. If you’d rather not fail when some of the users don’t exist (and continue changing passwords for the existing users), this task can simply be commented out, or the “failed_when: false” line can be uncommented.
        - name: User existence results
          fail: msg="{{ item }}"
          with_items: "{{ exists_output.results|rejectattr('stdout','equalto','User exists.')|map(attribute='stdout')|list }}"
          #failed_when: false
    • generate and change the user passwords – finally, this is the task that actually changes the passwords. A successful password change is detected by checking the output of the SQL script, which should produce “User altered.” The rather complex use of lookups is there for a reason: the passwordstore lookup can also generate passwords, but it’s not possible to define the character classes that the new password should contain, whereas the “password” lookup allows defining these. Additionally, the 1st character is generated from “ascii_letters” only, as there are usually some applications that “don’t like” passwords that start with numbers (this is why generating the 1st character of the password is separated from the remaining 11 characters). Lastly, the “passwordstore” lookup is used with the “userpass=” parameter to pass the generated password to the passwordstore and store it there (it also keeps the previous passwords). This part could use some improvement, as in some cases different rules for the generated password complexity may be required. The password change outputs are recorded in “change_output”, which is checked in the last task.
        - name: generate and change the user passwords
          shell: |
             {{ item[0].env }} && \
             sqlplus -S / as sysdba \
             @ansible/change_password.sql \
             {{ item[1] }} \"{{ lookup('passwordstore',item[0].pwdstore + item[1] + ' create=true overwrite=true userpass=' +
                                       lookup('password','/dev/null chars=ascii_letters length=1') +
                                       lookup('password','/dev/null chars=ascii_letters,digits,hexdigits length=11')) }}\"
          register: change_output
          become_user: "{{ item[0].os_user }}"
          become: "{{ item[0].become_os_user }}"
          with_subelements:
            - "{{ host_db_users |json_query('[*].{env: env, os_user: os_user, users: users, pwdstore: pwdstore, become_os_user: become_os_user}') }}"
            - users
          failed_when: false
          changed_when: "'User altered.' in change_output.stdout"
    • Password change errors – The “change_output” data are verified here, and failed password changes are reported.
         # fail if the password change failed.
        - name: Password change errors
          fail: msg="{{ item }}"
          with_items: "{{ change_output.results|rejectattr('stdout','equalto','\nUser altered.')|map(attribute='stdout')|list }}"
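
The data flow of the role can be hard to follow from the Jinja2 expressions alone, so here is a minimal Python sketch of what the filters and loops do to the db_users structure. It is only an illustration under stated assumptions: the function names (host_db_users, unique_os_users, subelements, new_password) and the DB_USERS constant are hypothetical, the "env" attribute is left out for brevity, and the password generator only mimics the "first character is a letter, 12 characters in total" rule. It does not reproduce the exact character weighting of the Ansible “password” lookup, and it needs Python 3.6 or later for the secrets module.

```python
import secrets
import string

# Hypothetical copy of the db_users structure from the defaults file.
DB_USERS = [
    {"name": "TEST1", "host": "orcl1", "pwdstore": "orcl1/TEST1/",
     "os_user": "oracle", "become_os_user": True,
     "users": ["dbsnmp", "system"]},
    {"name": "TEST2", "host": "orcl2", "pwdstore": "orcl2/TEST2/",
     "os_user": "oracle", "become_os_user": True,
     "users": ["sys", "system", "ctxsys"]},
    {"name": "TEST3", "host": "orcl2", "pwdstore": "orcl2/TEST3/",
     "os_user": "oracle", "become_os_user": True,
     "users": ["dbsnmp"]},
]


def host_db_users(db_users, hostname):
    """Like selectattr('host', 'equalto', hostname): keep this host's DBs."""
    return [db for db in db_users if db["host"] == hostname]


def unique_os_users(dbs):
    """Like the json_query projection followed by the 'unique' filter."""
    seen = []
    for db in dbs:
        entry = {"os_user": db["os_user"],
                 "become_os_user": db["become_os_user"]}
        if entry not in seen:
            seen.append(entry)
    return seen


def subelements(dbs, key="users"):
    """Like with_subelements: one (database entry, user) pair per user."""
    return [(db, user) for db in dbs for user in db[key]]


def new_password(length=12):
    """First character is a letter, the rest are letters or digits."""
    first = secrets.choice(string.ascii_letters)
    rest = "".join(secrets.choice(string.ascii_letters + string.digits)
                   for _ in range(length - 1))
    return first + rest


if __name__ == "__main__":
    # Print the password store path and a fresh password for each user.
    for db, user in subelements(host_db_users(DB_USERS, "orcl2")):
        print(db["pwdstore"] + user, new_password())
```

Each (database, user) pair corresponds to one item of the “verify user existence” and “generate and change the user passwords” loops, and the concatenation of pwdstore and the username is the path under which the password ends up in the password store.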

It really works!

Now that you know how it’s built, it’s time to show how it works!
Please pay attention to the following:

  • The password store is empty at first
  • The whole password change playbook completes in 12 seconds
  • The tasks on both hosts are executed in parallel (see the order of execution feedback for each task)
  • The passwordstore contains the password entries after the playbook completes, and they can be retrieved by using the pass command
$ pass
Password Store

$ time ansible-playbook -i inventory/hosts chpwd.yml

PLAY [pasword change automation] *******************************************************

TASK [Gathering Facts] *****************************************************************
ok: [orcl1]
ok: [orcl2]

TASK [db_users : populate host_db_users] ***********************************************
ok: [orcl1]
ok: [orcl2]

TASK [db_users : create directory for target on db hosts] ******************************
changed: [orcl1] => (item={'become_os_user': True, 'os_user': u'oracle'})
changed: [orcl2] => (item={'become_os_user': True, 'os_user': u'oracle'})

TASK [db_users : copy sql scripts to db_hosts] *****************************************
changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql'])
changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/change_password.sql'])
changed: [orcl1] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql'])
changed: [orcl2] => (item=[{'become_os_user': True, 'os_user': u'oracle'}, u'files/exists_user.sql'])

TASK [db_users : verify user existance] ************************************************
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys'))
ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp'))
ok: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system'))
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system'))
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys'))
ok: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp'))

TASK [db_users : User existance results] ***********************************************

TASK [db_users : generate and change the user passwords] *******************************
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'sys'))
changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'dbsnmp'))
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'system'))
changed: [orcl1] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl1/TEST1/', 'env': u'. ~/.bash_profile && . ~/TEST1.env > /dev/null'}, u'system'))
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST2/', 'env': u'. ~/.bash_profile && . ~/TEST2.env > /dev/null'}, u'ctxsys'))
changed: [orcl2] => (item=({'become_os_user': True, 'os_user': u'oracle', 'pwdstore': u'orcl2/TEST3/', 'env': u'. ~/.bash_profile && . ~/TEST3.env > /dev/null'}, u'dbsnmp'))

TASK [db_users : Password change errors] ***********************************************

PLAY RECAP *****************************************************************************
orcl1                      : ok=6    changed=3    unreachable=0    failed=0
orcl2                      : ok=6    changed=3    unreachable=0    failed=0

real    0m12.418s
user    0m8.590s
sys     0m3.900s

$ pass
Password Store
|-- orcl1
|   |-- TEST1
|       |-- dbsnmp
|       |-- system
|-- orcl2
    |-- TEST2
    |   |-- ctxsys
    |   |-- sys
    |   |-- system
    |-- TEST3
        |-- dbsnmp

$ pass orcl1/TEST1/system
lookup_pass: First generated by ansible on 26/05/2017 14:28:50
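
A follow-up automation task could pick up a new password in much the same way. The sketch below is an assumption-laden illustration, not part of the playbook: the function name read_password and its "run" parameter are hypothetical, and it relies on the pass convention that the first line of an entry holds the password, with any metadata (such as the lookup_pass comment) on the following lines.

```python
import subprocess


def read_password(entry, run=subprocess.check_output):
    """Return the first line of `pass show <entry>` (the password itself)."""
    output = run(["pass", "show", entry])
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    lines = output.splitlines()
    if not lines:
        raise ValueError("password store entry %r is empty" % entry)
    return lines[0]
```

Calling read_password("orcl1/TEST1/system") requires the pass utility and an initialised, decryptable password store on the machine where it runs. The "run" parameter exists only so the function can be exercised without pass installed.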


For the past 2 months I’ve been learning Ansible and trying it out for various DBA tasks. It hasn’t always been a smooth ride, as I had to learn quite a lot: I hadn’t been exposed much to beasts like jinja2, json_query, YAML, Python (very handy for troubleshooting) and Ansible itself before. I feel that my former experience as a PL/SQL coder had created some expectations of Ansible that turned out not to be true. The biggest challenges for me were getting used to the linear execution of the playbook (with PL/SQL I can call packages, functions, etc. to process the data “outside” the main linear code line) and the lack of execution feedback: one has to learn to create Ansible tasks in a way that they either succeed or fail (no middle states like ‘this is a special case, process it differently’), and the amount of visual output is close to none. That does make sense to some degree. It’s “automation” after all, right? Nobody should be watching :)
A separate struggle for me was working with the complex data structure that I created for storing the host/database/user information. It’s a mix of YAML “dictionary” and “list”, and it turned out to be difficult to process it in the way I wanted, which is why I used json_query at times (although not in a very complex way in this case). There are probably simpler ways that I didn’t know of (or didn’t manage to find), and I’d be glad if you’d let me know of possible improvements or even other approaches to such tasks that you have worked on and implemented.
Despite all the complaining above, I think it’s really worth investing time in automating tasks like this. It really works, and once done it doesn’t require much attention. Happy Automating!

2017-05-26 Fri

04:04 Orphaned Disks in OVM and What to Do with Them (11706 Bytes) » Official Pythian Blog

Some time ago I was doing maintenance on an OVM environment and noticed that it had a significant number of disks without a mapping to any virtual machine (I should mention that the OVM cluster was home to more than 400 VMs). With about 1800 virtual disks, it was easy to miss some lost disks without any mapping to VMs. Some of them had been created on purpose and were possibly forgotten, but most looked like leftovers from an automatic deployment. I attached several of the disks to a test VM and checked the contents:

[root@vm129-132 ~]# fdisk -l /dev/xvdd

Disk /dev/xvdd: 3117 MB, 3117416448 bytes, 6088704 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@vm129-132 ~]# dd if=/dev/xvdd bs=512 count=100 | strings 
100+0 records in
100+0 records out
51200 bytes (51 kB) copied, 0.00220462 s, 23.2 MB/s
[root@vm129-132 ~]#
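
The same kind of check can be scripted when many disks need to be inspected. The following is a minimal Python sketch, not the method used above: the function name is_blank and its parameters are hypothetical, and the test is stricter than dd piped to strings, because it requires every byte in the sampled range to be zero rather than merely containing no printable text.

```python
def is_blank(path, nbytes=100 * 512, chunk=4096):
    """Return True if the first nbytes of path contain only zero bytes."""
    remaining = nbytes
    with open(path, "rb") as dev:
        while remaining > 0:
            data = dev.read(min(chunk, remaining))
            if not data:
                break  # reached the end of the file or device
            if any(data):
                return False  # found a non-zero byte
            remaining -= len(data)
    return True
```

On the test VM this would be called as is_blank("/dev/xvdd"), which needs read permission on the block device. The default of 100 sectors of 512 bytes matches the dd command above.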

I also checked other attributes of the disks from the OVM CLI:

OVM> show VirtualDisk id=0004fb0000120000d5e0235900f63355.img
Command: show VirtualDisk id=0004fb0000120000d5e0235900f63355.img
Status: Success
Time: 2017-05-19 09:11:26,664 PDT
  Absolute Path = nfsserv:/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000d5e0235900f63355.img
  Mounted Path = /OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000d5e0235900f63355.img
  Max (GiB) = 2.9
  Used (GiB) = 0.0
  Shareable = Yes
  Repository Id = 0004fb0000030000998d2e73e5ec136a  [crepo1]
  Id = 0004fb0000120000d5e0235900f63355.img  [6F4dKi9hT0cYW_db_asm_disk_0 (21)]
  Name = 6F4dKi9hT0cYW_db_asm_disk_0 (21)
  Locked = false
  DeprecatedAttrs = [Assembly Virtual Disk]

The disk was completely empty and, according to the name and one of the deprecated attributes, it was clear that the disk was a leftover from a deployed assembly. I remembered an issue from the past when shared disks were not deleted if you were using one of the assemblies for Oracle RAC, deployed and deleted through the Oracle Enterprise Manager Self Service Portal (OEM SS). It was noticed on OVM 3.2.x with OEM 12c. In that case, if you had two or more VMs working with the same shared disks, those shared disks were not deleted when all the VMs and local disks had been destroyed. The issue has been gone for a long time, but the lost disks were left behind.

I created a script to find all the disks without a mapping to any existing VM. The script is written in the expect language and uses the ssh CLI for OVM. To run the script you need an ssh connection to the OVM Manager on port 10000 and expect working on your machine. I used one of the Oracle sample scripts to build my own.
Here is the script body:


set username [lindex $argv 0];
set password [lindex $argv 1];
set prompt "OVM> "

set timeout 3
log_user 0

spawn ssh -l $username -p 10000
expect_after eof {exit 0}

##interact with SSH
expect "yes/no" {send "yes\r"}
expect "password:" {send "$password\r"}

#################### Execute Command passed in ##################
expect "OVM> "
set timeout 20

match_max 100000

log_user 0
send "list virtualdisk\r"
expect "OVM> "
set resultdata $expect_out(buffer)
set resultlength [string length $resultdata]
set idindex 0
set id ""
set done 0
while {$done != 1} {
     set idindex [string first "id:" $resultdata]
     set nameindex [string first "name:" $resultdata]
        if {$idindex != -1 && $nameindex != -1 && $idindex < $nameindex} {
            set id [string range $resultdata [expr {$idindex+3}] [expr {$nameindex-3}]]
            send "show VirtualDisk id='$id'\r"
            expect "OVM> "
            set getVirtualDiskInfo $expect_out(buffer)
            set getVirtualDiskInfoLength [string length $getVirtualDiskInfo]
            set getVirtualDiskInfoIndex 0
            set getVirtualDiskInfoMapping ""
            set doneProcessingVirtualDisk 0
            while {$doneProcessingVirtualDisk != 1} {
                set getVirtualDiskInfoIndex [string first "VmDiskMapping" $getVirtualDiskInfo]
                     if {$getVirtualDiskInfoIndex != -1} {
                           puts "Disk with mapping: '$id  \r"
                           set doneProcessingVirtualDisk 1
                        } else {
                           puts "Disk without mapping:'$id  \r"
                           set doneProcessingVirtualDisk 1
                        }
            }
            set resultdata [string range $resultdata [expr {$nameindex+1}] $resultlength]
            set resultlength [string length $resultdata]
        } else {
                set done 1
        }
}

log_user 1

expect "OVM> "
send "exit\r"

As you can see, the script is simple enough and doesn’t take a lot of time to write. I redirected the output of the script to a file in order to analyze it.

[oracle@vm129-132 ~]$ ./dsk_inventory admin password >dsk_inventory.out
[oracle@vm129-132 ~]$ wc -l dsk_inventory.out 
1836 dsk_inventory.out
[oracle@vm129-132 ~]$ grep "Disk without mapping" dsk_inventory.out | wc -l
482
[oracle@vm129-132 ~]$ 
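
The same counts, plus the share of orphaned disks, can be derived from the output file with a few lines of Python. This is only a sketch: the function name summarize is hypothetical, and it assumes every line of the file starts with one of the two messages printed by the expect script.

```python
def summarize(lines):
    """Count mapped and orphaned disks in the inventory script output."""
    mapped = sum(1 for line in lines
                 if line.startswith("Disk with mapping"))
    orphaned = sum(1 for line in lines
                   if line.startswith("Disk without mapping"))
    total = mapped + orphaned
    # Percentage of disks that have no mapping to any VM.
    share = 100.0 * orphaned / total if total else 0.0
    return mapped, orphaned, share
```

It can be fed an open file directly, for example summarize(open("dsk_inventory.out")), since it only iterates over the lines once per call when given a list; with a file object, read the lines into a list first.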

As you can see, I had 482 orphaned disks out of 1836. That was more than 25% of all disks, and it was not only wasting space but also had a significant impact on interface performance. Every time you tried to add, modify or delete a disk through OEM SS, there was a long pause while information about the disks was retrieved. I decided to delete all those disks using the same script, with just a couple of lines added to delete a disk if it doesn’t have a mapping.
Here is the modified section of the script:

            while {$doneProcessingVirtualDisk != 1} {
                set getVirtualDiskInfoIndex [string first "VmDiskMapping" $getVirtualDiskInfo]
                     if {$getVirtualDiskInfoIndex != -1} {
                           puts "Disk with mapping:'$id'\r"
                           set doneProcessingVirtualDisk 1
                        } else {
                           puts "Disk without mapping:'$id'\r"
                           send "delete VirtualDisk id='$id'\r"
                           expect "OVM> "
                           set doneProcessingVirtualDisk 1
                        }
            }

The changes are minimal: the script now sends a “delete” command to OVM if a disk doesn’t have any mapping. Of course, if you want to exclude certain disks, you should add more “if” conditions on the disk ids to prevent them from being deleted.
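
The exclusion idea can be sketched outside expect as well. The Python below is a hypothetical helper, not part of the script above: delete_commands and its parameters are made-up names, and the command text follows the “delete VirtualDisk id=...” form shown in the CLI examples in this post.

```python
def delete_commands(orphan_ids, protected_ids=()):
    """Build OVM CLI delete commands, skipping explicitly protected disks."""
    protected = set(protected_ids)
    return ["delete VirtualDisk id=%s" % disk_id
            for disk_id in orphan_ids
            if disk_id not in protected]
```

Generating the command list first, and reviewing it before anything is sent to the OVM Manager, gives one more chance to spot a disk that was created on purpose and should be kept.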

And it is safe since you are using an approved standard interface and it will not allow you to delete a disk if it has an active mapping to any VM. If you try to delete a disk with an active mapping you are going to get an error:

OVM> delete VirtualDisk id=0004fb0000120000493379bb12928c33.img
Command: delete VirtualDisk id=0004fb0000120000493379bb12928c33.img
Status: Failure
Time: 2017-05-19 09:28:13,046 PDT
JobId: 1495211292856
Error Msg: Job failed on Core: OVMRU_002018E crepo1 - Cannot delete virtual device 6F4dKi9hT0cYW_crs_asm_disk_1 (23), it is still in use by [DLTEST0:vm129-132 ]. [Fri May 19 09:28:12 PDT 2017]

I ran my script, deleted all the non-mapped disks and repeated the inventory script to verify results. I found a couple of disks which were not deleted.

[oracle@vm129-132 ~]$ ./del_orph_dsk admin Y0u3uck2 > del_dsk_log.out
[oracle@vm129-132 ~]$ ./dsk_inventory admin Y0u3uck2 >dsk_inventory_after.out
[oracle@vm129-132 ~]$ wc -l dsk_inventory_after.out
1356 dsk_inventory_after.out
[oracle@vm129-132 ~]$ grep "Disk without mapping" dsk_inventory_after.out | wc -l
2
[oracle@vm129-132 ~]$ grep "Disk without mapping" dsk_inventory_after.out 
Disk without mapping:0004fb0000120000a2d31cc7ef0c2d86.img 
Disk without mapping:0004fb0000120000da746f417f5a0481.img 
[oracle@vm129-132 ~]$ 

It appeared that the disks didn’t have any existing files on the repository filesystem. It looked like the files were lost some time ago due to a bug or maybe some past issues on the file system.

OVM> show VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Command: show VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Status: Success
Time: 2017-05-19 12:35:13,383 PDT
  Absolute Path = nfsserv:/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img
  Mounted Path = /OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img
  Max (GiB) = 40.0
  Used (GiB) = 22.19
  Shareable = No
  Repository Id = 0004fb0000030000998d2e73e5ec136a  [crepo1]
  Id = 0004fb0000120000a2d31cc7ef0c2d86.img  [ovmcloudomsoh (3)]
  Name = ovmcloudomsoh (3)
  Locked = false
  DeprecatedAttrs = [Assembly Virtual Disk]
OVM> delete VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Command: delete VirtualDisk id=0004fb0000120000a2d31cc7ef0c2d86.img
Status: Failure
Time: 2017-05-19 12:36:39,479 PDT
JobId: 1495222598733
Error Msg: Job failed on Core: OVMAPI_6000E Internal Error: OVMAPI_5001E Job: 1495222598733/Delete Virtual Disk: ovmcloudomsoh (3) from Repository: crepo1/Delete Virtual Disk: ovmcloudomsoh (3) from Repository: crepo1, failed. Job Failure Event: 1495222599299/Server Async Command Failed/OVMEVT_00C014D_001 Async command failed on server: vms01.dlab.pythian.com. Object: ovmcloudomsoh (3), PID: 27092, Server error: [Errno 2] No such file or directory: '/OVS/Repositories/0004fb0000030000998d2e73e5ec136a/VirtualDisks/0004fb0000120000a2d31cc7ef0c2d86.img', on server: vms01.dlab.pythian.com, associated with object: 0004fb0000120000a2d31cc7ef0c2d86.img [Fri May 19 12:36:39 PDT 2017]

So, we had information about the disks in the repository database but didn’t have the disks themselves. To make the repository consistent, I created empty files with the same names as the nonexistent virtual disks and deleted them using the OVM CLI.
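
Such inconsistencies could be detected up front by comparing the “Mounted Path” reported by “show VirtualDisk” with the file system. The Python sketch below is an assumption-based illustration: the function names are hypothetical, the parsing relies on the “Mounted Path = ...” line format shown in the CLI output above, and the existence check only makes sense on a server where the repository is actually mounted.

```python
import os
import re

# Matches the "Mounted Path = ..." line of "show VirtualDisk" output.
MOUNTED_PATH = re.compile(r"^\s*Mounted Path = (.+?)\s*$", re.MULTILINE)


def mounted_path(show_output):
    """Extract the mounted path from the CLI output, or None if absent."""
    match = MOUNTED_PATH.search(show_output)
    return match.group(1) if match else None


def backing_file_missing(show_output, exists=os.path.exists):
    """Return True if the disk is known to OVM but its file is not there."""
    path = mounted_path(show_output)
    return path is not None and not exists(path)
```

The "exists" parameter defaults to os.path.exists and is only there so the check can be tried out without access to a real repository.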

root@nfsserv:~# ll /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img
/data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img: No such file or directory
root@nfsserv:~# touch /data1/vms/ovs/crepo1/VirtualDisks/0004fb0000120000da746f417f5a0481.img

OVM> delete VirtualDisk id=0004fb0000120000da746f417f5a0481.img
Command: delete VirtualDisk id=0004fb0000120000da746f417f5a0481.img
Status: Success
Time: 2017-05-23 07:41:43,195 PDT
JobId: 1495550499971

I think it’s worth checking from time to time whether you have any disks without a mapping to any VM, especially if your environment has a considerable number of disks and a long history of upgrades, updates and high user activity. And now a couple of words about the OVM CLI and using the “expect” language for scripting: as you can see, the combination provides good options for automating your daily routine maintenance on OVM. It would take ages to find and clear all those disks manually using the GUI.