If the cloud under test is bare-metal or its instances take a long time to provision, the tester can adjust update_attempts in osgcloud_rules.yaml:
vm_defaults:
  update_attempts: 60
  update_frequency: 5
The values of update_attempts and update_frequency determine when the benchmark considers provisioning to have failed. A tester can set them to small values to force provisioning failures during testing.
During a compliant run, the value of update_attempts is automatically calculated from the larger of the average AI provisioning times of the YCSB and KMeans workloads.
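As a sanity check, the effective provisioning timeout implied by these two values can be computed directly. This is a minimal sketch of the relationship described above (attempts spaced a fixed number of seconds apart), not code from the benchmark itself:

```python
# Sketch: the longest time CBTOOL waits for an instance is roughly
# update_attempts polls spaced update_frequency seconds apart.
def provisioning_timeout(update_attempts, update_frequency):
    return update_attempts * update_frequency  # seconds

# With the example values from osgcloud_rules.yaml above:
print(provisioning_timeout(60, 5))  # 300 seconds
```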
The tester can configure which workloads to run during testing of the elasticity + scalability phase by modifying the following parameter:
workloads: [ ycsb, kmeans ]
During testing of the elasticity + scalability phase, the tester can cap the number of AIs to be provisioned, fix the number of AIs from which results are reported, and ignore the stopping conditions. These parameters are controlled as follows:
maximum_ais: 24
reported_ais: 14
ignore_qos_when_max_is_set: false
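The interplay of these knobs can be illustrated with a small sketch. This is an illustration of the behavior described above under assumed semantics (a hard cap on issued AIs, and QoS-based stopping that can be suppressed during testing), not the benchmark's actual code:

```python
# Illustrative sketch (assumed semantics, not benchmark code): decide
# whether the elasticity + scalability phase should stop issuing AIs.
def should_stop(ais_issued, qos_violated, maximum_ais, ignore_qos_when_max_is_set):
    if ais_issued >= maximum_ais:            # hard cap always stops the run
        return True
    if qos_violated and not ignore_qos_when_max_is_set:
        return True                          # QoS stopping condition applies
    return False                             # keep provisioning AIs

print(should_stop(24, False, 24, False))  # True: cap reached
print(should_stop(10, True, 24, True))    # False: QoS ignored during testing
```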
The elasticity + scalability phase is run as follows. It assumes that CBTOOL is already running and connected to your cloud:
python osgcloud_elasticity.py --exp_id SPECRUN
If you run the elasticity + scalability phase as:
python osgcloud_elasticity.py --exp_id SPECRUNID
the command assumes that a baseline_SPECRUNID.yaml file exists in ~/results/SPECRUNID/perf .
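The expected location can be expressed as a small path helper. This is a sketch; `results_dir` and `exp_id` are placeholders for your own values, and the layout is the one stated above:

```python
import os.path

def baseline_yaml_path(results_dir, exp_id):
    # baseline_<EXPID>.yaml is expected under <results_dir>/<EXPID>/perf
    return os.path.join(results_dir, exp_id, "perf", f"baseline_{exp_id}.yaml")

print(baseline_yaml_path(os.path.expanduser("~/results"), "SPECRUNID"))
```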
The results and logs of the elasticity + scalability phase are placed in the results directory. The following files and directories are generated by a successful run:
elasticity_SPECRUNID.yaml
osgcloud_elasticity_SPECRUNID-20150811234204UTC.log
SPECRUNIDELASTICITY20150811234204UTC
The timestamp in the file and directory names changes based on the date/time at which the elasticity + scalability phase was run.
The tester must set appropriate values in the osgcloud_environment.yaml file. The key/value pairs from this file are copied into the submission file.
The program osgcloud_fdr.py operates on the performance data stored in ~/results/SPECRUN/perf , and the environment file.
The tester should fill in the details of the cloud environment (e.g., physical machines for a whitebox cloud) in osgcloud_environment.yaml . The file must be present in ~/results/SPECRUN/perf .
The submission file generation program is then run as follows (assuming your SPECRUN id was SPECRUNID):
python osgcloud_fdr.py --baseline_yaml ~/results/SPECRUNID/perf/baseline_SPECRUNID.yaml \
    --elasticity_results_path ~/results/SPECRUNID/perf/SPECRUNIDELASTICITY*** \
    --exp_id SPECRUNID \
    --environment_yaml osgcloud_environment.yaml
where *** indicates the timestamp that forms part of the elasticity results directory name.
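Since the directory name embeds a timestamp, it can also be resolved with a glob rather than typed by hand. A minimal sketch, using a temporary directory as a stand-in for ~/results/SPECRUNID/perf :

```python
import glob
import os
import tempfile

# Stand-in for ~/results/SPECRUNID/perf with one timestamped elasticity dir.
perf_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(perf_dir, "SPECRUNIDELASTICITY20150811234204UTC"))

# Resolve the timestamped directory name with a wildcard.
matches = glob.glob(os.path.join(perf_dir, "SPECRUNIDELASTICITY*"))
print(os.path.basename(matches[0]))  # SPECRUNIDELASTICITY20150811234204UTC
```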
The output of this phase is as follows:
osgcloud_fdr_SPECRUNID-20150811234502UTC.log
fdr_ai_SPECRUNID.yaml
sub_file_SPECRUNID.txt
Assuming the results directory was ``~/results``, the submission file is present in the following directory:
If the elasticity experiment was run successfully, the HTML report can be generated as:
cd ~/osgcloud/driver
python osgcloud_fdr_html.py --exp_id EXPID
For whitebox clouds, the tester must include an architecture diagram of the cloud in PNG format. It can be included in the final HTML report as follows:
python osgcloud_fdr_html.py --exp_id EXPID --networkfile arch.png
The resulting file is generated in:
The name of the file is:
Before starting the elasticity + scalability phase, it is best to restart CBTOOL if another elasticity + scalability phase was previously run. There is no need to restart CBTOOL if only baseline phases were run.
When testing the elasticity + scalability phase, it is best to start with a small number of AIs, e.g., 6 or 8.
It is not necessary to run the baseline phase every time before the elasticity + scalability phase. However, the elasticity + scalability phase needs the output of the baseline phase (as a YAML file) to determine the stopping condition.
Assuming the experiment ID was EXPID, the results directory was ~/results , and the baseline was already run, the tester can do the following to repeat the elasticity + scalability phase:
cd ~/results/EXPID/perf/
rm -rf *ELASTICITY*
rm -f *elasticity*.log
The instance provisioning time of a cloud under test may be long. The time for which CBTOOL waits while checking for instance provisioning can be adjusted using the update_attempts parameter in the osgcloud_rules.yaml file. It should be set such that ( update_attempts x update_frequency ) never exceeds three times the maximum baseline AI provisioning time of the YCSB or KMeans workloads. For a compliant run, it must be set before the start of the experiment.
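As an arithmetic aid, the constraint can be checked programmatically. A sketch of the rule stated above; the provisioning times are made-up example values:

```python
# Sketch: check that update_attempts * update_frequency does not exceed
# three times the larger of the two baseline AI provisioning times.
def rule_satisfied(update_attempts, update_frequency, ycsb_time, kmeans_time):
    return update_attempts * update_frequency <= 3 * max(ycsb_time, kmeans_time)

# Example values (illustrative, in seconds):
print(rule_satisfied(60, 5, 120, 90))   # True:  300 <= 360
print(rule_satisfied(120, 5, 120, 90))  # False: 600 > 360
```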
If any errors are encountered during an elasticity + scalability phase run, they can be checked in CBTOOL log:
cd /var/log/cloudbench
grep ERROR *
The tester can check for errors for specific AI by searching for an AI number.
Errors can also occur during an AI_run for a workload. Examples include:
- Cassandra: Create, Remove, or List keyspace fails, or seeds fail to form a cluster.
- YCSB: Data generation fails.
- Hadoop: Hadoop slaves fail to form a cluster.
- KMeans: Data generation fails.
Instance names also include AI numbers. This can come in handy when debugging the elasticity + scalability phase in your cloud.
The tester can check for failed AIs by running the following command at the CBTOOL prompt:
MYSIMCLOUD> ailist failed
To test that instance supporting evidence collection works properly, please follow these steps:
Start CBTOOL and ensure that it can connect to your cloud and launch instances.
Ensure that workload images have been created.
Launch an application instance of Cassandra or Hadoop:
aiattach cassandra_ycsb
aiattach hadoop
Determine the Linux username and SSH key path for instances. In the examples below, the Linux username is cbuser and the SSH key path is ~/osgcloud/cbtool/credentials/cbtool_rsa .
Test supporting evidence collection for an instance by running the supporting evidence script on the CBTOOL machine.
Create a directory where the results are stored:
mkdir /tmp/instance -p
Run the supporting evidence collection script:
cd ~/osgcloud/driver/support_script/
./collect_support_data.sh remote_vm_sysinfo 10.146.5.41 cbuser ~/osgcloud/cbtool/credentials/cbtool_rsa /tmp/instance/

The arguments are: SCRIPT (the collection script for instance/Cassandra/Hadoop data), IPADDR (IP of the instance), LINUXUSER, SSHKEYPATH, and TARGETDIR.

OUTPUT:

tree /tmp/instance
|-- date.txt
|-- df.txt
|-- dpkg.txt
|-- etc
|   |-- fstab
|   |-- hosts
|   |-- iproute2
|   |   |-- ematch_map
|   |   |-- group
|   |   |-- rt_dsfield
|   |   |-- rt_protos
|   |   |-- rt_realms
|   |   |-- rt_scopes
|   |   `-- rt_tables
|   |-- nsswitch.conf
|   |-- security
|   |   `-- limits.conf
|   `-- sysctl.conf
|-- hostname.txt
|-- ifconfig.txt
|-- lspci.txt
|-- mount.txt
|-- netstat.txt
|-- ntp.conf
|-- proc
|   |-- cmdline
|   |-- cpuinfo
|   |-- devices
|   |-- meminfo
|   |-- modules
|   |-- partitions
|   |-- swaps
|   `-- version
|-- route.txt
`-- var
    `-- log
        `-- dmesg
Test supporting evidence collection for YCSB and Cassandra.
Find the IP address of the instance with the YCSB role from CBTOOL (by typing vmlist at the CBTOOL CLI). Then run the following commands:
mkdir /tmp/ycsb -p
./collect_support_data.sh remote_vm_software 10.146.5.41 cbuser ~/osgcloud/cbtool/credentials/cbtool_rsa /tmp/ycsb cassandra_ycsb

OUTPUT from machine with YCSB role:

$ tree /tmp/ycsb
/tmp/ycsb/
|-- javaVersion.out
|-- role
`-- YCSB
    |-- custom_workload.dat
    `-- workloads
        |-- workloada
        |-- workloadb
        |-- workloadc
        |-- workloadd
        |-- workloade
        |-- workloadf
        `-- workload_template
Find the IP address of an instance with the SEED role from CBTOOL (by typing vmlist at the CBTOOL CLI). Then run the following commands:
mkdir /tmp/seed -p
./collect_support_data.sh remote_vm_software 10.146.5.41 cbuser ~/osgcloud/cbtool/credentials/cbtool_rsa /tmp/seed cassandra_ycsb

OUTPUT:

$ tree /tmp/seed
/tmp/seed/
|-- cassandra
|   |-- du_datadir
|   |-- du_datadir_cassandra
|   |-- du_datadir_cassandra_usertable
|   |-- nodetool_cfstats
|   `-- nodetool_status
|-- cassandra_conf
|   |-- cassandra-env.sh
|   |-- cassandra-rackdc.properties
|   |-- cassandra-topology.properties
|   |-- cassandra-topology.yaml
|   |-- cassandra.yaml
|   |-- commitlog_archiving.properties
|   |-- logback-tools.xml
|   |-- logback.xml
|   `-- triggers
|       `-- README.txt
|-- javaVersion.out
`-- role
Test supporting evidence collection for Hadoop.
Find the IP address of the instance with the HADOOPMASTER role from CBTOOL (by typing vmlist at the CBTOOL CLI). Then run the following commands:
mkdir /tmp/hadoop -p
./collect_support_data.sh remote_vm_software 10.146.5.41 cbuser ~/osgcloud/cbtool/credentials/cbtool_rsa /tmp/hadoop hadoop

OUTPUT from machine with HADOOPMASTER role:

tree /tmp/hadoop/
/tmp/hadoop/
|-- hadoop
|   |-- datahdfs
|   |-- dfsadmin_report
|   |-- du_datanodedir
|   |-- du_namenodedir
|   |-- input_hdfs_size
|   |-- output_hdfs_size
|   `-- version
|-- hadoop_conf
|   |-- capacity-scheduler.xml
|   |-- configuration.xsl
|   |-- container-executor.cfg
|   |-- core-site.xml
|   |-- hadoop-env.cmd
|   |-- hadoop-env.sh
|   |-- hadoop-metrics2.properties
|   |-- hadoop-metrics.properties
|   |-- hadoop-policy.xml
|   |-- hdfs-site.xml
|   |-- httpfs-env.sh
|   |-- httpfs-log4j.properties
|   |-- httpfs-signature.secret
|   |-- httpfs-site.xml
|   |-- kms-acls.xml
|   |-- kms-env.sh
|   |-- kms-log4j.properties
|   |-- kms-site.xml
|   |-- log4j.properties
|   |-- mapred-env.cmd
|   |-- mapred-env.sh
|   |-- mapred-queues.xml.template
|   |-- mapred-site.xml
|   |-- mapred-site.xml.template
|   |-- masters
|   |-- slaves
|   |-- ssl-client.xml.example
|   |-- ssl-server.xml.example
|   |-- yarn-env.cmd
|   |-- yarn-env.sh
|   `-- yarn-site.xml
|-- javaVersion.out
`-- role