By default, the attribute MANAGER_IP in CBTOOL's cloud configuration file (always named ~/osgcloud/cbtool/configs/LINUXUSERNAME_cloud_definitions.txt) is set to $IP_AUTO. When it finds this special value, CBTOOL attempts to auto-discover the IP address of the Orchestrator. However, if your CBTOOL Orchestrator node has more than one network interface card (or even multiple IPs on a single interface), CBTOOL is unable to decide automatically which one should be used.
In that case, set an actual IP address (selected by the tester from among the addresses found on the Orchestrator node) in the cloud configuration file
vi ~/osgcloud/cbtool/configs/ubuntu_cloud_definitions.txt
MANAGER_IP = IPADDRESS_OF_INTERFACE_FOR_ORCHESTRATOR_NODE
The above snippet assumes that the Linux user under which CBTOOL is running is ubuntu. If the Linux user has a different name, the configuration file will have a different starting name (s/ubuntu/LINUXUSERNAME).
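To see which addresses are candidates on the Orchestrator node, the standard iproute2 tooling can be used; for example
ip -4 -o addr show
prints one line per configured IPv4 address, from which the tester can pick the value for MANAGER_IP.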
If the cloud under test is an OpenStack cloud, the tester must set the host name of the cloud controller (nova API) in the /etc/hosts file of the benchmark harness machine.
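For illustration only (the host name and IP address below are hypothetical, not taken from the kit), such an /etc/hosts entry would look like
10.20.0.10   cloudcontroller.example.com   cloudcontroller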
Restart CBTOOL in debug mode with ~/osgcloud/cbtool/cb --hard_reset -v 10
Instruct CBTOOL to deploy single VMs (or Containers) without attempting to establish contact with them, by running the following command on the CLI
vmdev
After that, just run
vmattach cassandra
or
vmattach hadoopmaster
CBTOOL will output each command that would have been executed against the VMs/Containers. At this point, you can just execute the commands yourself and check for error messages.
NOTE: When running the script cb_post_boot.sh directly on the VM/Container, it should complete successfully and promptly (e.g., within 90 seconds). If the script hangs, the VM/Container is probably having trouble reaching either the Orchestrator node (specifically on TCP ports 6379 and 27017, and UDP port 5114) or the NTP server.
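A quick way to check connectivity from inside the VM/Container is to probe the Orchestrator's TCP ports and query the NTP server with standard tools (ORCHESTRATOR_IP and NTP_SERVER are placeholders, nc and ntpdate may have to be installed on the image, and the UDP port cannot be reliably probed this way)
nc -vz ORCHESTRATOR_IP 6379
nc -vz ORCHESTRATOR_IP 27017
ntpdate -q NTP_SERVER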
Once done, do not forget to disable the debug mode by issuing the following command on the CLI
vmundev
NOTE: There is a small utility, called “cbssh”, that can be used to connect directly to the VMs. To try it, just run the following on a bash prompt
cd ~/cbtool; ~/cbtool/cbssh vm_1
and you should be able to log in to the node.
Restart CBTOOL in debug mode with ~/osgcloud/cbtool/cb --hard_reset -v 10
Instruct CBTOOL to deploy a single Application Instance (AI) without running the actual configuration/startup scripts on its multiple VMs/Containers, by running the following command on the CLI
appdev
After that, just run
aiattach cassandra_ycsb
or
aiattach hadoop
CBTOOL will output each command that would have been executed against each of the VMs/Containers that compose the VApp. At this point, you can just execute the commands yourself and check what could be wrong.
IMPORTANT: Please note that some commands should be executed in parallel (or at least in a non-sequential manner) and therefore might require multiple prompts for overlapping execution.
Once done, do not forget to disable the debug mode by issuing the command
appundev
Restart CBTOOL in debug mode with ~/osgcloud/cbtool/cb --hard_reset -v 10
Instruct CBTOOL to fully deploy the Virtual Application Instance, but not start the actual load generation, by running the following command on the CLI
appnoload
After that, just run
aiattach cassandra_ycsb
or
aiattach hadoop
At the very end, CBTOOL will output a message such as “Load Manager will NOT be automatically started on VM NAME during the deployment of VAPP NAME...”.
Obtain a list of VMs with the command
vmlist
Log in to the instance with the role “ycsb” (in the case of cassandra_ycsb) or “hadoopmaster” (in the case of hadoop) using CBTOOL’s helper utility, e.g.
cd ~/cbtool; ~/osgcloud/cbtool/cbssh vm_4
Try to run the “load generation” script directly on the instance
~/cbtool/cb_ycsb.sh workloadd 1 1 1
or
~/cbtool/cb_hadoop_job.sh kmeans 1 1 1
If the “load generation” script completes successfully, then try to run the Load Manager daemon in debug mode
/usr/local/bin/cbloadman
Watch for errors in the Load Manager process, displayed directly on the terminal.
Once done, do not forget to disable the debug mode by issuing the command
appload
There are a couple of reasons why the baseline phase can hang:
CBTOOL logs are stored in /var/log/cloudbench. Search for errors as follows:
cd /var/log/cloudbench/
tail -f /var/log/cloudbench/LINUXUSERNAME_remotescripts.log
Search for the AI or the instance name for which there are errors.
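For example, assuming the problematic Application Instance happens to be named ai_3 (a hypothetical name; use the vmlist output on the CBTOOL CLI to find the real names), the relevant errors can be filtered with
grep -i error /var/log/cloudbench/LINUXUSERNAME_remotescripts.log | grep ai_3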
Please make sure that the following parameters are set correctly:
instance_user: cbuser
instance_keypath: HOMEDIR/osgcloud/cbtool/credentials/id_rsa
Please also make sure that the permissions of the private key are set correctly, that is:
chmod 400 id_rsa
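Putting the two together (the path below is the default location shown above; adjust it if your installation differs), the permissions can be set and then verified with
chmod 400 ~/osgcloud/cbtool/credentials/id_rsa
ls -l ~/osgcloud/cbtool/credentials/id_rsa
The ls output should report the mode -r-------- for the key file.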
CBTOOL has the ability to execute generic scripts at specific points during the VM/Container attachment (e.g., before the provision request is issued to the cloud, or after the VM is reported as “started” by the cloud). A small example script, which creates a new keystone tenant, a new pubkey pair, a new security group, and a neutron network, subnet and router, is available under the “scenarios/util” directory.
To use it with Application Instances (i.e., ALL VMs/Containers belonging to an AI on its own tenant/network/subnet/router), issue the following command on the CLI
cldalter ai_defaults execute_script_name=/home/cbuser/cloudbench/scenarios/scripts/openstack_multitenant.sh
After this, each new AI that is attached will execute the aforementioned script. You can test it out with the following command
aiattach nullworkload default default none none execute_provision_originated
Please note that the commands listed in the script will be executed from the Orchestrator node, and thus require all relevant OpenStack CLI clients (e.g., openstack, nova and neutron) to be present there.
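A quick sanity check, using nothing SPEC Cloud-specific, is to confirm on the Orchestrator node that the clients are installed and on the PATH
command -v openstack nova neutron
openstack --version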
This benchmark is designed to stress your cloud’s control plane as much as the data plane. During a compliant run, applications are submitted to your cloud at random intervals of between 5 and 10 minutes. If your cloud’s empirical AI provisioning time (including all the VMs participating in that AI/application) is less than or equal to the average inter-arrival time drawn from that interval, then the benchmark will likely always be able to collect data in a timely manner from the vast majority of the AIs sent to your cloud. Basically, VMs won’t get backed up and will be serviced as fast as the benchmark is submitting them to your cloud. This is (currently) hard to achieve: every VM would likely need to be provisioned in well under a minute, and in parallel, given that the two types of AIs that SPEC Cloud uses contain, on average, as many as 7 virtual machines each.
On the other hand, if your cloud’s AI provisioning time (covering all participating VMs) is slower than that interval, you will always have an ever-increasing backlog of virtual machines waiting to complete their deployment against your cloud. In this situation, the benchmark will still produce a compliant run, but it will terminate early without collecting application performance results from all of the AIs. This behavior is normal, expected and compliant. So don’t be surprised, in the latter scenario, if the computed score doesn’t match the AI limit you set out to achieve. As long as the score is stable and doesn’t fluctuate much (within 10%), the benchmark is doing what it is supposed to do.
In either of the aforementioned scenarios, the variables maximum_ais and reported_ais are designed to create consistency in the results computed by SPEC Cloud. Thus, when preparing a SPEC Cloud submission report, you have a choice between:
Scenario a) Your cloud’s AI provisioning time is faster than the arrival rate (less than or equal to the 5/10 minute interval) ===> The benchmark will likely terminate by reaching reported_ais first.
Scenario b) Your cloud’s AI provisioning time is slower than the arrival rate (greater than the 5/10 minute interval) ===> The benchmark will terminate by reaching maximum_ais first.
In scenario a, you will likely want to set reported_ais to a lower number than maximum_ais, so that SPEC Cloud only calculates a final score based on a consistent number of AIs created by the benchmark. SPEC Cloud does not (and does not pretend to) calculate this number for you; only the submitter can really choose this number from empirical experience.
In scenario b, you will likely want to set reported_ais equal to maximum_ais, so that as many AIs as possible count towards the final score calculated by SPEC Cloud. Public clouds that limit their API request rates, or that throttle the number of VMs that can be provisioned simultaneously due to DDoS mitigation or load-balancing techniques, will likely fall into this category.
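As an illustration only (the section name below is an assumption; the authoritative key names and their location are in the osgcloud_rules.yaml shipped with the kit), the two knobs would be set along these lines:
elasticity:
  maximum_ais: 200
  reported_ais: 150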
Some clouds, in particular public clouds, enforce artificial limits to protect against abuse or denial of service. In such situations, users need the ability to control how fast the benchmark hits the cloud’s API.
====> daemon_parallelism: This controls the maximum number of AIs that are actively being provisioned at the same time (not the number that have been issued). For a compliant run, the benchmark deploys AIs at random intervals of between 5 and 10 minutes, but if the cloud has customer limits and cannot absorb this load, setting this value can help.
====> attach_parallelism: This controls the maximum number of VMs within a single AI that are actively being provisioned at the same time (not the number of VMs the AI is ‘expecting’ to provision). This value can be changed if the cloud needs further protection because of abuse or denial-of-service limits imposed by the cloud, which otherwise cannot be changed for any user.
Lowering the limits set by these parameters may cause CBTOOL to hold off issuing requests for new application instances once the number of in-flight application instances equals the value set in daemon_parallelism. If this happens, the inter-arrival time between application instances may exceed the limit prescribed by the benchmark, that is, 600 seconds. Such a result is considered non-compliant.
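Where these two knobs live depends on the kit and CBTOOL version; as a purely hypothetical sketch (the attribute group below is an assumption, not confirmed by this document), they could be adjusted from the CBTOOL CLI in the same way as other defaults shown earlier, for example
cldalter ai_defaults attach_parallelism=2
cldalter ai_defaults daemon_parallelism=4
Verify the actual attribute names and their group against your kit’s documentation or rules file before relying on them.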
If a run is started with the values shipped in osgcloud_rules.yaml, you may see errors like these:
2016-08-16 15:53:46,270 INFO process ERROR: results->support_evidence->instance_support_evidence is False. Result will be non-compliant.
2016-08-16 15:53:46,270 INFO process ERROR: results->support_evidence->cloud_config_support_evidence is False. Result will be non-compliant.
In the SPEC Cloud kit, supporting evidence collection for instances and for the cloud is set to False. It must be enabled for a compliant run.
If the run is for internal results, there is no need to update the value for these parameters.
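The YAML layout below is only inferred from the error-message path above (results->support_evidence->...), so treat it as a hypothetical sketch and confirm the actual key names in the osgcloud_rules.yaml shipped with your kit:
results:
  support_evidence:
    instance_support_evidence: true
    cloud_config_support_evidence: true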
A compliant run requires that instances created during every iteration of the baseline phase are deleted immediately after the iteration. This is already implemented in the benchmark’s all_run script.
For testing, instance deletion after each baseline iteration can be disabled by setting the following flag in osgcloud_rules.yaml:
baseline:
  destroy_ai_upon_completion: false
A very detailed timeline for the deployment of each AI can be found in ~/results/EXPID/perf/EXPIDELASTICITY/<USERNAME>_operations.log
An illustrative example for a 2-VM Application Instance (using the “iperf” workload, which is not part of SPEC Cloud) follows:
cbuser@cborch:~/cloudbench$ cat /var/log/cloudbench/cbuser_operations.log | grep ai_1
Aug 16 14:16:03 cborch.public [2016-08-16 14:16:03,865] [DEBUG] active_operations.py/ActiveObjectOperations.pre_attach_ai TEST_cbuser - Starting the attachment of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)...
Aug 16 14:16:04 cborch.public [2016-08-16 14:16:04,302] [DEBUG] active_operations.py/ActiveObjectOperations.pre_attach_vm TEST_cbuser - Starting the attachment of vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)...
Aug 16 14:16:04 cborch.public [2016-08-16 14:16:04,312] [DEBUG] base_operations.py/ActiveObjectOperations.admission_control TEST_cbuser - Reservation for vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) was successfully obtained..
Aug 16 14:16:04 cborch.public [2016-08-16 14:16:04,326] [DEBUG] active_operations.py/ActiveObjectOperations.pre_attach_vm TEST_cbuser - Starting the attachment of vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)...
Aug 16 14:16:04 cborch.public [2016-08-16 14:16:04,336] [DEBUG] base_operations.py/ActiveObjectOperations.admission_control TEST_cbuser - Reservation for vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) was successfully obtained..
Aug 16 14:16:04 cborch.public [2016-08-16 14:16:04,744] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_ready TEST_cbuser - Waiting for vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), to start...
Aug 16 14:16:04 cborch.public [2016-08-16 14:16:04,748] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_ready TEST_cbuser - Waiting for vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), to start...
Aug 16 14:16:05 cborch.public [2016-08-16 14:16:05,745] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_ready TEST_cbuser - Check if vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) has started by querying thecloud directly.
Aug 16 14:16:05 cborch.public [2016-08-16 14:16:05,750] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_ready TEST_cbuser - Check if vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) has started by querying thecloud directly.
Aug 16 14:16:05 cborch.public [2016-08-16 14:16:05,756] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_boot TEST_cbuser - Trying to establish network connectivity to vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), on IP address 9.2.211.203...
Aug 16 14:16:05 cborch.public [2016-08-16 14:16:05,757] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_boot TEST_cbuser - Trying to establish network connectivity to vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), on IP address 9.2.211.203...
Aug 16 14:16:06 cborch.public [2016-08-16 14:16:06,758] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_boot TEST_cbuser - Assuming that vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) is booted after waiting for 0 seconds.
Aug 16 14:16:06 cborch.public [2016-08-16 14:16:06,759] [DEBUG] shared_functions.py/PdmCmds.wait_for_instance_boot TEST_cbuser - Assuming that vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) is booted after waiting for 0 seconds.
Aug 16 14:16:06 cborch.public [2016-08-16 14:16:06,887] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Checking ssh accessibility on vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9): ssh -p 10001 fedora@9.2.211.203 "/bin/true"...
Aug 16 14:16:06 cborch.public [2016-08-16 14:16:06,888] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Checking ssh accessibility on vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9): ssh -p 10002 fedora@9.2.211.203 "/bin/true"...
Aug 16 14:16:07 cborch.public [2016-08-16 14:16:07,005] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Checked ssh accessibility on vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)
Aug 16 14:16:07 cborch.public [2016-08-16 14:16:07,006] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Bootstrapping vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9): creating file cb_os_paramaters.txt in "fedora" user's home dir on IP address 9.2.211.203...
Aug 16 14:16:07 cborch.public [2016-08-16 14:16:07,010] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Checked ssh accessibility on vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)
Aug 16 14:16:07 cborch.public [2016-08-16 14:16:07,010] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Bootstrapping vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9): creating file cb_os_paramaters.txt in "fedora" user's home dir on IP address 9.2.211.203...
Aug 16 14:16:09 cborch.public [2016-08-16 14:16:09,375] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Bootstrapped vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)
Aug 16 14:16:09 cborch.public [2016-08-16 14:16:09,375] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Sending a copy of the code tree to vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), on IP address 9.2.211.203...
Aug 16 14:16:09 cborch.public [2016-08-16 14:16:09,945] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Sent a copy of the code tree to vm_1 (F51F167D-7962-51B6-93CA-8E99A5D70748), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), on IP address 9.2.211.203...
Aug 16 14:16:10 cborch.public [2016-08-16 14:16:10,378] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Bootstrapped vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)
Aug 16 14:16:10 cborch.public [2016-08-16 14:16:10,378] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Sending a copy of the code tree to vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), on IP address 9.2.211.203...
Aug 16 14:16:10 cborch.public [2016-08-16 14:16:10,952] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_cbuser - Sent a copy of the code tree to vm_2 (FD0EA8D8-F89C-5624-8FC3-EF6C6574E367), part of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9), on IP address 9.2.211.203...
Aug 16 14:16:11 cborch.public [2016-08-16 14:16:11,400] [DEBUG] base_operations.py/ActiveObjectOperations.admission_control TEST_cbuser - Reservation for ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) was successfully obtained..
Aug 16 14:16:11 cborch.public [2016-08-16 14:16:11,464] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - Performing generic application instance post_boot configuration on all VMs belonging to ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)...
Aug 16 14:16:16 cborch.public [2016-08-16 14:16:16,985] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - The generic post-boot "setup" scripts for ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) completed with status 0 after 5 seconds
Aug 16 14:16:16 cborch.public [2016-08-16 14:16:16,993] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - Running application-specific "setup" configuration on all VMs belonging to ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)...
Aug 16 14:16:16 cborch.public [2016-08-16 14:16:16,996] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - QEMU Scraper will NOT be automatically started during the deployment of ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9)...
Aug 16 14:16:21 cborch.public [2016-08-16 14:16:21,502] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - The application-specific "setup" scripts for ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) completed with status 0 after 5/5 seconds
Aug 16 14:16:22 cborch.public [2016-08-16 14:16:22,016] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - The application-specific "setup" scripts for ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) completed with status 0 after 1/6 seconds
Aug 16 14:16:24 cborch.public [2016-08-16 14:16:24,032] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - The application-specific "setup" scripts for ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) completed with status 0 after 2/8 seconds
Aug 16 14:16:24 cborch.public [2016-08-16 14:16:24,045] [DEBUG] base_operations.py/ActiveObjectOperations.parallel_vm_config_for_ai TEST_cbuser - Parallel VM configuration for ai_1 (66E19008-7FEF-5BF4-94E8-9F366918E0A9) success.
Aug 16 14:16:24 cborch.public [2016-08-16 14:16:24,132] [DEBUG] active_operations.py/ActiveObjectOperations.objattach TEST_cbuser - AI object 66E19008-7FEF-5BF4-94E8-9F366918E0A9 (named "ai_1") sucessfully attached to this experiment. It is ssh-accessible at the IP address 10.0.0.3 (cb-cbuser-TESTPDM-vm1-iperfclient-ai-1).
The total provisioning time has 7 components: the first is an epoch timestamp and the subsequent 6 are deltas, in seconds (a worked example follows the list):
mgt_001_provisioning_request_originated_abs
mgt_002_provisioning_request_sent
mgt_003_provisioning_request_completed
mgt_004_network_acessible
mgt_005_file_transfer
mgt_006_instance_preparation
mgt_007_application_start
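As a worked example with made-up numbers (purely illustrative, not taken from any real run): if mgt_001_provisioning_request_originated_abs = 1471356963 (epoch) and the six deltas are 1, 140, 25, 4, 10 and 20 seconds respectively, then the total provisioning time is 1 + 140 + 25 + 4 + 10 + 20 = 200 seconds, and the application became ready at epoch 1471356963 + 200 = 1471357163.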
These values are all stored in the file “VM_management_*.csv”, present in the directory
~/results/<EXPID>/perf/<EXPID><AI TYPE>[BASELINE, ELASTICITY]<TIMESTAMP>. Additionally, the detailed timeline described in the previous question will also show the specific reason for a provisioning QoS violation on a given Application Instance.
CBTOOL uses the OpenStack APIs. If there are no changes in the OpenStack APIs, new releases should continue to be supported. SPEC Cloud has been tested up to the Newton release of OpenStack.