Topic: problems running three tiles (RMI)

tdeneau

problems running three tiles (RMI)
« on: January 03, 2017, 02:57:00 PM »
I am currently trying to run with 3 tiles, each tile being driven from its own unique client VM.  I have yet to get a successful 3-tile run.  My problem is probably with the sequence of steps preparing things before the run.  Can someone post what the recommended procedure is?

For example, I noticed that if I used the Control.config that came with the Example VM scripts, jAppInitRstr.sh was run simultaneously from each of the appservers, so 3 appservers were all trying to do a db restore at the same time, which seemed clearly wrong.  I have commented out the PRIME_HOST_INIT_SCRIPT section from Control.config and am trying to do the inits from my own script before the run.


When I have completed my initialization I tend to check the following for x=1,2,3
   * http://webserverx/Support
   * http://appserverx:8000/SPECjAppServer/app?action=atomicityTests
These always look fine.
However, during the run something usually fails, for example:
Code:
2017-01-02 19:46:04:095 Warning: .doMyLongCommand received an SocketTimeoutException exception
java.net.SocketTimeoutException: Read timed out

For a three-tile run would the following init sequence be correct?

  • jAppInitRstr.sh for appserver1 (restores dbserver1)
  • jAppInit.sh for appserver2, 3 (without restore) (Can these all be run concurrently?)
  • mailInitRstr.sh for each of mailserver 1,2,3 (assume all can be concurrent)
  • webInit.sh for each of webserver 1,2,3  (assume all can be concurrent)
  • batchInit.sh for each of batchserver 1,2,3  (assume all can be concurrent)
  • Do all VMs need to be rebooted before each run?



« Last Edit: January 04, 2017, 11:35:38 AM by lroderic »

ChrisFloyd

  • Moderator
Re: problems running three tiles
« Reply #1 on: January 03, 2017, 03:04:49 PM »
Tom,

The jAppInitRstr.sh should be run on the first app server (only) for each DB, e.g., Appserver1, Appserver5, Appserver9, etc.
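
As a rough illustration of the rule above, the helper below picks an init script for a given tile number. It assumes, based only on the Appserver1/Appserver5/Appserver9 example, that each DB server is shared by four consecutive tiles. The names `init_script_for_tile` and `TILES_PER_DB` are hypothetical and not part of the harness.

```shell
# Sketch only: choose which app server init script a tile should run.
# ASSUMPTION: one DB server per TILES_PER_DB tiles (4 is inferred from the
# Appserver1, Appserver5, Appserver9 example; adjust for your layout).
TILES_PER_DB=4

init_script_for_tile() {
    tile="$1"
    if [ $(( (tile - 1) % TILES_PER_DB )) -eq 0 ]; then
        echo "jAppInitRstr.sh"   # first app server for this DB: init with restore
    else
        echo "jAppInit.sh"       # other app servers on the same DB: init only
    fi
}

# Print the choice for a three-tile run.
for t in 1 2 3; do
    echo "appserver$t: $(init_script_for_tile "$t")"
done
```

For three tiles this selects the restore script only for appserver1, which matches the sequence proposed in the original question.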

As for the error message you posted, that may be "normal", depending on how many SocketTimeoutExceptions you are experiencing during the start of the run. It isn't unusual to see maybe a dozen or so of these messages during warmup, especially if your disk subsystem is not low-latency (for example, not SSD-based).  For the mail workload, are you running from a previously "warmed-up then restored" mail store? (See my response to your question from Dec 5th: https://www.spec.org/forums/index.php?topic=63.0 )
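
To judge whether the number of these warnings is in the "dozen or so" range, a small counting helper can be used. This is a minimal sketch: `count_socket_timeouts` is a hypothetical name, and the match pattern is taken from the single log line quoted earlier in this thread, so it may need adjusting for other message variants.

```shell
# Sketch only: count SocketTimeoutException warnings in a harness log.
# The log path is supplied by the caller; no particular file name is assumed.
count_socket_timeouts() {
    # Match only the "Warning:" line so each exception is counted once
    # (the stack trace line that follows also contains the class name).
    # grep exits non-zero when the count is 0, hence the "|| true".
    grep -c 'Warning: .*SocketTimeoutException' "$1" || true
}
```

Call it with the path to whichever log file contains the warnings, for example `count_socket_timeouts /path/to/your/log` (the path here is a placeholder).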

Thanks,
Chris

tdeneau

Re: problems running three tiles
« Reply #2 on: January 03, 2017, 04:58:54 PM »
From someone not that familiar with RMI...

I am running each client in a separate VM.  At this stage of "functionality only", all the VMs including the client VMs are running on the SUT.

Previously I had been running my SPECVIRT_HOST on the same VM as client1 and did not have any RMI problems.
I wanted to rearrange things so that the SPECVIRT_HOST is running on bare metal on the SUT itself (and eventually on a separate system).

In Control.config (on controller and on each client) I use
    SPECVIRT_HOST=specvirt-controller

I make sure the name specvirt-controller with the correct IP is in the /etc/hosts file on all the VMs.

I then get the following error from the clients.

-> 2017-01-03 15:37:26:491 Remote exception calling getHostName(). Exception was:
java.rmi.ConnectException: Connection refused to host: 192.168.122.1; nested exception is:
java.net.ConnectException: Connection timed out (Connection timed out)



I seem to be able to get rid of these errors by adding
   -Djava.rmi.server.hostname=specvirt-controller

to the java invocations in Clientmgr.sh and runspecvirt.sh

But is there a cleaner way of handling this?
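
One possibly cleaner option, sketched here, is to set the property once through the JAVA_TOOL_OPTIONS environment variable, which HotSpot-based JVMs read at startup, instead of editing each java command line. This is untested with this harness: it assumes Clientmgr.sh and runspecvirt.sh inherit the environment of the shell that launches them. `set_rmi_hostname` is a hypothetical helper name.

```shell
# Sketch only: add -Djava.rmi.server.hostname=<name> to JAVA_TOOL_OPTIONS so
# every JVM started from this shell picks it up.
# ASSUMPTION: the harness scripts are launched from this environment.
set_rmi_hostname() {
    opt="-Djava.rmi.server.hostname=$1"
    # Preserve any options that are already set.
    JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+$JAVA_TOOL_OPTIONS }$opt"
    export JAVA_TOOL_OPTIONS
}

set_rmi_hostname specvirt-controller
echo "$JAVA_TOOL_OPTIONS"
```

The property names the host that a JVM advertises in the stubs of its own exported remote objects, so the value should be a name that the other machines can resolve and reach.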

lroderic

  • Moderator
Re: problems running three tiles (RMI)
« Reply #3 on: January 04, 2017, 04:40:20 PM »
Code:
jAppInitRstr.sh for appserver1 (restores dbserver1)
jAppInit.sh for appserver2, 3 (without restore) (Can these all be run concurrently?)
webInit.sh for each of webserver 1,2,3  (assume all can be concurrent)
batchInit.sh for each of batchserver 1,2,3  (assume all can be concurrent)
mailInitRstr.sh for each of mailserver 1,2,3 (assume all can be concurrent)

Yes, these are run across the different clients simultaneously. Since the dbserver restore only occurs on vclient1, it'll take longer than the other three workloads. But the harness waits until all INIT scripts are finished running before continuing.
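
The "start everything, then wait for all of it" behaviour described above can be pictured with plain shell job control. This is not the harness's own code: `run_all_and_wait` is a hypothetical helper, and the commands passed to it are placeholders for the per-workload init scripts.

```shell
# Sketch only: start every command in the background, then block until all
# of them have finished. Returns non-zero if any command failed.
run_all_and_wait() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Placeholder commands: one slow step (like the restore) and one fast step.
run_all_and_wait "sleep 1; echo 'long init done'" "echo 'short init done'"
echo "all init steps finished"
```

The final message appears only after the slowest command returns, which mirrors how the run does not continue until the longest INIT script (the one doing the restore) has completed.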

Code:
Do all VMs need to be rebooted before each run?
This is not a requirement; it is up to you and depends on your measurement requirements.

Regarding the problem with hostname=specvirt-controller, does this still happen if the hostname does not contain a hyphen (-)?

Lisa

ChrisFloyd

  • Moderator
Re: problems running three tiles (RMI)
« Reply #4 on: January 06, 2017, 05:31:55 PM »
Tom,

Do you have lines at the top of your client's /etc/hosts file containing entries for "localhost", "::1", etc.?  If so, have you tried commenting those out and trying again?  As I recall, the problem you are running into has to do with the RMI name lookup/resolution not matching what RMI thinks the local hostname is.  It's been a while since I've seen this problem, but I think the fix for me was to comment out or remove the "localhost"-related entries.
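
Before editing /etc/hosts, it may help to check whether the machine's own name is actually mapped to a loopback address there. The following is a minimal sketch with a hypothetical helper name, `loopback_entries_for`; it only reads the file and prints matching lines.

```shell
# Sketch only: print the lines of a hosts file that map NAME to a loopback
# address (127.x.x.x or ::1). Comment lines and trailing comments are ignored.
loopback_entries_for() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        /^[ \t]*#/ { next }
        {
            sub(/#.*/, "")
            if ($1 ~ /^127\./ || $1 == "::1") {
                for (i = 2; i <= NF; i++) {
                    if ($i == name) { print; break }
                }
            }
        }' "$file"
}
```

For example, `loopback_entries_for "$(uname -n)"` lists any loopback lines that mention the local host name. An empty result suggests the name is not being resolved to loopback through this file.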

