
#include <Beowulf/Beowulf.H>

The idea of this class is to hide all of the low-level communication setup and transfer details from the user, and to provide a simple interface for passing messages between nodes on a Beowulf cluster. Each slave node should instantiate a Beowulf object and initialize it with slaveInit(). This will block the slave until it is contacted by the Beowulf master node. The master node instantiates a Beowulf object, and initializes it using masterInit(), passing along the hostnames of the slave nodes. During initialization, the Beowulf master contacts all the slaves and instructs them to fully interconnect with each other. Once initialization is complete, any node can send() and receive() TCPmessages to and from any other node. Both send() and receive() are non-blocking methods. Actual queueing and transfer of messages is done in a thread that runs in parallel with the main program thread.
Definition at line 76 of file Beowulf.H.
Public Member Functions | |
Constructors and destructors | |
| Beowulf (OptionManager &mgr, const std::string &descrName="Beowulf", const std::string &tagName="Beowulf", const bool ismaster=false) | |
| Constructor. | |
| void | resetConnections (const int keepfd=-1) |
| Reset and kill all connections except possibly one (keepfd). | |
| virtual | ~Beowulf () |
| Destructor. | |
Access functions | |
| int | getNbSlaves () const |
| get number of slave nodes | |
| int | getNodeNumber () const |
| get our node number (-1 is the master) | |
| const char * | nodeName (const int nb) const |
| Get hostname:port of node with given node number. | |
| int | requestNode () |
| Request a so-far unallocated node. | |
| void | releaseNode (int nodenum) |
| De-allocate a currently allocated node. | |
Message passing functions | |
| void | send (const int node_nb, TCPmessage &msg) |
| Send message to another node. | |
| void | send (TCPmessage &msg) |
| Send message to the least-loaded of our slave nodes. | |
| bool | receive (int &node_nb, TCPmessage &msg, int32 &frame, int32 &action, const int timeout=0, int *err=0) |
| Receive message from a given node (or from any node). | |
| int | nbReceived (const int node_nb=-2) |
| Do we have any received messages? | |
Protected Member Functions | |
| virtual void | paramChanged (ModelParamBase *const param, const bool valueChanged, ParamClient::ChangeStatus *status) |
| Intercept people changing our ModelParam. | |
Protected Attributes | |
| OModelParam< std::string > | itsSlaveNames |
| names of our slaves as a space-separated list of hostname:port | |
| OModelParam< bool > | isMaster |
| true if we are the master | |
| OModelParam< int > | selfqlen |
| self-message queue length | |
| OModelParam< bool > | selfdroplast |
| self-message queue drop policy | |
| OModelParam< double > | initTimeout |
| max time to wait for initialization | |
Classes | |
| struct | NodeInfo |
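
Beowulf::Beowulf (OptionManager &mgr, const std::string &descrName="Beowulf", const std::string &tagName="Beowulf", const bool ismaster=false)
Constructor.
Definition at line 51 of file Beowulf.C. References ModelComponent::addSubComponent(), OModelParam< T >::getVal(), isMaster, itsSlaveNames, and ModelComponent::unregisterParam().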

virtual Beowulf::~Beowulf ()
Destructor. Will properly terminate all connections.

int Beowulf::getNbSlaves () const
Get the number of slave nodes.
Definition at line 109 of file Beowulf.C. References ASSERT, and rutz::max().

int Beowulf::getNodeNumber () const
Get our node number (-1 is the master).

int Beowulf::nbReceived (const int node_nb=-2)
Do we have any received messages? Returns the total number of messages in the incoming queues of our various connected nodes. If node_nb == -1, only consider messages from the Beowulf master. If node_nb == -2, consider any node; otherwise only consider the specified node.
Definition at line 605 of file Beowulf.C. References LFATAL.
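The node_nb convention described above can be illustrated with a small stand-alone sketch. It does not use the Beowulf class; PendingCounts and countReceived() are hypothetical names introduced here, and the map is only an assumed stand-in for the per-node incoming queues.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for the per-node incoming queues: maps a node
// number (-1 is the master) to the number of messages waiting from it.
using PendingCounts = std::map<int, std::size_t>;

// Mirrors the node_nb convention of nbReceived():
//   -2      -> count messages from every node
//   -1      -> count only messages from the master
//   n >= 0  -> count only messages from slave n
inline std::size_t countReceived(const PendingCounts& pending, int node_nb = -2)
{
  assert(node_nb >= -2);  // values below -2 have no meaning in this sketch

  if (node_nb == -2) {
    std::size_t total = 0;
    for (const auto& entry : pending) total += entry.second;
    return total;
  }

  // A node with no entry simply has nothing pending.
  const auto it = pending.find(node_nb);
  return it == pending.end() ? 0 : it->second;
}
```

With counts of 2 (master), 3 (slave 0) and 0 (slave 1), the default call reports 5, while passing -1 reports only the 2 messages from the master.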

const char * Beowulf::nodeName (const int nb) const
Get hostname:port of node with given node number. This is whatever the user gave at configuration, so it could be just a short hostname, a fully-qualified hostname, or a hostname:port. If nb is -1, we return "BeoMaster".
Definition at line 123 of file Beowulf.C. References LFATAL.

virtual void Beowulf::paramChanged (ModelParamBase *const param, const bool valueChanged, ParamClient::ChangeStatus *status)
Intercept people changing our ModelParam. See ModelComponent.H. Since parsing the command line or reading a config file sets our name, we also instantiate a controller of the proper type here (and export its options).
Reimplemented from ModelComponent.
Definition at line 79 of file Beowulf.C. References OModelParam< T >::getVal(), isMaster, itsSlaveNames, ModelComponent::paramChanged(), ModelComponent::registerOptionedParam(), and ModelComponent::unregisterParam().

bool Beowulf::receive (int &node_nb, TCPmessage &msg, int32 &frame, int32 &action, const int timeout=0, int *err=0)
Receive message from a given node (or from any node). Returns true if a message has been received, false otherwise. If a message was received, the node it came from will be in node_nb, and its frame and action fields will be pre-decoded for convenience (they are still in the message itself too). This method is always non-blocking, i.e., it returns immediately and does not wait for messages to come in.
Definition at line 536 of file Beowulf.C. References ASSERT, BEO_INIT, TCPmessage::getAction(), TCPmessage::getETI(), TCPmessage::getID(), Timer::getSecs(), LDEBUG, LERROR, LINFO, and name.

void Beowulf::releaseNode (int nodenum)
De-allocate a currently allocated node.
Definition at line 152 of file Beowulf.C. References LERROR.

int Beowulf::requestNode ()
Request a so-far unallocated node. This will return the next node number that has not yet been requested. This only works if we are the Beowulf master. It will generate an error message and return -2 if we have no more unallocated nodes. We need to be start()'ed for this to work.
Definition at line 134 of file Beowulf.C. References i, LERROR, LFATAL, and ModelComponent::started().
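The request/release bookkeeping described for requestNode() and releaseNode() can be sketched as a small stand-alone allocator. NodePool is a hypothetical class, not the Beowulf implementation; in particular, handing out the lowest-numbered free node is an assumption, and the real class also requires being the master and being start()'ed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical allocator that tracks which slave nodes have been handed out.
class NodePool {
public:
  explicit NodePool(std::size_t nbSlaves) : allocated_(nbSlaves, false) {}

  // Return the lowest-numbered unallocated node, or -2 if none is left
  // (the same "no more nodes" value that requestNode() is documented to use).
  int request() {
    for (std::size_t i = 0; i < allocated_.size(); ++i) {
      if (!allocated_[i]) {
        allocated_[i] = true;
        return static_cast<int>(i);
      }
    }
    return -2;
  }

  // Mark a node as free again. Returns false if the node number is out of
  // range or the node was not allocated in the first place.
  bool release(int nodenum) {
    if (nodenum < 0) return false;
    const std::size_t i = static_cast<std::size_t>(nodenum);
    if (i >= allocated_.size() || !allocated_[i]) return false;
    allocated_[i] = false;
    return true;
  }

private:
  std::vector<bool> allocated_;  // allocated_[n] is true while node n is in use
};
```

A caller would check the return value of request() against -2 before sending work to the node, and call release() once the node's work is done so that it can be handed out again.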

void Beowulf::resetConnections (const int keepfd=-1)
Reset and kill all connections except possibly one (keepfd). Resets the Beowulf to uninitialized state. Kills all connections, except possibly one (typically towards the master) that may be specified as argument.
Definition at line 165 of file Beowulf.C. References Timer::reset().

void Beowulf::send (TCPmessage &msg)
Send message to the least-loaded of our slave nodes. This method is non-blocking (returns immediately). A copy of msg is taken, so you can destroy it immediately after send. Only works if we are the Beowulf master, fatal error otherwise.
This implements load balancing. The ETI (estimated time to idle) fields in TCPmessage are used to determine which of our slave nodes has the shortest pending work queue (i.e., shortest ETI), and the message will be sent to that node. Thus, this functionality assumes that every slave node can process every message that you might send it (as opposed to more constrained architectures where a given node is only capable of doing a given type of processing corresponding to a given type of received message).
For this load balancing to work, the slaves should try to put good-faith estimates of their time to idle (in seconds) each time they send us (the master) a message back. The master relies on those good-faith estimates to decide which node is the least loaded. This approach has severe limitations if your overall message traffic is low, as your ETI estimates at the master will not be refreshed regularly and may become grossly inaccurate. Thus, this approach is mostly intended for streaming applications, where every node will usually send several messages back to the master every 30 ms or so, so that the ETI estimates collected at the master will be reasonably fresh and accurate.
If several slave nodes have the lowest ETI, one will be picked at random.
Definition at line 499 of file Beowulf.C. References ASSERT, diff(), TCPmessage::getAction(), TCPmessage::getID(), LDEBUG, and randomDouble().
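The selection rule described above (lowest ETI wins, ties broken at random) can be sketched in isolation. pickLeastLoaded() is a hypothetical helper, not a Beowulf member; it assumes the master keeps the latest ETI reported by each slave in a vector indexed by node number, and it uses std::mt19937 where the library itself references randomDouble().

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: given the most recent ETI (estimated time to idle,
// in seconds) reported by each slave, return the index of the slave that
// should receive the next message.
inline std::size_t pickLeastLoaded(const std::vector<double>& eti,
                                   std::mt19937& rng)
{
  assert(!eti.empty());

  // First pass: find the lowest ETI.
  double best = eti[0];
  for (double e : eti)
    if (e < best) best = e;

  // Second pass: collect every slave that shares that lowest ETI.
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < eti.size(); ++i)
    if (eti[i] == best) candidates.push_back(i);

  // Break ties by picking one of the candidates uniformly at random.
  std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
  return candidates[pick(rng)];
}
```

The choice is only as good as the ETI values fed into it, which is why stale estimates under low message traffic degrade the balancing.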

void Beowulf::send (const int node_nb, TCPmessage &msg)
Send message to another node. This method is non-blocking (returns immediately). A copy of msg is taken, so you can destroy it immediately after send.
Definition at line 454 of file Beowulf.C. References ASSERT, TCPmessage::getAction(), TCPmessage::getID(), OModelParam< T >::getVal(), LDEBUG, LERROR, selfdroplast, and selfqlen.

OModelParam<double> Beowulf::initTimeout
Max time to wait for initialization.

OModelParam<bool> Beowulf::isMaster
True if we are the master.
Definition at line 207 of file Beowulf.H. Referenced by Beowulf(), and paramChanged().

OModelParam<std::string> Beowulf::itsSlaveNames
Names of our slaves as a space-separated list of hostname:port. The port is optional; we will use the default SockServ port if it is unspecified. This parameter is only used if we are the Beowulf master (see constructor).
Definition at line 205 of file Beowulf.H. Referenced by Beowulf(), and paramChanged().
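As an illustration of the slave-list format described above, the following stand-alone sketch splits such a string into host and port pairs. HostPort and parseSlaveNames() are hypothetical names, not part of the library, and defaultPort is a caller-supplied placeholder because the actual default SockServ port is not given here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: one entry per slave.
struct HostPort {
  std::string host;
  int port;
};

// Split a space-separated list of "hostname" or "hostname:port" entries.
// Entries without a port get defaultPort (an assumed placeholder value).
// std::stoi throws if the text after the colon is not a number.
inline std::vector<HostPort> parseSlaveNames(const std::string& names,
                                             int defaultPort)
{
  assert(defaultPort > 0);

  std::vector<HostPort> out;
  std::istringstream in(names);
  std::string token;
  while (in >> token) {  // operator>> skips any run of whitespace
    const std::string::size_type colon = token.rfind(':');
    if (colon == std::string::npos)
      out.push_back({token, defaultPort});
    else
      out.push_back({token.substr(0, colon),
                     std::stoi(token.substr(colon + 1))});
  }
  return out;
}
```

For example, parsing "n01 n02.example.org:9790" with a placeholder default of 9000 yields n01 on port 9000 and n02.example.org on port 9790.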

OModelParam<bool> Beowulf::selfdroplast
Self-message queue drop policy.
Definition at line 209 of file Beowulf.H. Referenced by send().

OModelParam<int> Beowulf::selfqlen
Self-message queue length.
Definition at line 208 of file Beowulf.H. Referenced by send().