Simple interfacing to a Beowulf cluster.
#include <Beowulf/Beowulf.H>
Classes | |
struct | NodeInfo |
Public Member Functions | |
Constructors and destructors | |
Beowulf (OptionManager &mgr, const std::string &descrName="Beowulf", const std::string &tagName="Beowulf", const bool ismaster=false) | |
Constructor. | |
void | resetConnections (const int keepfd=-1) |
Reset and kill all connections except possibly one (keepfd). | |
virtual | ~Beowulf () |
Destructor. | |
Access functions | |
int | getNbSlaves () const |
Get number of slave nodes. | |
int | getNodeNumber () const |
Get our node number (-1 is the master). | |
const char * | nodeName (const int nb) const |
Get hostname:port of node with given node number. | |
int | requestNode () |
Request a so-far unallocated node. | |
void | releaseNode (int nodenum) |
De-allocate a currently allocated node. | |
Message passing functions | |
void | send (const int node_nb, TCPmessage &msg) |
Send message to another node. | |
void | send (TCPmessage &msg) |
Send message to the least-loaded of our slave nodes. | |
bool | receive (int &node_nb, TCPmessage &msg, int32 &frame, int32 &action, const int timeout=0, int *err=0) |
Receive message from a given node (or from any node). | |
int | nbReceived (const int node_nb=-2) |
Do we have any received messages? | |
Protected Member Functions | |
virtual void | paramChanged (ModelParamBase *const param, const bool valueChanged, ParamClient::ChangeStatus *status) |
Intercept people changing our ModelParam. | |
Protected Attributes | |
OModelParam< std::string > | itsSlaveNames |
names of our slaves as a space-separated list of hostname:port | |
OModelParam< bool > | isMaster |
true if we are the master | |
OModelParam< int > | selfqlen |
self-message queue length | |
OModelParam< bool > | selfdroplast |
self-message queue drop policy | |
OModelParam< double > | initTimeout |
max time to wait for initialization |
Simple interfacing to a Beowulf cluster.
The idea of this class is to hide all of the low-level communication setup and transfer details from the user, and to provide a simple interface for passing messages between nodes on a Beowulf cluster.
Each slave node should instantiate a Beowulf object and initialize it with slaveInit(). This blocks the slave until it is contacted by the Beowulf master node. The master node instantiates a Beowulf object and initializes it using masterInit(), passing along the hostnames of the slave nodes. During initialization, the Beowulf master contacts all the slaves and instructs them to fully interconnect with each other.
Once initialization is complete, any node can send() and receive() TCPmessages to and from any other node. send() is non-blocking, and receive() is non-blocking unless a non-zero timeout is passed. Actual queueing and transfer of messages is done in a thread that runs in parallel with the main program thread.
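The split between a non-blocking send() and a background transfer thread can be illustrated with a minimal sketch. AsyncSender and all of its members are hypothetical names invented for this illustration; the sketch uses std::string in place of TCPmessage, replaces the network transfer with an in-memory list, and is not the Beowulf implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical illustration (not part of Beowulf): send() only enqueues a copy
// of the message; a worker thread performs the "transfer" in parallel.
class AsyncSender {
public:
  AsyncSender() : worker_([this] { run(); }) {}
  ~AsyncSender() { stop(); }

  // Copies msg into the outgoing queue and returns immediately.
  void send(const std::string& msg) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(msg);
    }
    ready_.notify_one();
  }

  // Lets the worker drain the queue, then joins it. Safe to call repeatedly.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    ready_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

  // Messages transferred so far; only read this after stop() has returned.
  const std::vector<std::string>& delivered() const { return delivered_; }

private:
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
      if (queue_.empty()) return;  // only reachable once stopping_ is set
      std::string msg = queue_.front();
      queue_.pop_front();
      lock.unlock();
      delivered_.push_back(msg);   // stand-in for the actual network transfer
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable ready_;
  std::deque<std::string> queue_;
  std::vector<std::string> delivered_;
  bool stopping_ = false;
  std::thread worker_;  // declared last so it starts after the other members
};
```

Because send() stores a copy, the caller may destroy its message as soon as the call returns, which matches the contract stated for Beowulf::send().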
Definition at line 76 of file Beowulf.H.
Beowulf::Beowulf | ( | OptionManager & | mgr, | |
const std::string & | descrName = "Beowulf" , |
|||
const std::string & | tagName = "Beowulf" , |
|||
const bool | ismaster = false | |||
) |
Constructor.
ismaster | true if we are the master of the Beowulf cluster. The master is the node that gets the list of slaves and then initializes all of them at start() time. |
Definition at line 51 of file Beowulf.C.
References ModelComponent::addSubComponent(), OModelParam< T >::getVal(), isMaster, itsSlaveNames, and ModelComponent::unregisterParam().
Beowulf::~Beowulf | ( | ) | [virtual] |
int Beowulf::getNbSlaves | ( | ) | const |
int Beowulf::getNodeNumber | ( | ) | const |
int Beowulf::nbReceived | ( | const int | node_nb = -2 |
) |
Do we have any received messages?
Returns the total number of messages in the incoming queues of our various connected nodes. If node_nb == -1, only consider messages from the Beowulf master. If node_nb == -2, consider any node, otherwise only consider the specified node.
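The node_nb conventions above can be sketched as follows. PendingCounts, countReceived, ANY_NODE and MASTER_NODE are hypothetical names used only for this illustration; the real class keeps its queues internally and this is not its implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical stand-in for the per-node incoming queues: maps a node number
// (-1 for the master, 0..N-1 for slaves) to its number of pending messages.
using PendingCounts = std::map<int, int>;

const int ANY_NODE = -2;     // documented default value of node_nb
const int MASTER_NODE = -1;  // documented node number of the master

// Mirrors the documented semantics of nbReceived(): sum over all nodes for
// ANY_NODE, otherwise report only the queue of the requested node.
int countReceived(const PendingCounts& pending, int node_nb = ANY_NODE)
{
  if (node_nb == ANY_NODE) {
    int total = 0;
    for (const auto& entry : pending) total += entry.second;
    return total;
  }
  const auto it = pending.find(node_nb);
  return it == pending.end() ? 0 : it->second;
}
```

With two messages pending from the master and one from node 0, countReceived(pending) yields 3 and countReceived(pending, MASTER_NODE) yields 2.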
const char * Beowulf::nodeName | ( | const int | nb | ) | const |
void Beowulf::paramChanged | ( | ModelParamBase *const | param, | |
const bool | valueChanged, | |||
ParamClient::ChangeStatus * | status | |||
) | [protected, virtual] |
Intercept people changing our ModelParam.
See ModelComponent.H. Because parsing the command line or reading a config file sets our name, we also instantiate a controller of the proper type here (and export its options).
Reimplemented from ModelComponent.
Definition at line 79 of file Beowulf.C.
References OModelParam< T >::getVal(), isMaster, itsSlaveNames, ModelComponent::registerOptionedParam(), and ModelComponent::unregisterParam().
bool Beowulf::receive | ( | int & | node_nb, | |
TCPmessage & | msg, | |||
int32 & | frame, | |||
int32 & | action, | |||
const int | timeout = 0 , |
|||
int * | err = 0 | |||
) |
Receive message from a given node (or from any node).
Checks whether a message has been received; returns true if so and false otherwise. If a message was received, the node it came from will be in node_nb, and its frame and action fields will be pre-decoded for convenience (they are still in the message itself too). With the default timeout of 0, this method is non-blocking, i.e., it returns immediately and does not wait for messages to come in; with a non-zero timeout it may block for at most that many milliseconds.
node_nb | node number to receive from, or -1 to receive from any node. If a message is received, node_nb will be updated to the node number from which the message was received, or -1 if it was received from the master. | |
msg | received message | |
frame | frame number from received message | |
action | action field from received message | |
timeout | if non-zero, max time (in ms) this call may block | |
err | if non-null, then (*err) will be set to non-zero if an error occurs |
Definition at line 536 of file Beowulf.C.
References ASSERT, TCPmessage::getAction(), TCPmessage::getETI(), TCPmessage::getID(), and Timer::getSecs().
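One way a bounded wait can be layered over a non-blocking check is sketched below. pollWithTimeout and its tryReceive argument are hypothetical and only illustrate the timeout parameter described above (0 means a single check, otherwise a maximum wait in milliseconds); how Beowulf::receive() actually waits is not shown in this documentation.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper (not part of Beowulf): repeatedly calls a non-blocking
// check until it succeeds or timeoutMs milliseconds have elapsed.
// A timeoutMs of 0 performs exactly one check and returns immediately.
bool pollWithTimeout(const std::function<bool()>& tryReceive, int timeoutMs)
{
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + std::chrono::milliseconds(timeoutMs);
  for (;;) {
    if (tryReceive()) return true;               // a message was available
    if (Clock::now() >= deadline) return false;  // out of time
    // Sleep briefly so the loop does not spin at full CPU usage.
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}
```

The 1 ms sleep interval is an arbitrary choice for the sketch; a streaming application would tune it to its frame rate.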
void Beowulf::releaseNode | ( | int | nodenum | ) |
int Beowulf::requestNode | ( | ) |
Request a so-far unallocated node.
Returns the next node number that has not yet been requested. This only works if we are the Beowulf master. It generates an error message and returns -2 if there are no more unallocated nodes. The Beowulf must have been start()'ed for this to work.
Definition at line 134 of file Beowulf.C.
References ModelComponent::started().
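The request/release contract can be sketched with a small allocator. NodePool is a hypothetical class written for this illustration, and handing out the lowest-numbered free node is an assumption; only the -2 return value for "no node left" comes from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical allocator (not the Beowulf implementation) that mimics the
// documented behavior of requestNode() and releaseNode().
class NodePool {
public:
  explicit NodePool(int nbSlaves)
    : allocated_(static_cast<std::size_t>(nbSlaves), false) {}

  // Returns a node number that is not currently allocated, or -2 if every
  // node is taken. Assumption: the lowest-numbered free node is chosen.
  int request() {
    for (std::size_t i = 0; i < allocated_.size(); ++i) {
      if (!allocated_[i]) {
        allocated_[i] = true;
        return static_cast<int>(i);
      }
    }
    return -2;
  }

  // Marks a previously requested node as available again.
  void release(int nodenum) {
    assert(nodenum >= 0 &&
           static_cast<std::size_t>(nodenum) < allocated_.size());
    allocated_[static_cast<std::size_t>(nodenum)] = false;
  }

private:
  std::vector<bool> allocated_;  // allocated_[i] is true while node i is in use
};
```

A caller should check for -2 before using the returned node number, since the sketch (like the documented method) reports exhaustion through the return value.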
void Beowulf::resetConnections | ( | const int | keepfd = -1 |
) |
Reset and kill all connections except possibly one (keepfd).
Resets the Beowulf to an uninitialized state. Kills all connections, except possibly one (typically towards the master) that may be specified as an argument.
keepfd | the fd to keep (or -1 to keep no fd) |
Definition at line 165 of file Beowulf.C.
References Timer::reset().
void Beowulf::send | ( | TCPmessage & | msg | ) |
Send message to the least-loaded of our slave nodes.
This method is non-blocking (returns immediately). A copy of msg is taken, so you can destroy it immediately after send. Only works if we are the Beowulf master; it is a fatal error otherwise.
This implements load balancing. The ETI (estimated time to idle) fields in TCPmessage are used to determine which of our slave nodes has the shortest pending work queue (i.e., shortest ETI), and the message will be sent to that node. If several slave nodes have the lowest ETI, one will be picked at random. This functionality assumes that every slave node can process every message that you might send to it (as opposed to more constrained architectures where a given node is only capable of doing a given type of processing corresponding to a given type of received message).
For this load balancing to work, the slaves should provide good-faith estimates of their time to idle (in seconds) each time they send a message back to us (the master). The master relies on those estimates to decide which node is the least loaded. This approach has severe limitations if your overall message traffic is low, as the ETI estimates at the master will not be refreshed regularly and may become grossly inaccurate. It is therefore mostly intended for streaming applications, where every node will usually send several messages back to the master every 30ms or so, so that the ETI estimates collected at the master will be reasonably fresh and accurate.
Definition at line 499 of file Beowulf.C.
References ASSERT, TCPmessage::getAction(), TCPmessage::getID(), and randomDouble().
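The selection rule (lowest ETI wins, ties broken at random) can be sketched as below. pickLeastLoaded is a hypothetical helper, and the use of std::mt19937 is an assumption made for a self-contained example; the actual method references randomDouble(), whose behavior is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of Beowulf): given the most recent ETI
// (estimated time to idle, in seconds) reported by each slave, returns the
// index of the slave with the smallest ETI. Ties are broken uniformly at
// random.
int pickLeastLoaded(const std::vector<double>& eti, std::mt19937& rng)
{
  assert(!eti.empty());

  // First pass: find the minimum ETI.
  double best = eti[0];
  for (double e : eti)
    if (e < best) best = e;

  // Second pass: collect every node that shares that minimum.
  std::vector<int> candidates;
  for (std::size_t i = 0; i < eti.size(); ++i)
    if (eti[i] == best) candidates.push_back(static_cast<int>(i));

  // Pick one candidate uniformly at random.
  std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
  return candidates[pick(rng)];
}
```

If the slaves rarely report back, the values in eti go stale and the choice degrades, which is the limitation noted above for low-traffic applications.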
void Beowulf::send | ( | const int | node_nb, | |
TCPmessage & | msg | |||
) |
Send message to another node.
This method is non-blocking (returns immediately). A copy of msg is taken, so you can destroy it immediately after send.
node_nb | is the destination node number. A value of -1 on a slave Beowulf means that msg should be sent to the Beowulf master. |
Definition at line 454 of file Beowulf.C.
References ASSERT, TCPmessage::getAction(), TCPmessage::getID(), OModelParam< T >::getVal(), selfdroplast, and selfqlen.
OModelParam<double> Beowulf::initTimeout [protected] |
OModelParam<bool> Beowulf::isMaster [protected] |
true if we are the master
Definition at line 207 of file Beowulf.H.
Referenced by Beowulf(), and paramChanged().
OModelParam<std::string> Beowulf::itsSlaveNames [protected] |
OModelParam<bool> Beowulf::selfdroplast [protected] |
OModelParam<int> Beowulf::selfqlen [protected] |