Beowulf Class Reference

Simple interfacing to a Beowulf cluster.

#include <Beowulf/Beowulf.H>

[Figure: inheritance diagram for Beowulf]
[Figure: collaboration diagram for Beowulf]

Classes

struct  NodeInfo

Public Member Functions

Constructors and destructors

 Beowulf (OptionManager &mgr, const std::string &descrName="Beowulf", const std::string &tagName="Beowulf", const bool ismaster=false)
 Constructor.
void resetConnections (const int keepfd=-1)
 Reset and kill all connections except possibly one (keepfd).
virtual ~Beowulf ()
 Destructor.
Access functions

int getNbSlaves () const
 get number of slave nodes
int getNodeNumber () const
 get our node number (-1 is the master)
const char * nodeName (const int nb) const
 Get hostname:port of node with given node number.
int requestNode ()
 Request a so-far unallocated node.
void releaseNode (int nodenum)
 De-allocate a currently allocated node.
Message passing functions

void send (const int node_nb, TCPmessage &msg)
 Send message to another node.
void send (TCPmessage &msg)
 Send message to the least-loaded of our slave nodes.
bool receive (int &node_nb, TCPmessage &msg, int32 &frame, int32 &action, const int timeout=0, int *err=0)
 Receive message from a given node (or from any node).
int nbReceived (const int node_nb=-2)
 Do we have any received messages?

Protected Member Functions

virtual void paramChanged (ModelParamBase *const param, const bool valueChanged, ParamClient::ChangeStatus *status)
 Intercept people changing our ModelParam.

Protected Attributes

OModelParam< std::string > itsSlaveNames
 names of our slaves as a space-separated list of hostname:port
OModelParam< bool > isMaster
 true if we are the master
OModelParam< int > selfqlen
 self-message queue length
OModelParam< bool > selfdroplast
 self-message queue drop policy
OModelParam< double > initTimeout
 max time to wait for initialization

Detailed Description

Simple interfacing to a Beowulf cluster.

The idea of this class is to hide all of the low-level communication setup and transfer details from the user, and to provide a simple interface for passing messages between nodes on a Beowulf cluster. Each slave node should instantiate a Beowulf object and initialize it with slaveInit(). This will block the slave until it is contacted by the Beowulf master node. The master node instantiates a Beowulf object, and initializes it using masterInit(), passing along the hostnames of the slave nodes. During initialization, the Beowulf master contacts all the slaves and instructs them to fully interconnect with each other. Once initialization is complete, any node can send() and receive() TCPmessages to and from any other node. Both send() and receive() are non-blocking methods. Actual queueing and transfer of messages is done in a thread that runs in parallel with the main program thread.

Definition at line 76 of file Beowulf.H.


Constructor & Destructor Documentation

Beowulf::Beowulf ( OptionManager &  mgr,
const std::string &  descrName = "Beowulf",
const std::string &  tagName = "Beowulf",
const bool  ismaster = false 
)

Constructor.

Parameters:
ismaster true if we are the master of the Beowulf. The master is the one that gets the list of slaves and then initializes all the slaves at start() time.

Definition at line 51 of file Beowulf.C.

References ModelComponent::addSubComponent(), OModelParam< T >::getVal(), isMaster, itsSlaveNames, and ModelComponent::unregisterParam().

Beowulf::~Beowulf (  )  [virtual]

Destructor.

Will properly terminate all connections.

Definition at line 75 of file Beowulf.C.


Member Function Documentation

int Beowulf::getNbSlaves (  )  const

get number of slave nodes

Definition at line 109 of file Beowulf.C.

References ASSERT, and max().

int Beowulf::getNodeNumber (  )  const

get our node number (-1 is the master)

Definition at line 119 of file Beowulf.C.

int Beowulf::nbReceived ( const int  node_nb = -2  ) 

Do we have any received messages?

Returns the total number of messages in the incoming queues of our various connected nodes. If node_nb == -1, only consider messages from the Beowulf master. If node_nb == -2, consider any node, otherwise only consider the specified node.
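The node_nb conventions described above can be illustrated with a minimal sketch. It is not the toolkit's implementation: Inbox, countReceived, MASTER_NODE and ANY_NODE are hypothetical names introduced here, and the per-node queues are modeled as plain std::queue objects keyed by node number.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>

// Sentinel values taken from the description above.
const int MASTER_NODE = -1;  // only consider messages from the Beowulf master
const int ANY_NODE = -2;     // consider messages from any node

// Hypothetical stand-in for the per-node incoming queues, keyed by node
// number (the master's queue is stored under MASTER_NODE).
using Inbox = std::map<int, std::queue<std::string>>;

// Count queued messages for one node, or for all nodes when node_nb is
// ANY_NODE. Unknown node numbers simply count as zero messages.
int countReceived(const Inbox& inbox, int node_nb = ANY_NODE)
{
  // Assumption of this sketch: values below -2 carry no meaning.
  assert(node_nb >= ANY_NODE);

  if (node_nb == ANY_NODE) {
    std::size_t total = 0;
    for (const auto& entry : inbox) total += entry.second.size();
    return static_cast<int>(total);
  }

  const auto it = inbox.find(node_nb);
  return it == inbox.end() ? 0 : static_cast<int>(it->second.size());
}
```

With one message queued from the master and two from node 0, countReceived(inbox) would count all three, while countReceived(inbox, MASTER_NODE) would count only the first.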

Definition at line 605 of file Beowulf.C.

const char * Beowulf::nodeName ( const int  nb  )  const

Get hostname:port of node with given node number.

This is whatever the user gave at configuration, so it could be just a short hostname, a fully-qualified hostname, or a hostname:port. If nb is -1, we return "BeoMaster".

Definition at line 123 of file Beowulf.C.

void Beowulf::paramChanged ( ModelParamBase *const   param,
const bool  valueChanged,
ParamClient::ChangeStatus *  status 
) [protected, virtual]

Intercept people changing our ModelParam.

See ModelComponent.H; as parsing the command-line or reading a config file sets our name, we'll also here instantiate a controller of the proper type (and export its options)

Reimplemented from ModelComponent.

Definition at line 79 of file Beowulf.C.

References OModelParam< T >::getVal(), isMaster, itsSlaveNames, ModelComponent::registerOptionedParam(), and ModelComponent::unregisterParam().

bool Beowulf::receive ( int &  node_nb,
TCPmessage &  msg,
int32 &  frame,
int32 &  action,
const int  timeout = 0,
int *  err = 0 
)

Receive message from a given node (or from any node).

Check whether a message has been received; returns true if so, false otherwise. If a message was received, the node it came from will be in node_nb, and its frame and action fields will be pre-decoded for convenience (they are still in the message itself too). With the default timeout of 0, this method is non-blocking, i.e., it returns immediately and does not wait for messages to come in. A non-zero timeout lets the call block for at most that many milliseconds.

Parameters:
node_nb node number to receive from, or -1 to receive from any node. If a message is received, node_nb will be updated to the node number from which the message was received, or -1 if it was received from the master.
msg received message
frame frame number from received message
action action field from received message
timeout if non-zero, max time (in ms) this call may block
err if non-null, then (*err) will be set to non-zero if an error occurs

Definition at line 536 of file Beowulf.C.

References ASSERT, TCPmessage::getAction(), TCPmessage::getETI(), TCPmessage::getID(), and Timer::getSecs().

void Beowulf::releaseNode ( int  nodenum  ) 

De-allocate a currently allocated node.

Definition at line 152 of file Beowulf.C.

int Beowulf::requestNode (  ) 

Request a so-far unallocated node.

This will return the next node number that has not yet been requested. This only works if we are Beowulf master. It will generate an error message and return -2 if we have no more unallocated nodes. We need to be start()'ed for this to work.
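The allocation behavior described above can be sketched as follows. This is a hypothetical stand-in, not the toolkit's code: NodeAllocator is a name introduced here, picking the lowest-numbered free node is an assumption, and the started() and master-only checks are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the master's node allocation bookkeeping.
class NodeAllocator {
public:
  explicit NodeAllocator(std::size_t nbSlaves) : allocated(nbSlaves, false) {}

  // Return the lowest-numbered free node and mark it allocated,
  // or -2 when every node is already taken.
  int requestNode()
  {
    for (std::size_t i = 0; i < allocated.size(); ++i) {
      if (!allocated[i]) {
        allocated[i] = true;
        return static_cast<int>(i);
      }
    }
    return -2;
  }

  // Mark a node as free again. Out-of-range node numbers are ignored
  // (an assumption of this sketch).
  void releaseNode(int nodenum)
  {
    if (nodenum >= 0 && static_cast<std::size_t>(nodenum) < allocated.size())
      allocated[static_cast<std::size_t>(nodenum)] = false;
  }

private:
  std::vector<bool> allocated;  // allocated[i] is true while node i is in use
};
```

In this sketch a released node becomes available to the next requestNode() call. Whether the real class reuses released nodes in this order is not stated in this page.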

Definition at line 134 of file Beowulf.C.

References ModelComponent::started().

void Beowulf::resetConnections ( const int  keepfd = -1  ) 

Reset and kill all connections except possibly one (keepfd).

Resets the Beowulf to uninitialized state. Kills all connections, except possibly one (typically towards the master) that may be specified as argument.

Parameters:
keepfd the fd to keep (or -1 to keep no fd)

Definition at line 165 of file Beowulf.C.

References Timer::reset().

void Beowulf::send ( TCPmessage &  msg  ) 

Send message to the least-loaded of our slave nodes.

This method is non-blocking (returns immediately). A copy of msg is taken, so you can destroy it immediately after send. It only works if we are the Beowulf master; calling it on any other node is a fatal error.

This implements load balancing. The ETI (estimated time to idle) fields in TCPmessage are used to determine which of our slave nodes has the shortest pending work queue (i.e., shortest ETI), and the message is sent to that node. If several slave nodes share the lowest ETI, one of them is picked at random. This functionality therefore assumes that every slave node can process every message that you might send it (as opposed to more constrained architectures where a given node is only capable of one type of processing, corresponding to one type of received message).

For this load balancing to work, the slaves should put a good-faith estimate of their time to idle (in seconds) into each message they send back to us (the master). The master relies on those estimates to decide which node is the least loaded. This approach has severe limitations if your overall message traffic is low, because the ETI estimates at the master will not be refreshed regularly and may become grossly inaccurate. It is thus mostly intended for streaming applications, where every node will usually send several messages back to the master every 30ms or so, so that the ETI estimates collected at the master stay reasonably fresh and accurate.
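The selection rule described above (lowest ETI, ties broken at random) can be sketched in a few lines. This is an illustration under assumptions, not the code in Beowulf.C: pickLeastLoaded is a hypothetical helper, the ETI values are passed in as a plain vector indexed by slave node number, and std::mt19937 stands in for whatever random source the toolkit uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: return the index of the slave with the smallest ETI
// (estimated time to idle, in seconds), breaking ties uniformly at random.
std::size_t pickLeastLoaded(const std::vector<double>& eti, std::mt19937& rng)
{
  assert(!eti.empty());

  // First pass: find the smallest ETI.
  double best = eti[0];
  for (double e : eti)
    if (e < best) best = e;

  // Second pass: collect every node that ties for the smallest ETI.
  std::vector<std::size_t> candidates;
  for (std::size_t i = 0; i < eti.size(); ++i)
    if (eti[i] == best) candidates.push_back(i);

  // Pick one of the tied candidates at random.
  std::uniform_int_distribution<std::size_t> pick(0, candidates.size() - 1);
  return candidates[pick(rng)];
}
```

Because the choice depends only on the most recent ETI reported by each slave, stale estimates lead directly to poor choices, which is why the scheme suits high-traffic streaming use.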

Definition at line 499 of file Beowulf.C.

References ASSERT, TCPmessage::getAction(), TCPmessage::getID(), and randomDouble().

void Beowulf::send ( const int  node_nb,
TCPmessage &  msg 
)

Send message to another node.

This method is non-blocking (returns immediately). A copy of msg is taken, so you can destroy it immediately after send.

Parameters:
node_nb is the destination node number. A value of -1 on a slave Beowulf means that msg should be sent to the Beowulf master.

Definition at line 454 of file Beowulf.C.

References ASSERT, TCPmessage::getAction(), TCPmessage::getID(), OModelParam< T >::getVal(), selfdroplast, and selfqlen.


Member Data Documentation

OModelParam<double> Beowulf::initTimeout [protected]

max time to wait for initialization

Definition at line 210 of file Beowulf.H.

OModelParam<bool> Beowulf::isMaster [protected]

true if we are the master

Definition at line 207 of file Beowulf.H.

Referenced by Beowulf(), and paramChanged().

OModelParam<std::string> Beowulf::itsSlaveNames [protected]

names of our slaves as a space-separated list of hostname:port

port is optional and we will use the default SockServ port if unspecified. This parameter is only used if we are Beowulf master (see constructor)
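As an illustration of the format described above, here is a minimal sketch that splits a space-separated list of hostname[:port] entries. parseSlaveNames is a hypothetical helper, not a toolkit function, and the default port is passed in by the caller because the actual default SockServ port is not given in this page.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split "host1 host2:port ..." into (hostname, port)
// pairs, using defaultPort for entries that carry no explicit port.
// std::stoi throws if the text after ':' is not a number.
std::vector<std::pair<std::string, int>>
parseSlaveNames(const std::string& names, int defaultPort)
{
  std::vector<std::pair<std::string, int>> nodes;
  std::istringstream in(names);
  std::string token;

  // operator>> skips any run of whitespace between entries.
  while (in >> token) {
    const std::string::size_type colon = token.rfind(':');
    if (colon == std::string::npos)
      nodes.emplace_back(token, defaultPort);
    else
      nodes.emplace_back(token.substr(0, colon),
                         std::stoi(token.substr(colon + 1)));
  }
  return nodes;
}
```

For example, parseSlaveNames("n01 n02:7000", 1234) yields the entries ("n01", 1234) and ("n02", 7000).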

Definition at line 205 of file Beowulf.H.

Referenced by Beowulf(), and paramChanged().

OModelParam<bool> Beowulf::selfdroplast [protected]

self-message queue drop policy

Definition at line 209 of file Beowulf.H.

Referenced by send().

OModelParam<int> Beowulf::selfqlen [protected]

self-message queue length

Definition at line 208 of file Beowulf.H.

Referenced by send().


The documentation for this class was generated from the following files:
Beowulf.H
Beowulf.C
Generated on Sun May 8 08:43:08 2011 for iLab Neuromorphic Vision Toolkit by  doxygen 1.6.3