Learn more about Platform products at http://www.platform.com



Tutorial 2: Request Host Allocation in a Cluster with Synchronous Notifications

This tutorial describes how to create a registered EGO client that requests host allocation in a cluster and starts a container on the host. The client also reads notifications synchronously from the cluster regarding resource changes.

Using this tutorial, you will ...


Step 1: Preprocessor directives

The first step is to include the system and API header files. The samples.h header file declares the helper functions that are implemented in the samples, such as those shown in Steps 3 to 5.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#ifdef WIN32
#include <windows.h>   /* Sleep() */
#else
#include <unistd.h>    /* sleep() */
#endif
#include "vem.api.h"
#include "samples.h"


Step 2: Implement the principal method

Lines 4-8: define and initialize a data structure that is used to request a connection with the EGO host cluster. The data structure contains a reference to a configuration file where the master host name and port numbers are stored.

Line 10: pass the data structure as an argument to vem_open(), which opens a connection to the master host. If the connection attempt succeeds, a handle is returned; otherwise vem_open() returns NULL. The handle acts as a communication channel to the master host, and all subsequent communication goes through it.

Lines 18-19: a pointer to a vem_name_t structure (clusterName) is initialized to NULL. This structure holds the cluster name, system name, and version. vem_uname() takes the communication handle and, if successful, returns a pointer to a populated vem_name_t structure; otherwise it returns NULL.

Lines 26-27: the cluster information is printed to the screen.

Lines 29-46: declare a pointer to the client info structure (vem_clientinfo_t) and call vem_locate() to get all registered clients. Because NULL is passed as the client name, all registered clients are located, and the method returns the number of clients found (or a negative value on error). Platform EGO comes with a number of default clients (services), such as the Service Controller, so at least these clients are listed. After the client info is printed, its memory is released with vem_clear_clientinfo().

Lines 47-49: authenticate the user to Platform EGO.

1	 int 
2	 sample2() 
3	 {
4	   vem_openreq_t orequest;
5	   vem_handle_t *vhandle = NULL;
6	 
7	   orequest.file = "ego.conf"; // default libvem.conf
8	   orequest.flags=0;
9	 
10	   vhandle = vem_open(&orequest);
11	 
12	   if (vhandle == NULL) {
13	   	 // error opening
14	    	 fprintf(stderr, "Error opening cluster: %s\n",  vem_strerror(vemerrno));
15	    	 return -1;
16	   }
17	 
18	   vem_name_t *clusterName = NULL;
19	   clusterName = vem_uname(vhandle);
20	   if (clusterName == NULL) {
21	   	 // error connecting
22	    	 fprintf(stderr, "Error connecting to cluster: %s\n",  vem_strerror(vemerrno));
23	    	 return -2;
24	   }
25	   
26	   fprintf(stdout, " Connected... %s %s %4.2f\n", clusterName->clustername,
27	           clusterName->sysname, clusterName->version);
28	 
29	   vem_clientinfo_t *clients;
30	   int  rc = vem_locate(vhandle, NULL, &clients); 
31	   if (rc >=0) {
32	     if (rc == 0) {
33	    	   printf("No registered clients exist\n");
34	     } else {
35	   	   int i=0;
36	   	   for (i=0; i<rc; i++) {
37	    	     printf("%s %s %s\n", clients[i].name, clients[i].description,
38	    	     clients[i].location);
39	   	   }
40	   	   // free
41	   	   vem_clear_clientinfo(clients);  	   
42	     }
43	   } else {
44	   	 // error connecting
45	    	 fprintf(stderr, "Error getting clients: %s\n",  vem_strerror(vemerrno));
46	   }
47	   if (login(vhandle, username, password) < 0) {  // username and password are not defined in this listing
48	     fprintf(stderr, "Error logon: %s\n", vem_strerror(vemerrno));
49	   }
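Every call in the sample reports failure the same way: it returns NULL or a negative value and leaves the reason in the global vemerrno, which vem_strerror() turns into text. The standard C library follows the same convention with errno and strerror(), so the pattern can be sketched without the EGO headers. The function open_or_report() is hypothetical, and fopen() merely stands in for a call such as vem_open():

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path for reading. Like vem_open(), it returns
 * NULL on failure and leaves the reason in a global error code (errno here,
 * vemerrno in EGO). On failure it also formats a readable message into msg. */
FILE *open_or_report(const char *path, char *msg, size_t msglen)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        /* Copy errno at once: later library calls may overwrite it. */
        int err = errno;
        snprintf(msg, msglen, "Error opening %s: %s", path, strerror(err));
    }
    return fp;
}
```

Saving the error code before making any other call is the important detail; whether vemerrno can be overwritten in the same way is an assumption worth checking against the EGO API reference.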

Lines 50-63: declare the vem_allocation_info_reply_t and vem_container_info_reply_t structures that vem_register() fills in. If a client is disconnected and then registers again, its existing allocations and containers are returned in these structures; if the client has never registered before, the structures are empty. Define and initialize a structure (rreq) that holds the client information used for registration. Note that on line 58 the callback member (cb) is set to NULL. This means that the client is responsible for periodically checking the open connection with vem_select()/vem_read() to get incoming messages and act on them. Finally, register with Platform EGO over the open connection using vem_register().

Lines 64-67: print the allocations and containers returned by vem_register(), then release (cancel) any allocations left over from a previous registration with release_vem_allocation() (Step 3).

Lines 69-79: vem_gethostgroupinfo() collects information about the requested hostgroups. Because the grouplist member of the request (hgroupreq) is set to NULL, information about all hostgroups is requested. If the call succeeds, the name, members, and free and allocated counts of the first hostgroup returned are printed to the screen.

50	  vem_allocation_info_reply_t aireply;
51	   vem_container_info_reply_t  cireply;
52	   vem_registerreq_t rreq;
53	 
54	   rreq.name = "sample2_client";
55	   rreq.description = "Sample2";
56	   rreq.flags = VEM_REGISTER_TTL;
57	   rreq.ttl = 3;
58	   rreq.cb = NULL; // would need to read messages explicitly;
59	   
60	   rc = vem_register(vhandle, &rreq, &aireply, &cireply);
61	   if (rc < 0) {
62	     fprintf(stderr, "Error registering: %s\n", vem_strerror(vemerrno));
63	   } else {
64	     print_vem_allocation_info_reply(&aireply);
65	     print_vem_container_info_reply(&cireply);
66	     // release (cancel) any allocations left from a previous registration
67	     release_vem_allocation(vhandle, &aireply);
68	   }
69	   vem_hostgroupreq_t hgroupreq;
70	   hgroupreq.grouplist = NULL;
71	   vem_hostgroup_t *hgroup;
72	 
73	   rc = vem_gethostgroupinfo(vhandle, &hgroupreq, &hgroup); 
74	   if (rc < 0) {
75	     fprintf(stderr, "Error getting hostgroup: %s\n",  vem_strerror(vemerrno));  	 
76	   } else {
77	   	 printf("%s %s %d %d\n", hgroup->groupName, hgroup->members, hgroup->free,
78	  hgroup->allocated);
79	   }
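Because rreq.cb is NULL, messages from the cluster wait on the connection until the client asks for them with vem_select() and vem_read(). The toy model below illustrates that difference only; it is not how libvem works internally, and client_t, deliver(), and read_message() are hypothetical names. With a callback installed, a message is handed over as soon as it arrives; without one, it is queued until the client reads it:

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT libvem types. */
typedef struct { int code; } message_t;
typedef void (*message_cb)(const message_t *msg);

#define QUEUE_CAP 16

typedef struct {
    message_cb cb;               /* NULL: the client must read messages itself */
    message_t  queue[QUEUE_CAP]; /* messages waiting for an explicit read */
    size_t     head, count;
} client_t;

/* Called when a message arrives. With a callback the message is handed over
 * at once; without one it is queued. Returns 0 on success, or -1 if the
 * queue is full and the message is dropped. */
int deliver(client_t *c, message_t msg)
{
    if (c->cb != NULL) {
        c->cb(&msg);
        return 0;
    }
    if (c->count == QUEUE_CAP)
        return -1;
    c->queue[(c->head + c->count) % QUEUE_CAP] = msg;
    c->count++;
    return 0;
}

/* Polling-style read, the role vem_read() plays when cb is NULL.
 * Returns 1 and fills *out if a message was pending, otherwise 0. */
int read_message(client_t *c, message_t *out)
{
    if (c->count == 0)
        return 0;
    *out = c->queue[c->head];
    c->head = (c->head + 1) % QUEUE_CAP;
    c->count--;
    return 1;
}
```

Messages come back in arrival order, which is why a polling client has to keep reading until it sees the message it is waiting for.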

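Lines 80-101: initialize the allocation request structure (vem_allocreq_t). The request names the consumer, the hostgroup (ComputeHosts), and a resource requirement (LINUX86, or NTX86 when WIN32_RESOURCE is defined); setting both minslots and maxslots to 1 requests exactly one slot. vem_alloc() submits the request and returns an allocation ID. If the request succeeds, the allocation ID is printed; if it fails, the code jumps to the bailout label, which unregisters the client and logs out.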

80	   vem_allocreq_t areq = {0};  // zero every field, including flags, which is not set below
81	   areq.name = "Sample2Alloc";
82	   areq.consumer = "/SampleApplications/EclipseSamples"; 
83	   areq.hgroup = "ComputeHosts";
84	 #ifndef WIN32_RESOURCE
85	   areq.resreq = "LINUX86";
86	 #else
87	   areq.resreq = "NTX86";
88	 #endif
89	   areq.minslots = 1;
90	   areq.maxslots = 1;
91	   areq.tile = 0;
92	 vem_allocation_id_t alocid;
93	   vem_allocfreereq_t afree;
94	   rc = vem_alloc(vhandle, &areq, &alocid);
95	   if (rc < 0) {
96	     	 fprintf(stderr, "Error allocating: %s\n",  vem_strerror(vemerrno));  
97	     	 goto bailout;
98	     	 	 
99	   } else {
100	     printf("allocated: %s\n",  alocid);  	 
101	   }

Lines 102-123: define and initialize a container specification, setting all of its resource limits to default values. The container specification describes the job to be executed: the conspec.command field holds the command line to run, and execUser, execCwd, and umask set the user account, working directory, and file-creation mask. In this sample the command is sleep 240; the UNIX sleep command takes the number of seconds to sleep as its argument, so the container runs for four minutes. On Windows hosts, a sleep program must be installed (see the comments on lines 111-112).

102	  vem_container_spec_t conspec;
103	   memset(&conspec, 0, sizeof(vem_container_spec_t));
104	 #ifndef WIN32_RESOURCE
105	   conspec.command = "sleep 240";
106	   conspec.execUser = "lsfadmin"; // "egoadmin";
107	   conspec.umask = 0777;
108	   conspec.execCwd = "/tmp";
109	   conspec.envC = 0;
110	 #else
111	   // sleep needs to be installed on the cluster NT hosts 
112	   // or if ping is available, use something like ping -n xxx 127.0.0.1 > nul
113	   conspec.command = "sleep 240";
114	   conspec.execUser = "lsf\\lsfadmin"; //"egouser"; // "lsfadmin"; // "egoadmin";
115	   conspec.umask = 0777;
116	   conspec.execCwd = "c:\\";
117	   conspec.envC = 0;
118	 #endif
119	   int i;
120	   for (i=0; i<VEM_RLIM_NLIMITS; i++) {
121	   	 conspec.rlimits[i].rlim_cur = VEM_RLIM_DEFAULT;
122	     conspec.rlimits[i].rlim_max = VEM_RLIM_DEFAULT;
123	   }
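The memset() call followed by the loop over VEM_RLIM_NLIMITS is a common two-step initialization: memset() gives every field a known value, and the loop then replaces the zeros in the limit array with the "use the default" marker, since a limit of 0 would presumably be read as a real limit of zero. The sketch below reproduces the pattern with hypothetical stand-ins (spec_t, SPEC_NLIMITS, SPEC_RLIM_DEFAULT) for the types and constants in vem.api.h, whose real values are not shown in this tutorial:

```c
#include <string.h>

#define SPEC_NLIMITS       8     /* hypothetical; stands in for VEM_RLIM_NLIMITS */
#define SPEC_RLIM_DEFAULT  (-1L) /* hypothetical; stands in for VEM_RLIM_DEFAULT */

typedef struct { long rlim_cur, rlim_max; } spec_limit_t;

/* Hypothetical mirror of the vem_container_spec_t fields used here. */
typedef struct {
    const char  *command;
    const char  *execCwd;
    spec_limit_t rlimits[SPEC_NLIMITS];
} spec_t;

void init_spec(spec_t *spec, const char *command, const char *cwd)
{
    /* Step 1: set every byte to zero so no field holds stack garbage. */
    memset(spec, 0, sizeof *spec);
    spec->command = command;
    spec->execCwd = cwd;
    /* Step 2: replace the zero limits with the default marker; a limit of 0
     * would presumably mean "limit of zero", not "use the default". */
    for (int i = 0; i < SPEC_NLIMITS; i++) {
        spec->rlimits[i].rlim_cur = SPEC_RLIM_DEFAULT;
        spec->rlimits[i].rlim_max = SPEC_RLIM_DEFAULT;
    }
}
```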

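Lines 124-130: declare the container start request (conreq) and set its allocation ID, initialize the container ID (conid) to NULL, and declare the timeout and message structures used to read notifications.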

Lines 132-163: wait up to 60 seconds (a configurable timeout) for incoming data on the open connection. If data arrives, read the message with vem_read(). A switch statement interprets the message code; for RESOURCE_ADD, the allocation reply is printed. If vem_select() fails or times out, the message cannot be read, or the message is a reclaim request or has an unknown code, the code jumps to the cleanup label, which frees the allocation.

124	 	 vem_startcontainerreq_t conreq;
125	   vem_container_id_t      conid = NULL;
126	   conreq.allocId = alocid;
127	 struct timeval tv;
128	   struct vem_message msg;
129	   struct vem_allocreply *rep = NULL;
130	   struct vem_allocreclaim *reclaim = NULL;
131	   tv.tv_usec = 0;  // struct timeval has two fields; both must be set
132	   tv.tv_sec = 60; // 60 seconds timeout   
133	   rc = vem_select(vhandle, &tv);
134	   if(rc < 0) {
135	      printf("vem_select error\n");
136	      goto cleanup;
137	   }
138	   if(rc == 0) {
139	      printf("vem_select timed out with no message; try a longer timeout\n");
140	      goto cleanup;
141	   }
142	   rc = vem_read(vhandle, &msg);
143	   if(rc < 0) {
144	      printf("Read message failed\n");
145	      goto cleanup;
146	   }
147	   switch(msg.code) {
148	      case RESOURCE_ADD:
149	           rep = (struct vem_allocreply *)msg.content;
150	           printf("Got alloc reply for %s %d hosts\n", rep->consumer, rep->nhost);
151	           break;
152	      case RESOURCE_RECLAIM:
153	           reclaim = (struct vem_allocreclaim*)msg.content;
154	           printf("vem wants its resources back for allocation %s\n",
155	  reclaim->reclaim->consumer);
156	           rc = -1;
157	           goto cleanup;
158	           break;
159	      default:
160	           printf("unknown message code %d\n", msg.code);
161	           goto cleanup;
162	           break;
163	   } /* switch() */
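vem_select() and vem_read() split message handling into "wait until something arrives" and "read it", and the sample treats a negative result from vem_select() as an error and 0 as a timeout. That is how the POSIX select() call behaves, so the sketch below shows the same wait-then-read pattern on an ordinary file descriptor. It is an analogy under that assumption, not the EGO implementation; wait_readable() and read_byte_with_timeout() are hypothetical helpers:

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Wait up to timeout_sec seconds for data on fd.
 * Returns < 0 on error, 0 on timeout, > 0 when fd is readable. */
int wait_readable(int fd, long timeout_sec)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;              /* select() reads both fields */
    return select(fd + 1, &readfds, NULL, NULL, &tv);
}

/* Wait, then read one byte, like the vem_select()/vem_read() pair.
 * Returns 1 with the byte in *out, 0 on timeout, -1 on error. */
int read_byte_with_timeout(int fd, long timeout_sec, char *out)
{
    int rc = wait_readable(fd, timeout_sec);
    if (rc <= 0)
        return rc < 0 ? -1 : 0;  /* error or timeout */
    return read(fd, out, 1) == 1 ? 1 : -1;
}
```

Only reading after the wait reports data keeps the read from blocking indefinitely, which is the reason the sample checks the vem_select() result before calling vem_read().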

Lines 164-168: get the hostname for the allocation and print it out to the screen. Initialize the workload container request structure (conreq) with the hostname, container name, and the container specification (conspec).

Lines 170-175: start the workload container on the specified host and, if successful, print out the container ID.

Lines 178-193: call vem_locate() again to list the registered clients. If registration succeeded, the list should now also include sample2_client, the client registered on line 60. As before, the client info is printed and its memory is released with vem_clear_clientinfo().

164	 char *host = rep->host[0].name;
165	   printf("Allocated host: %s\n",  host);  	 
166	   conreq.hostname = host;
167	   conreq.name = "Sample2Container";
168	   conreq.spec = &conspec;
169	   
170	   rc = vem_startcontainer(vhandle, &conreq, &conid);
171	   if (rc < 0) {
172	     	 fprintf(stderr, "Error starting container: %s\n",  vem_strerror(vemerrno));
173	     	 goto cleanup;
174	   }
175	   printf("Started container %s\n", conid);
176	 // Currently no way to get container from id.
177	   //print_vem_container(vem_container_t *container);
178	 rc = vem_locate(vhandle, NULL, &clients); 
179	   if (rc >=0) {
180	     if (rc == 0) {
181	    	   printf("No registered clients exist\n");
182	     } else {
183	   	   int i=0;
184	   	   for (i=0; i<rc; i++) {
185	    	     printf("%s %s %s\n", clients[i].name, clients[i].description,
186	    	     clients[i].location);
187	   	   }
188	       vem_clear_clientinfo(clients);  
189	     }
190	   } else {
191	   	 // error connecting
192	    	 fprintf(stderr, "Error getting clients: %s\n",  vem_strerror(vemerrno));
193	   }
194	 // give the container time to run before cleaning up (it sleeps for 240 s)
195	 #ifdef WIN32
196	     Sleep(30000);  // milliseconds
197	 #else
198	     sleep(30);     // seconds
199	 #endif

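200	 cleanup:
201	   afree.allocId = alocid;
202	   rc = vem_allocfree(vhandle, &afree);
203	   if (rc < 0) {
204	     fprintf(stderr, "Error freeing allocation: %s\n", vem_strerror(vemerrno));
205	   }
206	   // conid is initialized (line 125) before every jump to cleanup, but a failed
207	   // vem_alloc() jumps past that line to bailout, so conid is freed here
208	   vem_free_containerId(conid);
209	 bailout:
210	   rc = vem_unregister(vhandle);
211	   if (rc < 0) {
212	     fprintf(stderr, "Error unregistering: %s\n", vem_strerror(vemerrno));
213	   }
214	   if (logout(vhandle) < 0) {
215	     fprintf(stderr, "Error logoff: %s\n", vem_strerror(vemerrno));
216	   }
217	   // conspec is not freed: it lives on the stack and its strings are literals
218	   // (calling vem_free_containerSpec(&conspec) here crashed)
219	   vem_free_uname(clusterName);
220	   vem_close(vhandle);
221	   // host points into the reply read by vem_read() (rep->host[0].name) and is
222	   // uninitialized on paths that jump to cleanup before line 164, so it is
223	   // not freed separately
224	   return 0;
225	 }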


Step 3: Free all resource allocations

This function iterates through the allocations returned at registration and frees each one, identified by its allocation ID, with vem_allocfree(). Freeing an allocation is the same as cancelling it: all resources associated with the allocation are released.

void 
release_vem_allocation(vem_handle_t *vhandle, vem_allocation_info_reply_t *aireply) 
{
	 int i;
	 for(i=0; i<aireply->nallocation; i++){
	   // release (cancel) this allocation
      vem_allocfreereq_t afree;
      afree.allocId = aireply->allocation[i].allocId;
      int rc = vem_allocfree(vhandle, &afree);
      if (rc < 0) {
    	 fprintf(stderr, "Error freeing allocation: %s\n",  vem_strerror(vemerrno));  	 
      }
	 }	 	 
}
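release_vem_allocation() keeps going after a failed vem_allocfree() call, so one bad entry does not leave the remaining allocations in place. The sketch below models the loop with hypothetical mirror types (allocation_t, alloc_reply_t) and a stub free_allocation() in place of vem_allocfree(); unlike the original, it also returns the number of failures so the caller can react:

```c
#include <stdio.h>

/* Hypothetical mirrors of vem_allocation_info_reply_t; only the fields
 * that release_vem_allocation() reads are modeled. */
typedef struct { const char *allocId; } allocation_t;
typedef struct { int nallocation; allocation_t *allocation; } alloc_reply_t;

/* Stub standing in for vem_allocfree(): fails (returns -1) for a NULL id. */
static int free_allocation(const char *allocId)
{
    return allocId != NULL ? 0 : -1;
}

/* Release every allocation, continuing past failures.
 * Returns the number of allocations that could not be freed. */
int release_all(const alloc_reply_t *reply)
{
    int failures = 0;
    for (int i = 0; i < reply->nallocation; i++) {
        if (free_allocation(reply->allocation[i].allocId) < 0) {
            fprintf(stderr, "Error freeing allocation %d\n", i);
            failures++;
        }
    }
    return failures;
}
```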


Step 4: Print allocation info

These three functions iterate through the allocations, printing the allocation ID, the allocation request, and, for each host, its name, slot count, and list of attributes. The print_vem_value() helper they call is not shown in this tutorial.

void
print_vem_allocation_info_reply(vem_allocation_info_reply_t *aireply)
{
	 int i;
	 for(i=0; i<aireply->nallocation; i++){
	 	 print_vem_allocation(&aireply->allocation[i]);
	 }	 
}

void
print_vem_allocation(vem_allocation_t *alloc)
{
    int i, j;

    printf("AllocId=%s\n", alloc->allocId);
    print_vem_allocreq(alloc->allocReq);
    for (i = 0; i < alloc->nhost; i++) {
        printf("Name=%s Slots=%d Attributes ",
               alloc->host[i].name,
               alloc->host[i].slots);
        for (j = 0; j < alloc->hostattr[i].attrC; j++) {
            vem_attribute_t *attr = &alloc->hostattr[i].attrV[j];
            printf("%s=", attr->name);
            print_vem_value(&attr->value_t);
        }
        printf("\n");
    }
}
void
print_vem_allocreq(vem_allocreq_t *allocreq) 
{
	 printf("AllocReq %s %s %s %s %d %d %d\n", 
	 allocreq->name,
	 allocreq->consumer,
	 allocreq->hgroup,
	 allocreq->resreq,
	 allocreq->maxslots,
	 allocreq->minslots,
	 allocreq->flags
	 );
}


Step 5: Print container info

These four functions iterate through the containers, printing the container ID, state, and other container fields. print_vem_container_state() and print_vem_container_exit_reason() use switch statements to translate enumeration values into readable text.

void
print_vem_container_info_reply(vem_container_info_reply_t  *cireply)
{
	 int i;
	 for(i=0; i<cireply->ncontainer; i++){
	 	 print_vem_container(&cireply->container[i]);  // print each container, not only the first
	 }	 
}
void
print_vem_container(vem_container_t *container) 
{
	 printf("Container\n");
	 printf("Id=%s\nState=",	 container->id);
	 print_vem_container_state(container->state);
	 printf("\nName=%s\nAllocId=%s\nConsumer=%s Start=%ld, End=%ld\nHost=%s ExitStatus=%d "
	        "ExitReason=",
	   container->name,
	   container->allocId,
	   container->consumer,
	   (long)container->startTime,
	   (long)container->endTime,
	   container->host,
	   container->exitStatus);
	 print_vem_container_exit_reason(container->exitReason);
	 //TODO add the rest of the fields
	 // print rest	 
}
void
print_vem_container_state(vem_container_state_t state)
{
	 switch(state) {
      case CONTAINER_NULL:       printf(" 0, internal state"); break; 
      case CONTAINER_START:      printf(" 1, start"); break; 
      case CONTAINER_RUN:        printf(" 2, running"); break; 
      case CONTAINER_SUSPEND:    printf(" 3, suspend"); break; 
      case CONTAINER_FINISH:     printf(" 4, finish"); break; 
      case CONTAINER_UNKNOWN:    printf(" 5, unknown, host unreachable "); break; 
      case CONTAINER_ZOMBIE:     printf(" 6, zombie, unknown container is terminated"); break;
      case CONTAINER_MAX_STATE:  printf(" number of container states"); break;
	 }
}

void
print_vem_container_exit_reason (vem_container_exit_reason_t rcode)
{
	 switch(rcode) {
      case ER_NULL:                  printf("  0, no reason"); break;
      case ER_SETUP_NO_MEM:          printf("  1, exit because of setup failure (no memory)"); break;
      case ER_SETUP_FORK:            printf("  2, fork failed"); break;
      case ER_SETUP_PGID:            printf("  3, setpgid failed"); break;
      case ER_SETUP_ENV:             printf("  4, failed to set environment variables"); break;
      case ER_SETUP_LIMIT:           printf("  5, failed to set process limits"); break;
      case ER_SETUP_NO_USER:         printf("  6, user account does not exist"); break;
      case ER_SETUP_PATH:            printf("  7, failed to change container cwd"); break;
      case ER_SIG_KILL:              printf("  8, terminated by SIGKILL"); break;
      case ER_UNKNOWN:               printf("  9, unknown reason"); break;
      case ER_PEM_UNREACH:           printf("  10, failed to reach PEM host"); break;
      case ER_PEM_SYN:               printf("  11, vemkd and PEM sync issue"); break;
      case ER_BAD_ALLOC_HOST:        printf("  14, host is not allocated"); break;
      case ER_NOSUCH_CLIENT:         printf("  15, client does not exist"); break;
      case ER_START:                 printf("  16, container start failed"); break;
      case LAST_EXIT_REASON:         printf("  last exit reason"); break;
      default:                       printf("  %d, unrecognized exit reason", (int)rcode); break;
	 }
	 printf("\n");
} 
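print_vem_container_state() and print_vem_container_exit_reason() print directly from a switch statement. When the enumeration values are small and contiguous, one alternative is a lookup table that returns the description as a string, so the caller decides where it goes, and falls back to a default for values it does not know. The enum below is a hypothetical mirror of vem_container_state_t using the values 0 to 6 that the sample prints; the real definition is in vem.api.h, and the table is only valid if its values match:

```c
#include <stddef.h>

/* Hypothetical mirror of vem_container_state_t (values 0-6). */
typedef enum {
    ST_NULL, ST_START, ST_RUN, ST_SUSPEND, ST_FINISH, ST_UNKNOWN, ST_ZOMBIE
} state_t;

/* Table-driven description lookup with a default for unknown values. */
const char *state_name(state_t s)
{
    static const char *const names[] = {
        "internal state",            /* ST_NULL */
        "start",                     /* ST_START */
        "running",                   /* ST_RUN */
        "suspend",                   /* ST_SUSPEND */
        "finish",                    /* ST_FINISH */
        "unknown, host unreachable", /* ST_UNKNOWN */
        "zombie"                     /* ST_ZOMBIE */
    };
    int i = (int)s;
    if (i >= 0 && (size_t)i < sizeof names / sizeof names[0])
        return names[i];
    return "unrecognized state";
}
```

The switch in the sample is safer if the enumeration has gaps, as the exit reasons do (12 and 13 are not listed); the table suits dense ranges such as the container states.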


Run the client application

  1. Select Run > Run.

    The Run dialog appears.

  2. In the Configurations list, either select an EGO C Client Application or click New for a new configuration.

    For a new configuration, enter the configuration name.

  3. Enter the project name and C/C++ Application name.
  4. Click Apply and then Run.

Sample Output





      Date Modified: July 12, 2006
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

Copyright © 1994-2006 Platform Computing Corporation. All rights reserved.