In the previous examples, checkpoints are triggered by the coordinator. They happen either periodically, with the -i option (the checkpoint interval in seconds), or manually, by typing 'c' in the coordinator's terminal. The manual route only works when the coordinator runs in the foreground in a separate shell rather than as a daemon. Such a checkpoint can happen at any point during the application's execution. In some cases, users want to control exactly when checkpoints are taken. They can do this by calling DMTCP routines from inside the application. Here is an example of a C program with embedded checkpoint-and-restart:
#include <stdlib.h>
#include <assert.h>
#include <stdio.h>
#include <unistd.h>  /* sleep() */
/* Be sure to compile with -I<path>; see the Makefile below. */
#include "dmtcp.h"
#define INTS_PER_LOOP 5
// Prints the integers 0 to 99 to both the screen and the file out.txt
// at a rate of one integer per second.
// A checkpoint is requested after every INTS_PER_LOOP integers.
int main(int argc, char* argv[])
{
  unsigned long i = 0;
  int count = 0;
  int rr;
  int numCheckpoints, numRestarts;
  FILE *f;
  f = fopen("out.txt", "w");
  if (f == NULL) {
    perror("fopen out.txt");
    return EXIT_FAILURE;
  }
  while (i < 100)
  {
    if (dmtcp_is_enabled()) {
      dmtcp_get_local_status(&numCheckpoints, &numRestarts);
      printf("on iteration %d: this process has checkpointed"
             " %d times and restarted %d times\n",
             ++count, numCheckpoints, numRestarts);
    } else {
      printf("on iteration %d; DMTCP not enabled!\n", ++count);
    }
    // Print INTS_PER_LOOP integers, one per second
    do {
      printf("%lu ", i);
      fflush(stdout);
      fprintf(f, "%lu\n", i);
      fflush(f);
      sleep(1);
      i++;
    } while (i % INTS_PER_LOOP != 0);
    printf("\n");
    // Request a checkpoint and report how execution continued
    if (dmtcp_is_enabled()) {
      rr = dmtcp_checkpoint();
      if (rr == DMTCP_NOT_PRESENT)
        printf("***** Error, DMTCP not running; checkpoint skipped *****\n");
      if (rr == DMTCP_AFTER_CHECKPOINT)
        printf("***** after checkpoint *****\n");
      if (rr == DMTCP_AFTER_RESTART)
        printf("***** after restart *****\n");
    } else {
      printf(" dmtcp disabled -- nevermind\n");
    }
  }
  fclose(f);
  return 0;
}
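Here is a Makefile for compiling the program, assuming the source is saved as your-program.c. Set DMTCP_HOME to the DMTCP directory whose include/ subdirectory contains dmtcp.h. Recipe lines in a Makefile must begin with a tab: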
ifndef CC
CC=gcc
endif
your-program : your-program.c
	${CC} -fPIC ${CFLAGS} -I${DMTCP_HOME}/include your-program.c -o your-program
Then run the program under DMTCP:
dmtcp_coordinator --daemon --exit-on-last
dmtcp_launch ./your-program
The output looks like this:
on iteration 1: this process has checkpointed 0 times and restarted 0 times
0 1 2 3 4
***** after checkpoint *****
on iteration 2: this process has checkpointed 1 times and restarted 0 times
5 6 7 8 9
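You can press Ctrl-C to terminate the program and then restart it from the last checkpoint as follows. Because the coordinator was started with --exit-on-last, it exits once the program is killed, so a new coordinator has to be started before restarting: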
dmtcp_coordinator --daemon --exit-on-last
dmtcp_restart ckpt*
The output looks like this:
on iteration 5: this process has checkpointed 3 times and restarted 1 times
20 21 22 23 24
***** after checkpoint *****
on iteration 6: this process has checkpointed 4 times and restarted 1 times
25 26 27 28 29
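Because the program also writes every integer to out.txt, that file can be inspected after the restarted run finishes. The function below is a minimal sketch of such a check. check_sequence is a hypothetical helper, not part of DMTCP, and it assumes out.txt holds one integer per line as the example program writes it. It returns how many integers were read if the file contains the gap-free sequence 0, 1, 2, ..., and -1 otherwise.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of DMTCP: checks that the file at `path`
 * contains the integers 0, 1, 2, ... in order (whitespace-separated,
 * e.g. one per line as the example program writes them).
 * Returns how many integers were read, or -1 if the file cannot be
 * opened, a value is missing or repeated, a token is not a number,
 * or a read error occurs. */
long check_sequence(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    long expected = 0;
    long value;
    int rc;
    while ((rc = fscanf(f, "%ld", &value)) == 1) {
        if (value != expected) {  /* gap or repeated value */
            fclose(f);
            return -1;
        }
        expected++;
    }
    /* rc == EOF means end of file (or a read error, checked via ferror);
       rc == 0 means a token that is not an integer. */
    int ok = (rc == EOF && !ferror(f));
    fclose(f);
    return ok ? expected : -1;
}
```

For example, calling check_sequence("out.txt") from a small main once the restarted program exits, and printing the result, shows whether any integer was lost or written twice across the checkpoint, the kill, and the restart.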
-- Zhiwei - 12 Jul 2020