Checkpointing is the process of saving the execution state of an application so that execution can be continued from that state at a later time. Typically, the execution state is written to a file. Restart is the complementary step: it resumes the application from the saved state.
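As a rough illustration of the checkpoint/restart cycle, the following minimal Python sketch saves a small state dictionary to a file at regular intervals and resumes from it on the next run. The file name `checkpoint.json`, the function names, and the toy computation (a running sum) are all hypothetical and chosen only for illustration; a real application would save whatever state it needs to continue.

```python
import json
import os
import tempfile

CHECKPOINT_FILE = "checkpoint.json"  # hypothetical file name


def save_checkpoint(state, path=CHECKPOINT_FILE):
    """Write the state to a temporary file, then rename it into place.

    Renaming avoids leaving a half-written checkpoint behind if the job
    is killed in the middle of a write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path=CHECKPOINT_FILE):
    """Return the saved state, or a fresh initial state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def run(num_steps, checkpoint_every=100, path=CHECKPOINT_FILE):
    """Toy computation: sum the integers below num_steps, with checkpoints."""
    state = load_checkpoint(path)  # restart: pick up where the last run stopped
    for step in range(state["step"], num_steps):
        state["total"] += step
        state["step"] = step + 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(state, path)  # periodic checkpoint
    save_checkpoint(state, path)  # final checkpoint
    return state
```

Calling `run(...)` again after an interruption continues from the last saved step rather than from step 0; at most the work done since the most recent checkpoint is repeated.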
Checkpointing not only saves time by allowing an application to resume after a hardware failure instead of starting over, but it also helps in overcoming the time limits associated with the different job queues/partitions: a computation that needs longer than the limit can be split across several jobs, each resuming from the checkpoint written by the previous one. An application can be made to write checkpoints using approaches such as the following: