Here is an article about training a PyTorch-based deep learning model on multiple GPU devices across multiple nodes of an HPC cluster:
https://tuni-itc.github.io/wiki/Technical-Notes/Distributed_dataparallel_pytorch/
The sample code in the article contains two bugs: it indexes `args.gpus` (the attribute is `args.gpu`), and it wraps an undefined variable `model_sync` instead of `model`. Change the following lines:
model = AE(input_shape=784).cuda(args.gpus)
model = torch.nn.parallel.DistributedDataParallel( model_sync, device_ids=[args.gpu], find_unused_parameters=True )
to
model = AE(input_shape=784).cuda(args.gpu)
model = torch.nn.parallel.DistributedDataParallel( model, device_ids=[args.gpu], find_unused_parameters=True )
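To see how the corrected lines fit into a DistributedDataParallel setup, here is a minimal sketch. It deliberately runs as a single process on CPU with the `gloo` backend so it works without a GPU or a launcher; the `AE` class is a hypothetical stand-in for the article's autoencoder (784 is the flattened 28x28 MNIST input size). On a GPU node you would instead move the model with `.cuda(args.gpu)` and pass `device_ids=[args.gpu]`.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn


class AE(nn.Module):
    """Hypothetical stand-in for the article's autoencoder."""

    def __init__(self, input_shape=784):
        super().__init__()
        self.encoder = nn.Linear(input_shape, 128)
        self.decoder = nn.Linear(128, input_shape)

    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))


# Single-process process group for illustration only; a real multi-node
# run gets rank/world_size from the launcher (torchrun or SLURM).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# The corrected lines: build the model, then wrap *that same object*
# (`model`, not an undefined `model_sync`). On GPU you would call
# model.cuda(args.gpu) and pass device_ids=[args.gpu]; on CPU,
# device_ids must be omitted.
model = AE(input_shape=784)
ddp_model = nn.parallel.DistributedDataParallel(
    model, find_unused_parameters=True
)

# Forward pass goes through the DDP wrapper.
out = ddp_model(torch.randn(4, 784))
print(out.shape)

dist.destroy_process_group()
```

In a real multi-node job, each process would pin itself to one GPU via `args.gpu` (typically the local rank), which is why the single typo `args.gpus` breaks the script.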