Parallel Engine

class parallelformers.parallel.engine.ParallelEngine(num_gpus: int, backend: str, custom_policies: Union[parallelformers.policies.base.policy.Policy, List[parallelformers.policies.base.policy.Policy]])[source]

Bases: object

Engine that performs model parallelization across multiple GPUs

Parameters
  • num_gpus (int) – number of GPUs to parallelize across

  • backend (str) – distributed backend (default=nccl)

  • custom_policies (Union[Policy, List[Policy]]) – user-defined custom policy object or list of policy objects

Notes

Parallelization is performed through the following steps (a minimal slicing sketch follows the list):

  1. slice the parallelizable tensors and replace the originals on the CPU

  2. upload the sliced (replacement) tensors to the multiple GPUs simultaneously

  3. upload the non-parallelizable tensors to every GPU (e.g. embedding, lm_head, …)
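
As a rough illustration of steps 1 and 2, the sketch below slices a linear layer's weight on the CPU with torch.chunk and uploads one shard per device. The layer, shard dimension, and GPU count are illustrative assumptions, not the library's actual slicing logic (which is driven by the policy objects):

    import torch
    import torch.nn as nn

    num_gpus = 2  # assumed GPU count, for illustration only
    linear = nn.Linear(in_features=8, out_features=8)

    # Step 1: slice the parallelizable tensor along the output-feature
    # dimension while it still lives on the CPU.
    shards = linear.weight.data.chunk(num_gpus, dim=0)

    # Step 2: upload each shard to its own GPU
    # (requires at least `num_gpus` CUDA devices).
    if torch.cuda.is_available() and torch.cuda.device_count() >= num_gpus:
        gpu_shards = [shard.to(f"cuda:{rank}") for rank, shard in enumerate(shards)]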

parallelize(model: torch.nn.modules.module.Module, fp16: bool) → torch.nn.modules.module.Module[source]

Parallelize a model across multiple GPUs

Parameters
  • model (nn.Module) – Hugging Face pre-trained transformer model.

  • fp16 (bool) – whether to use FP16 or not.

Returns

parallelized model

Return type

nn.Module
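
A minimal sketch of driving the engine by hand, assuming it runs inside a per-GPU worker process; in normal use the top-level parallelformers.parallelize() helper spawns those processes and constructs the engine for you. Passing custom_policies=None is assumed here to fall back to the built-in policies:

    from transformers import AutoModelForCausalLM
    from parallelformers.parallel.engine import ParallelEngine

    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Assumed to be executed inside each spawned worker process.
    engine = ParallelEngine(
        num_gpus=2,
        backend="nccl",
        custom_policies=None,  # assumption: None falls back to built-in policies
    )
    model = engine.parallelize(model, fp16=False)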

create_process_group(backend: str)[source]

Create a PyTorch distributed process group

Parameters

backend (str) – distributed backend

Returns

process group for parallelization

Return type

ProcessGroupNCCL (when the default nccl backend is used)
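
For reference, this is roughly what process-group creation looks like with plain torch.distributed, which this helper wraps; the rendezvous address, port, rank, and world size below are illustrative assumptions, and the nccl backend requires a CUDA device per process:

    import os
    import torch.distributed as dist

    # Rendezvous settings for a single-node, single-process sketch.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    dist.init_process_group(backend="nccl", rank=0, world_size=1)

    # new_group() returns a backend-specific group, e.g. ProcessGroupNCCL
    # when the nccl backend is used.
    group = dist.new_group()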