class parallelformers.parallelize.parallelize(model: torch.nn.modules.module.Module, fp16: bool, num_gpus: int, custom_policies=None, master_addr: str = '', master_port: int = 29500, backend='nccl', verbose: Optional[str] = None, init_method='spawn', daemon: bool = True)[source]

Bases: object

Parallelformers entry-point function

  • model (nn.Module) – Huggingface pre-trained transformer model.

  • fp16 (bool) – whether to use FP16 or not.

  • num_gpus (int) – number of GPUs for parallelization.

  • custom_policies – user-defined parallelization policies (default=None)

  • master_addr (str) – master process address for process communication (default='')

  • master_port (int) – master process port for process communication (default=29500)

  • backend (str) – distributed backend (default='nccl')

  • verbose (str) – logging current GPU states; one of ['detail', 'simple', None] (default=None)

  • init_method (str) – multiprocessing initialization method (it is safe to set init_method to 'spawn')

  • daemon (bool) – whether to run worker processes as daemons (default=True)


We want to use this object as a simple function rather than a class, so we deliberately break the PEP 8 convention and give the class a name that starts with a lowercase letter.


>>> # 1. Import Huggingface and Parallelformers modules
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from parallelformers import parallelize
>>> # 2. Create Huggingface model and tokenizer.
>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
>>> # 3. Parallelize using Parallelformers.
>>> parallelize(model, num_gpus=4, fp16=True)
>>> # 4. Do inference as usual.
>>> inputs = tokenizer("Parallelformers is", return_tensors="pt")
>>> outputs = model.generate(**inputs, num_beams=5, no_repeat_ngram_size=4)
>>> print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
'Output: Parallelformers is an open-source library for parallel programming ...'
preprocess_for_wav2vec(model: torch.nn.modules.module.Module) None[source]

The Huggingface Wav2Vec model has one parameter over which the user has no control, and this makes parallelization impossible.

To use multiprocessing, a tensor with requires_grad=True must be a leaf tensor, but this tensor (conv.weight) violates that rule, so multiprocessing becomes impossible. We therefore detach this parameter to enable multiprocessing.


model (nn.Module) – model to preprocess
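As a minimal sketch of the leaf-tensor issue (using plain tensors for illustration, not the actual Wav2Vec weights):

```python
import torch

# A tensor computed from other tensors is non-leaf even though it
# requires grad -- the situation conv.weight ends up in.
g = torch.ones(3, requires_grad=True)
v = torch.randn(3, requires_grad=True)
weight = v * g
assert weight.requires_grad and not weight.is_leaf

# Detaching produces a leaf tensor that multiprocessing can share.
leaf = weight.detach().requires_grad_(True)
assert leaf.is_leaf and leaf.requires_grad
```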

init_environments(num_gpus: int, master_addr: str, master_port: int) None[source]

Initialize environment variables

  • num_gpus (int) – number of GPUs for parallelization.

  • master_addr (str) – master process address for process communication

  • master_port (int) – master process port for process communication
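A rough sketch of what this setup looks like. The variable names follow the torch.distributed convention; the exact set of variables the library exports is an assumption:

```python
import os

def init_environments(num_gpus: int, master_addr: str, master_port: int) -> None:
    # Hypothetical re-implementation for illustration only.
    os.environ["MASTER_ADDR"] = master_addr or "127.0.0.1"
    os.environ["MASTER_PORT"] = str(master_port)
    os.environ["WORLD_SIZE"] = str(num_gpus)

init_environments(num_gpus=4, master_addr="", master_port=29500)
print(os.environ["MASTER_ADDR"], os.environ["MASTER_PORT"])  # → 127.0.0.1 29500
```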

register_hijack_methods(method: str) None[source]

Intercept the execution flow by replacing some of the model's methods (e.g. forward, generate, …) with self.hijack.


method (str) – name of method
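The replacement can be sketched like this (a hypothetical stand-in, not the library's actual code; the real hijack forwards calls to GPU worker processes):

```python
from functools import partial

class TinyModel:
    # Stand-in for a Huggingface model.
    def forward(self, x):
        return x * 2

def hijack(inputs, kwargs=None, func=""):
    # Real version: send (inputs, kwargs, func) to the worker processes.
    return f"intercepted {func}"

model = TinyModel()
for method in ("forward", "generate"):
    # Bind each method name so the wrapper knows which call it intercepted.
    setattr(model, method, partial(hijack, kwargs={}, func=method))

print(model.forward("hi"))   # → intercepted forward
print(model.generate("hi"))  # → intercepted generate
```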

register_memory_methods(method: str) None[source]

Add several methods for checking the GPU occupancy status of models located in other processes.


method (str) – name of method

deparallelize() None[source]

Remove all methods registered in the model and join all GPU processes to main process.

parallelize() None[source]

Create processes for model parallelization and parallel inference

hijack(inputs: Any, kwargs: Dict, func: str) Any[source]

Transfers the inputs passed to the main process over to another process, then transfers the outputs back to the main process and returns them to the user.

  • inputs (Any) – inputs of model

  • kwargs (Dict) – arguments of model

  • func (str) – name of the hijacked method


Returns – outputs of the model

Return type – Any
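The round trip above can be sketched with queues. For portability this sketch uses multiprocessing.dummy (threads exposing the multiprocessing API); the library itself runs real GPU worker processes:

```python
from multiprocessing.dummy import Process
from queue import Queue

def worker(inp_q, out_q):
    # A worker receives the (inputs, kwargs, func) triple ...
    inputs, kwargs, func = inp_q.get()
    # ... stands in for running the real model method on a GPU ...
    out_q.put(f"{func} handled {inputs!r}")

inp_q, out_q = Queue(), Queue()
p = Process(target=worker, args=(inp_q, out_q))
p.start()
# The main process sends the call across and waits for the result.
inp_q.put(("Parallelformers is", {}, "generate"))
result = out_q.get()
p.join()
print(result)  # → generate handled 'Parallelformers is'
```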