Parallelize¶
- class parallelformers.parallelize.parallelize(model: torch.nn.modules.module.Module, fp16: bool, num_gpus: int, custom_policies=None, master_addr: str = '127.0.0.1', master_port: int = 29500, backend='nccl', verbose: Optional[str] = None, init_method='spawn', daemon: bool = True)[source]¶
Bases:
object
Parallelformers entry-point function
- Parameters
model (nn.Module) – Huggingface pre-trained transformer model.
fp16 (bool) – whether to use FP16 or not.
num_gpus (int) – number of GPUs for parallelization.
custom_policies – list of custom Policy classes for models not supported by default (default=None)
master_addr (str) – master process address for process communication (default='127.0.0.1')
master_port (int) – master process port for process communication (default=29500)
backend (str) – distributed backend (default='nccl')
verbose (str) – logging of current GPU states; one of ['detail', 'simple', None] (default=None)
init_method (str) – multiprocessing initialization method (it is safe to keep this set to 'spawn')
daemon (bool) – whether to make the worker processes daemonic or not (default=True)
Notes
We want this object to be used as a simple function rather than a class, so we broke the PEP8 convention and gave the class a name that starts with a lowercase letter.
Examples
>>> # 1. Import Huggingface and Parallelformers modules
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from parallelformers import parallelize
>>> # 2. Create Huggingface model and tokenizer.
>>> model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
>>> tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")
>>> # 3. Parallelize using Parallelformers.
>>> parallelize(model, num_gpus=4, fp16=True)
>>> # 4. Do inference as usual.
>>> inputs = tokenizer("Parallelformers is", return_tensors="pt")
>>> outputs = model.generate(**inputs, num_beams=5, no_repeat_ngram_size=4)
>>> print(f"Output: {tokenizer.batch_decode(outputs)[0]}")
'Output: Parallelformers is an open-source library for parallel programming ...'
- preprocess_for_wav2vec(model: torch.nn.modules.module.Module) None [source]¶
The Huggingface Wav2Vec model has one parameter that is not exposed to the user, and this makes parallelization impossible as-is.
For multiprocessing, any tensor with requires_grad set to True must be a leaf tensor, but this tensor (conv.weight) violates that rule, so the model cannot be shared across processes. We therefore detach this parameter to enable multiprocessing, as sketched after the parameter list below.
- Parameters
model (nn.Module) – Wav2Vec model to preprocess
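A minimal sketch of the idea, assuming a weight-normed weight attribute is the offending non-leaf tensor (the traversal and attribute name are illustrative, not parallelformers' actual internals):

import torch
from torch import nn

def detach_non_leaf_weights(model: nn.Module) -> None:
    for module in model.modules():
        weight = getattr(module, "weight", None)
        # Multiprocessing can only share tensors that are either leaves
        # or do not require grad; a weight-normed conv weight is neither.
        if (
            isinstance(weight, torch.Tensor)
            and weight.requires_grad
            and not weight.is_leaf
        ):
            # Inference-only, so dropping grad tracking here is safe.
            module.weight = weight.detach()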
- init_environments(num_gpus: int, master_addr: str, master_port: int) None [source]¶
Initialize the environment variables required for inter-process communication.
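For reference, a torch.distributed setup reads variables of the kind sketched below; the exact set that parallelformers writes is an assumption:

import os

def init_environments(num_gpus: int, master_addr: str, master_port: int) -> None:
    # Address, port, and world size that torch.distributed reads
    # during process group initialization.
    os.environ["MASTER_ADDR"] = master_addr
    os.environ["MASTER_PORT"] = str(master_port)
    os.environ["WORLD_SIZE"] = str(num_gpus)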
- register_hijack_methods(method: str) None [source]¶
Intercept the flow by replacing some of the model's methods (e.g. forward, generate, …) with self.hijack methods, as sketched after the parameter list below.
- Parameters
method (str) – name of the method to hijack
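A minimal sketch of this interception pattern, assuming a hijack callable with the signature documented further below (the closure and setattr are illustrative, not the library's exact code):

from typing import Any, Callable
from torch import nn

def register_hijack_method(model: nn.Module, method: str, hijack: Callable) -> None:
    # Replace e.g. ``model.forward`` so calls are routed through
    # ``hijack`` (which hands them to the worker processes) instead of
    # running on the main process.
    def hijacked(*inputs: Any, **kwargs: Any) -> Any:
        return hijack(inputs, kwargs, method)

    setattr(model, method, hijacked)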
- register_memory_methods(method: str) None [source]¶
Add several methods for checking the GPU occupancy status of the model replicas located in other processes (see the sketch after the parameter list below).
- Parameters
method (str) – name of the method to add
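An illustrative helper (not the library's actual API) showing the kind of per-device signal a worker process could report when such a method is called:

import torch

def cuda_memory_report() -> dict:
    # Allocated vs. reserved bytes for every visible GPU.
    return {
        f"cuda:{i}": {
            "allocated": torch.cuda.memory_allocated(i),
            "reserved": torch.cuda.memory_reserved(i),
        }
        for i in range(torch.cuda.device_count())
    }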
- deparallelize() None [source]¶
Remove all methods registered on the model and join all GPU processes back to the main process.
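Typical usage, assuming parallelize() registered deparallelize on the model:

>>> parallelize(model, num_gpus=4, fp16=True)
>>> # ... run inference as usual ...
>>> model.deparallelize()  # free the GPUs and restore the original model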
- hijack(inputs: Any, kwargs: Dict, func: str) Any [source]¶
Transfers the inputs passed to the main process over to the worker processes, then transfers the outputs back to the main process and returns them to the user (see the sketch after the parameter list below).
- Parameters
inputs (Any) – inputs of the model
kwargs (Dict) – keyword arguments of the model
func (str) – name of the hijacked method
- Returns
outputs of model
- Return type
Any
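A minimal sketch of the request/response pattern this implies, assuming one input queue and one output queue shared with the worker processes (the queue layout and tuple format are assumptions, not parallelformers' internals):

from typing import Any, Dict
from torch.multiprocessing import Queue

def hijack(
    inputs: Any,
    kwargs: Dict,
    func: str,
    request_queue: Queue,
    response_queue: Queue,
) -> Any:
    # Hand the call to the worker processes ...
    request_queue.put((func, inputs, kwargs))
    # ... and block until they send the outputs back.
    return response_queue.get()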