Distributed Utils¶
- class parallelformers.utils.dist_utils.ParallelModule[source]¶
Bases:
torch.nn.modules.module.Module
Parent class of all parallel layer classes
- class parallelformers.utils.dist_utils.AllReduceLinear(in_features: int, out_features: int, bias: bool = True)[source]¶
Bases:
torch.nn.modules.linear.Linear, parallelformers.utils.dist_utils.ParallelModule
All-reduce linear layer
- forward(input: torch.Tensor) → torch.Tensor[source]¶
- weight: torch.Tensor¶
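An all-reduce linear layer works by giving each rank a shard of the weight matrix (split along the input-feature dimension), computing a partial matmul locally, and then summing the partial outputs across ranks with an all-reduce. The real class performs this sum with torch.distributed; the following is a minimal pure-Python sketch that simulates the ranks in-process to show the math, not the actual implementation:

```python
# Simulated row-parallel linear: each "rank" holds a slice of the input
# features and the matching rows of the weight matrix. Summing the
# per-rank partial products (the all-reduce step) reproduces y = x @ W + b.

def linear(x, w, b):
    """Reference: y[j] = sum_i x[i] * w[i][j] + b[j]  (w is in_features x out_features)."""
    out = [sum(xi * wi[j] for xi, wi in zip(x, w)) for j in range(len(w[0]))]
    return [o + bj for o, bj in zip(out, b)]

def all_reduce_linear(x, w, b, world_size):
    """Shard in_features across `world_size` simulated ranks, then sum partials."""
    n = len(x) // world_size
    partials = []
    for rank in range(world_size):
        xs = x[rank * n:(rank + 1) * n]   # this rank's input slice
        ws = w[rank * n:(rank + 1) * n]   # matching weight rows
        partials.append(linear(xs, ws, [0.0] * len(b)))
    # "all-reduce": element-wise sum of partial outputs; add bias once
    reduced = [sum(col) for col in zip(*partials)]
    return [r + bj for r, bj in zip(reduced, b)]

x = [1.0, 2.0, 3.0, 4.0]
w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
b = [0.5, -0.5]
assert all_reduce_linear(x, w, b, world_size=2) == linear(x, w, b)
```

Because the all-reduce sum is exact, the sharded forward pass produces the same output as the unsharded layer while each rank stores only a fraction of the weights.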
- class parallelformers.utils.dist_utils.AllReduceConv1D(nf, nx)[source]¶
Bases:
transformers.modeling_utils.Conv1D, parallelformers.utils.dist_utils.ParallelModule
All-reduce 1D convolution layer for GPT models
- class parallelformers.utils.dist_utils.AllReduceQuantLinear(in_features, out_features, bias=True, weight_bit=8, bias_bit=32, per_channel=False, quant_mode=False)[source]¶
Bases:
transformers.models.ibert.quant_modules.QuantLinear, parallelformers.utils.dist_utils.ParallelModule
All-reduce quantized linear layer for IBert models
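The quantized variant combines the same all-reduce pattern with integer weights. As a rough illustration only (a symmetric per-tensor scheme, not IBert's exact quantization), the sketch below quantizes weights to 8-bit integers with a single scale, accumulates in the integer domain, and dequantizes; in the parallel layer the per-rank partial outputs would then be summed by the all-reduce exactly as in the float case:

```python
# Hedged sketch: symmetric 8-bit weight quantization for a linear layer.
# Not IBert's actual scheme; it only shows why quantized partial outputs
# can still be all-reduced like float ones (the sum commutes with dequant).

def quantize(w, bit=8):
    """Map float weights to integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bit - 1) - 1                      # 127 for 8-bit
    scale = max(abs(v) for row in w for v in row) / qmax
    qw = [[round(v / scale) for v in row] for row in w]
    return qw, scale

def quant_linear(x, w, b, bit=8):
    """Integer-domain matmul, dequantized by the weight scale, plus bias."""
    qw, scale = quantize(w, bit)
    out = [sum(xi * qwi[j] for xi, qwi in zip(x, qw)) * scale
           for j in range(len(w[0]))]
    return [o + bj for o, bj in zip(out, b)]
```

For example, `quant_linear([2.0, 4.0], [[0.5, -1.0], [0.25, 0.75]], [0.0, 0.0])` approximates the exact float result `[2.0, 1.0]` to within the quantization error of the 8-bit scale.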