Distributed Utils
- class parallelformers.utils.dist_utils.ParallelModule
  Bases: torch.nn.modules.module.Module
  Parent class of all parallel layer classes.
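ParallelModule appears to carry no logic of its own; it serves as a marker base class so that already-parallelized layers can be recognized with a single isinstance check. A minimal sketch of that pattern, using a hypothetical helper `parallel_layers` that is not part of the library:

```python
from torch import nn

from parallelformers.utils.dist_utils import ParallelModule


def parallel_layers(model: nn.Module):
    """Hypothetical helper: yield every already-parallelized layer.

    Because all parallel layers inherit from ParallelModule, one
    isinstance check distinguishes them from ordinary torch modules.
    """
    for module in model.modules():
        if isinstance(module, ParallelModule):
            yield module
```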
- class parallelformers.utils.dist_utils.AllReduceLinear(in_features: int, out_features: int, bias: bool = True)
  Bases: torch.nn.modules.linear.Linear, parallelformers.utils.dist_utils.ParallelModule
  All-reduce linear layer.
  - forward(input: torch.Tensor) → torch.Tensor
  - weight: torch.Tensor
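The docs do not show the body of forward, but an all-reduce linear layer in tensor parallelism typically computes a partial matmul against its local weight shard and then sums the partial outputs across ranks. A minimal sketch of that pattern, not the library's actual implementation, assuming the weight has already been split along the input dimension:

```python
import torch
import torch.distributed as dist
from torch import nn


class AllReduceLinearSketch(nn.Linear):
    """Sketch of an all-reduce (row-parallel) linear layer.

    Assumes self.weight holds only this rank's shard of the full
    weight, split along in_features, so every rank produces a partial
    output of the full out_features size.
    """

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # Partial product with the local weight shard; bias is deferred.
        output = input.matmul(self.weight.t())

        # Sum the partial outputs from every rank in the group.
        if dist.is_initialized():
            dist.all_reduce(output)

        # The bias is replicated, so it is added once, after the reduction.
        if self.bias is not None:
            output = output + self.bias
        return output
```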
- class parallelformers.utils.dist_utils.AllReduceConv1D(nf, nx)
  Bases: transformers.modeling_utils.Conv1D, parallelformers.utils.dist_utils.ParallelModule
  All-reduce Conv1D layer for GPT models.
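transformers' Conv1D (used in GPT-2-style blocks) is effectively a linear layer with a transposed weight layout, computing `x @ weight + bias`. The all-reduce variant presumably follows the same pattern as AllReduceLinear; a sketch under that assumption, again with a pre-sharded weight:

```python
import torch
import torch.distributed as dist
from transformers.modeling_utils import Conv1D


class AllReduceConv1DSketch(Conv1D):
    """Sketch of an all-reduce Conv1D, assuming a pre-sharded weight."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        size_out = x.size()[:-1] + (self.nf,)
        # Conv1D stores its weight as (nx, nf), so no transpose is needed.
        x = torch.mm(x.view(-1, x.size(-1)), self.weight)
        # Sum partial outputs across ranks before applying the bias.
        if dist.is_initialized():
            dist.all_reduce(x)
        x = x + self.bias
        return x.view(size_out)
```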
- class parallelformers.utils.dist_utils.AllReduceQuantLinear(in_features, out_features, bias=True, weight_bit=8, bias_bit=32, per_channel=False, quant_mode=False)
  Bases: transformers.models.ibert.quant_modules.QuantLinear, parallelformers.utils.dist_utils.ParallelModule
  All-reduce quantized linear layer for IBert models.
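A construction sketch only, since QuantLinear's quantized forward involves IBert-specific scaling factors that the docs do not describe. Assuming a row-parallel split, in_features would be this rank's shard size; the sizes and world_size below are illustrative, not from the docs:

```python
from parallelformers.utils.dist_utils import AllReduceQuantLinear

world_size = 2          # illustrative: two tensor-parallel ranks
full_in_features = 768  # illustrative IBert hidden size

layer = AllReduceQuantLinear(
    in_features=full_in_features // world_size,  # this rank's weight shard
    out_features=768,
    bias=True,
    weight_bit=8,      # defaults taken from the signature above
    bias_bit=32,
    per_channel=False,
    quant_mode=False,  # falls back to float arithmetic when disabled
)
```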