Distributed Utils

class parallelformers.utils.dist_utils.ParallelModule

Bases: torch.nn.modules.module.Module

Parent class of all parallel layer classes

allreduce(outputs)
training: bool
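
A minimal sketch of the pattern this base class provides, assuming the standard torch.distributed primitive: each rank holds a partial output, and all_reduce sums the partials in place across the process group. The class name, the initialization guard, and the reduction op below are illustrative assumptions, not the library's verified source:

    import torch
    import torch.distributed as dist
    from torch import nn

    class ParallelModuleSketch(nn.Module):
        # Hypothetical stand-in for ParallelModule; names are illustrative.
        def allreduce(self, outputs: torch.Tensor) -> torch.Tensor:
            # Sum the per-rank partial outputs in place across all processes.
            if dist.is_available() and dist.is_initialized():
                dist.all_reduce(outputs, op=dist.ReduceOp.SUM)
            return outputs
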
class parallelformers.utils.dist_utils.AllReduceLinear(in_features: int, out_features: int, bias: bool = True)

Bases: torch.nn.modules.linear.Linear, parallelformers.utils.dist_utils.ParallelModule

All-reduce linear layer

forward(input: torch.Tensor) → torch.Tensor
in_features: int
out_features: int
weight: torch.Tensor
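
How such a forward plausibly works, sketched under the usual row-parallel assumptions: multiply the input by this rank's weight shard, all-reduce the partial product, then add the bias once after the reduction so it is not duplicated across ranks. This is an assumption about the recipe, not the verified implementation:

    import torch
    import torch.distributed as dist
    from torch import nn

    class AllReduceLinearSketch(nn.Linear):
        # Illustrative only; the real AllReduceLinear may differ in detail.
        def forward(self, input: torch.Tensor) -> torch.Tensor:
            # Partial product against this rank's weight shard, bias deferred.
            outputs = input.matmul(self.weight.t())
            # Sum the partial products across the tensor-parallel group.
            if dist.is_available() and dist.is_initialized():
                dist.all_reduce(outputs, op=dist.ReduceOp.SUM)
            # Apply the bias exactly once, after the reduction.
            if self.bias is not None:
                outputs = outputs + self.bias
            return outputs
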
class parallelformers.utils.dist_utils.AllReduceConv1D(nf, nx)

Bases: transformers.modeling_utils.Conv1D, parallelformers.utils.dist_utils.ParallelModule

All-reduce Conv1D layer for GPT models

forward(x)
training: bool
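
transformers' Conv1D(nf, nx) is a linear layer with a transposed (nx, nf) weight layout, as used by GPT-2 style models. A hedged sketch of an all-reduced variant follows; the import fallback and the bias-after-reduce ordering are assumptions about the approach, not the library's exact code:

    import torch
    import torch.distributed as dist
    try:
        from transformers.pytorch_utils import Conv1D   # newer transformers
    except ImportError:
        from transformers.modeling_utils import Conv1D  # older transformers

    class AllReduceConv1DSketch(Conv1D):
        # Illustrative only. Conv1D stores weight as (nx, nf) and computes
        # x @ weight + bias, i.e. it is effectively a linear layer.
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            size_out = x.size()[:-1] + (self.nf,)
            # Partial matmul on this rank's shard; bias deferred.
            x = x.view(-1, x.size(-1)).matmul(self.weight)
            # Sum the partial results across ranks before adding the bias.
            if dist.is_available() and dist.is_initialized():
                dist.all_reduce(x, op=dist.ReduceOp.SUM)
            return (x + self.bias).view(size_out)
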
class parallelformers.utils.dist_utils.AllReduceQuantLinear(in_features, out_features, bias=True, weight_bit=8, bias_bit=32, per_channel=False, quant_mode=False)

Bases: transformers.models.ibert.quant_modules.QuantLinear, parallelformers.utils.dist_utils.ParallelModule

All-reduce quantized linear layer for IBert models

allreduce_linear_layer(input, weight, bias=None)
forward(x, prev_act_scaling_factor=None)
training: bool
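
A speculative sketch of what a helper like allreduce_linear_layer might compute, following the same partial-linear, all-reduce, late-bias pattern as the layers above. The real method must also keep I-BERT's quantization bookkeeping (e.g. prev_act_scaling_factor) consistent across ranks, which this sketch omits entirely:

    import torch
    import torch.distributed as dist
    import torch.nn.functional as F

    def allreduce_linear_layer_sketch(input, weight, bias=None):
        # Partial linear on this rank's weight shard; bias deferred until
        # after the reduction so it is applied exactly once.
        outputs = F.linear(input, weight)
        if dist.is_available() and dist.is_initialized():
            dist.all_reduce(outputs, op=dist.ReduceOp.SUM)
        if bias is not None:
            outputs = outputs + bias
        return outputs
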