Tensor Replacer

class parallelformers.parallel.replacing.TensorReplacer(model: torch.nn.modules.module.Module, mp_group: Any, fp16: bool, num_gpus: int, custom_policies: Union[parallelformers.policies.base.policy.Policy, List[parallelformers.policies.base.policy.Policy]])[source]

Bases: object

Replaces the original Huggingface layers with Megatron tensor-sliced layers.

Parameters
  • model (nn.Module) – Huggingface pre-trained transformer model

  • mp_group (Any) – process group for model parallelism

  • fp16 (bool) – whether to use FP16

  • num_gpus (int) – number of GPUs

  • custom_policies (Union[Policy, List[Policy]]) – custom policy object (default=None)
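Since custom_policies accepts either a single Policy or a list of them, the constructor presumably normalizes the argument to a list before use. A minimal stdlib-only sketch of that pattern (the Policy stub here is a hypothetical stand-in, not the library's class):

```python
from typing import List, Optional, Union

class Policy:  # hypothetical stand-in for parallelformers' Policy base class
    pass

def normalize_policies(
    custom_policies: Optional[Union[Policy, List[Policy]]],
) -> List[Policy]:
    """Normalize a single policy, a list of policies, or None to a list."""
    if custom_policies is None:
        return []
    if isinstance(custom_policies, Policy):
        return [custom_policies]
    return list(custom_policies)

print(len(normalize_policies(Policy())))            # single policy -> one entry
print(len(normalize_policies([Policy(), Policy()])))  # list passes through
```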

auto_policy() Optional[List[parallelformers.policies.base.policy.Policy]][source]

Find the proper policy for the current model using AutoPolicy.
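Conceptually, auto_policy is a lookup from the model's architecture to a built-in policy class. A hedged illustration of that idea (the registry contents, key, and policy class names below are assumptions for the sketch, not the library's actual table):

```python
from typing import Dict, Optional, Type

class Policy:  # hypothetical stand-in for parallelformers' Policy base class
    pass

class BertPolicy(Policy):  # illustrative policy classes
    pass

class GPT2Policy(Policy):
    pass

# assumed registry: model class name -> policy class
POLICY_REGISTRY: Dict[str, Type[Policy]] = {
    "BertModel": BertPolicy,
    "GPT2Model": GPT2Policy,
}

def auto_policy(model_class_name: str) -> Optional[Type[Policy]]:
    """Return the matching policy class, or None if the model is unsupported."""
    return POLICY_REGISTRY.get(model_class_name)

print(auto_policy("BertModel"))  # <class '...BertPolicy'>
```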

replace_modules()[source]

Replace original Huggingface layers with Megatron tensor-sliced layers.
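Tensor slicing itself partitions each weight matrix across ranks, Megatron-style. A stdlib-only sketch of a contiguous column split, with the GPU count and rank as plain integers (real slicing operates on torch tensors, not nested lists):

```python
from typing import List

def column_slice(
    weight: List[List[float]], num_gpus: int, rank: int
) -> List[List[float]]:
    """Take this rank's contiguous share of the weight's columns."""
    n_cols = len(weight[0])
    assert n_cols % num_gpus == 0, "columns must divide evenly across GPUs"
    per_rank = n_cols // num_gpus
    start = rank * per_rank
    return [row[start:start + per_rank] for row in weight]

w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
print(column_slice(w, num_gpus=2, rank=0))  # [[1.0, 2.0], [5.0, 6.0]]
```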

replace_user_define_modules(model: torch.nn.modules.module.Module, policy_cls: Type[parallelformers.policies.base.policy.Policy]) None[source]

Replace modules in the model using a user-defined policy.

Parameters
  • model (nn.Module) – model whose modules are replaced

  • policy_cls (Type[Policy]) – class of policy

replace_orig_to_megatron_modules(model: torch.nn.modules.module.Module, policy_cls: Type[parallelformers.policies.base.policy.Policy]) torch.nn.modules.module.Module[source]

Replace original Huggingface layers with Megatron tensor-sliced layers.

Parameters
  • model (nn.Module) – model whose modules are replaced

  • policy_cls (Type[Policy]) – class of policy

Returns

parallelized parameters

Return type

nn.Module

preprocess(function_output: List[parallelformers.policies.base.policy.Layer], policy: parallelformers.policies.base.policy.Policy) Tuple[Dict, Dict, Dict, Dict][source]

Preprocess user’s policy object to replace tensors

Parameters
  • function_output (List[Layer]) – list of layers in the policy object

  • policy (Policy) – policy object

Returns

Tuple of dictionaries of parameters and attributes required for tensor slicing

Return type

Tuple[Dict, Dict, Dict, Dict]
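The four returned dicts presumably separate weight names, bias names, weight tensors, and bias tensors so later steps can slice and reassign them by name. An illustrative, stdlib-only sketch with a hypothetical Layer record (the field names and dict keys below are assumptions for the sketch):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple

@dataclass
class Layer:  # hypothetical stand-in for the policy's Layer record
    weight: Optional[str] = None  # dotted attribute path of the weight
    bias: Optional[str] = None    # dotted attribute path of the bias

def preprocess(
    layers: List[Layer], tensors: Dict[str, Any]
) -> Tuple[Dict, Dict, Dict, Dict]:
    """Split layer descriptions into name/tensor dicts for weights and biases."""
    w_name: Dict[int, str] = {}
    b_name: Dict[int, str] = {}
    w_param: Dict[int, Any] = {}
    b_param: Dict[int, Any] = {}
    for i, layer in enumerate(layers):
        if layer.weight is not None:
            w_name[i] = layer.weight
            w_param[i] = tensors[layer.weight]
        if layer.bias is not None:
            b_name[i] = layer.bias
            b_param[i] = tensors[layer.bias]
    return w_name, b_name, w_param, b_param

layers = [Layer(weight="q.weight", bias="q.bias"), Layer(weight="k.weight")]
tensors = {"q.weight": [1.0], "q.bias": [0.0], "k.weight": [2.0]}
w_name, b_name, w_param, b_param = preprocess(layers, tensors)
print(w_name)  # {0: 'q.weight', 1: 'k.weight'}
```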

set_parameters(policy: parallelformers.policies.base.policy.Policy, weight_name: Dict[str, torch.Tensor], bias_name: Dict[str, torch.Tensor], weight_param: Dict[str, torch.Tensor], bias_param: Dict[str, torch.Tensor], suffix: str = 'data') parallelformers.policies.base.policy.Policy[source]

Set sliced parameters into original model

Parameters
  • policy (Policy) – policy object

  • weight_name (Dict[str, Tensor]) – names of the layers’ weights

  • bias_name (Dict[str, Tensor]) – names of the layers’ biases

  • weight_param (Dict[str, Tensor]) – sliced weight tensors

  • bias_param (Dict[str, Tensor]) – sliced bias tensors

  • suffix (str) – attribute suffix used when assigning parameters (default: 'data')

Returns

policy object

Return type

Policy
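Assigning a sliced tensor back by dotted attribute path, with the 'data' suffix so the parameter's storage is replaced in place, can be sketched with plain attribute traversal. The objects here are stdlib stand-ins, not nn.Modules:

```python
class Node:
    """Minimal attribute container standing in for a module/parameter."""
    pass

def set_by_path(root, path: str, value, suffix: str = "data") -> None:
    """Walk 'a.b.c' from root and set the `suffix` attribute on the leaf."""
    obj = root
    for part in path.split("."):
        obj = getattr(obj, part)
    setattr(obj, suffix, value)

model = Node()
model.attention = Node()
model.attention.query = Node()
set_by_path(model, "attention.query", [1.0, 2.0], suffix="data")
print(model.attention.query.data)  # [1.0, 2.0]
```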

static set_layer_size(policy: parallelformers.policies.base.policy.Policy, name: str, size: torch.Size) None[source]

Record the resized parameter size on the original layer object.

Parameters
  • policy (Policy) – policy object

  • name (str) – name of parameters

  • size (Size) – size of resized parameters
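After slicing, attributes that record tensor shapes (for example, a linear layer's out_features) must be updated to match the new parameter size. A minimal illustration with a plain object; the attribute names and the (out_features, in_features) shape convention are assumptions borrowed from torch.nn.Linear:

```python
class FakeLinear:
    """Stand-in for nn.Linear holding only its size attributes."""
    def __init__(self, in_features: int, out_features: int):
        self.in_features = in_features
        self.out_features = out_features

def set_layer_size(layer, name: str, size) -> None:
    """Record the new (sliced) size on the layer object."""
    # size mimics torch.Size: (out_features, in_features) for a Linear weight
    if name.endswith("weight"):
        layer.out_features, layer.in_features = size[0], size[1]

lin = FakeLinear(in_features=8, out_features=8)
set_layer_size(lin, "attention.query.weight", (4, 8))  # sliced across 2 GPUs
print(lin.out_features)  # 4
```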

make_megatron_layer(policy: parallelformers.policies.base.policy.Policy) torch.nn.modules.module.Module[source]

Make Megatron tensor-sliced layers from the original Huggingface layers.

Parameters

policy (Policy) – policy object

Returns

sliced model layer

Return type

nn.Module