Policy Class

In Parallelformers, every model has its own Policy classes that manage the overall parallelization configurations. (Check this.) In most cases, you don’t have to care about them because the policies of most Hugging Face models are pre-defined in the AutoPolicy class. If you want to use a new model that is not in the AutoPolicy, you need to add a Policy class for yourself. Below are the basic syntax related to the Policy class.

Layer Class

Most methods in the Policy class return a list of Layer classes. For details of arguments of the Layer class, refer to the docs.


The following is a example of replace_arguments() method. To parallelize most models, some arguments like number of attention heads and hidden size must be changed. In this case, you can write changes of arguments in the replace_arguemnts() method. It will be applied when model start to parallelize.

# example of `replace_arguemens()` method

def replace_arguments(config, world_size):
    return {
        # 1. reduce hidden size
        "attention.self.embed_dim": config.hidden_size // world_size,
        # 2. reduce number of heads
        "attention.self.num_heads": config.num_attention_heads // world_size,


The following is a example of replace_modules() method. in some cases, parallelization is impossible due to implementation of Huggingface transformers. So we provide replace_modules() method. This allows you to change the codes of exsting layers. You can check more example in the parallelformers/transformers directory.

# example of `replace_modules()` method

def replace_modules():
    return {
        "BartAttention": BartAttention_,

Applying custom policy object

Finally, input the class of the custom policy as custom_policies argument.

from parallelformers import parallelize
from your_codes import YourPolicy

model = Model()
parallelize(model, num_gpus=4, fp16=True, custom_policies=YourPolicy)

You can also input list of multiple polices class if the model requires multiple policies objects.

from parallelformers import parallelize
from your_codes import YourEncoderPolicy, YourDecoderPolicy

model = Model()
parallelize(model, num_gpus=4, fp16=True, custom_policies=[YourEncoderPolicy, YourDecoderPolicy])