ScalableSoftmax

An unofficial PyTorch implementation of Scalable-Softmax (SSMax) from the paper "Scalable-Softmax Is Superior for Attention" (Nakanishi, 2025).

Overview

ScalableSoftmax (SSMax) is a drop-in replacement for the standard softmax that mitigates attention fading in transformers by scaling the logits with the input size, keeping attention distributions focused even for long inputs.
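Concretely, the paper's SSMax multiplies the logits by s · log(n), where n is the input length, before applying the ordinary softmax. Below is a minimal functional sketch of that idea; it is independent of this package's actual implementation, which also offers a learnable s and an optional bias.

import math
import torch

def ssmax_reference(x: torch.Tensor, s: float = 0.43, dim: int = -1) -> torch.Tensor:
    """Reference sketch of Scalable-Softmax: softmax of (s * log(n) * x),
    where n is the size of the normalized dimension."""
    n = x.size(dim)
    # Larger n amplifies the logits more strongly, counteracting the flattening
    # (attention fading) that the standard softmax exhibits as inputs grow long.
    return torch.softmax(s * math.log(n) * x, dim=dim)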

Installation

pip install scalable-softmax

Usage

import torch
from scalable_softmax import ScalableSoftmax

# Initialize with default parameters
ssmax = ScalableSoftmax()

# Or customize parameters
ssmax = ScalableSoftmax(
    s=0.43,  # scaling parameter
    learn_scaling=True,  # make scaling parameter learnable
    bias=False  # whether to use bias term
)

# Apply to an input tensor
batch_size, sequence_length = 2, 1024
x = torch.randn(batch_size, sequence_length)
output = ssmax(x)
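For context, here is a minimal single-head attention layer that swaps the usual softmax for ScalableSoftmax. It assumes ScalableSoftmax normalizes over the last dimension of its input, as in the usage example above; the surrounding module and its names are illustrative, not part of this package.

import torch
import torch.nn as nn
from scalable_softmax import ScalableSoftmax

class SSMaxAttention(nn.Module):
    """Single-head scaled dot-product attention with ScalableSoftmax (illustrative)."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.ssmax = ScalableSoftmax(learn_scaling=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (batch, seq_len, seq_len)
        attn = self.ssmax(scores)  # normalize over the key dimension
        return self.proj(attn @ v)

layer = SSMaxAttention(dim=64)
out = layer(torch.randn(2, 128, 64))  # (batch=2, seq_len=128, dim=64)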

Features

  • Drop-in replacement for standard softmax
  • Learnable scaling parameter (see the sketch after this list)
  • Optional bias term
  • Maintains focused attention with large inputs
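
As a quick sanity check of the learnable scaling option, the snippet below lists the module's trainable parameters. It assumes ScalableSoftmax subclasses torch.nn.Module (consistent with the PyTorch usage above); the internal parameter names are not documented in this README.

from scalable_softmax import ScalableSoftmax

ssmax = ScalableSoftmax(s=0.43, learn_scaling=True)
# With learn_scaling=True the scaling factor should appear here as a
# trainable parameter and be updated alongside the rest of the model.
for name, param in ssmax.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)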

Citation

@article{nakanishi2025scalable,
  title={Scalable-Softmax Is Superior for Attention},
  author={Nakanishi, Ken M.},
  journal={arXiv preprint arXiv:2501.19399},
  year={2025}
}

License

MIT License
