S6
- class ssm.model.S6(model_dim, hid_dim=16, n_layers=2, activation=<class 'torch.nn.modules.activation.GELU'>, real_random=False, normalization=True, **kwargs)
Bases:
Module
Implementation of the Selective Structured State Space Sequence (S6) model.
The S6 model uses selective state space models to handle long sequences efficiently. Its selection mechanism lets the model focus on the relevant parts of the input sequence, which makes it suitable for tasks such as selective copy. It is intended to scale and perform better than classical recurrent architectures. The output is computed with a parallel scan algorithm rather than a step-by-step loop over the sequence.
The model is composed of several S6 blocks, each followed by an activation function and a linear layer, similarly to the S4 model.
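As a rough illustration of why a scan applies here, the sketch below evaluates a scalar linear recurrence `h_t = a_t * h_(t-1) + b_t` in two ways: with a plain loop, and with an associative combine applied in doubling rounds (a Hillis-Steele style scan). This is not the library's implementation, and it is not the log-space formulation from Heinsen (2023). The function names, the scalar state, and the zero initial state are assumptions made for the example. In a selective model the coefficients `a_t` and `b_t` would depend on the input at step `t`; here they are just given lists.

```python
def sequential_scan(a, b):
    """Reference loop: h_t = a_t * h_(t-1) + b_t, starting from h = 0."""
    h, out = 0.0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(left, right):
    """Compose two affine steps h -> a*h + b (left applied first).

    This operator is associative, which is what allows a scan.
    """
    a_l, b_l = left
    a_r, b_r = right
    return (a_l * a_r, a_r * b_l + b_r)


def scan_by_doubling(a, b):
    """Inclusive scan in about log2(n) rounds (hypothetical helper).

    Within one round every position is updated independently of the
    others, so on parallel hardware a round could run in one step.
    Here the rounds are simulated with list comprehensions.
    """
    elems = list(zip(a, b))
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # With a zero initial state, the offset part of each composed
    # step is the hidden state at that position.
    return [b_t for _, b_t in elems]


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1, 0.7, 0.3]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(sequential_scan(a, b))
    print(scan_by_doubling(a, b))
```

Both functions should return the same states up to floating-point rounding. The doubling version does more total work than the loop, but it needs only a logarithmic number of dependent rounds.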
See also
Original Reference: Gu, A., Dao, T. (2024). “Mamba: Linear-Time Sequence Modeling with Selective State Spaces”. arXiv:2312.00752. URL: https://arxiv.org/abs/2312.00752
Original Reference: Heinsen, F. A. (2023). “Efficient Parallelization of a Ubiquitous Sequential Computation”. arXiv:2311.06281. URL: https://arxiv.org/abs/2311.06281
- forward(x)
Forward pass of the S6 model.
- Parameters:
x (torch.Tensor) – The input tensor.
- Returns:
The output tensor.
- Return type:
torch.Tensor