S6

class ssm.model.S6(model_dim, hid_dim=16, n_layers=2, activation=<class 'torch.nn.modules.activation.GELU'>, real_random=False, normalization=True, **kwargs)

Bases: Module

Implementation of the Selective Structured State Space Sequence (S6) model.

The S6 model is designed to model long sequences efficiently using selective state space models. Its selection mechanism makes the state space parameters depend on the input, so the model can focus on relevant parts of the sequence and ignore the rest, which suits tasks such as selective copy. The output is computed with the parallel scan algorithm rather than step by step, so the model scales better with sequence length than classical recurrent architectures.
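To illustrate the parallel scan idea, here is a minimal pure-Python sketch for a scalar linear recurrence `h_t = a_t * h_{t-1} + b_t` with `h_0 = 0`. The coefficients `a_t` and `b_t` are treated as given per-step values, standing in for the input-dependent parameters of a selective model. The function names (`combine`, `sequential_recurrence`, `scan_recurrence`) are illustrative and not part of `ssm.model`. The scan variant used inside `S6` is not specified here, so this shows the general technique, not the library's implementation.

```python
from typing import List, Tuple

# A pair (a, b) represents the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two affine maps: apply `left` first, then `right`.

    This operator is associative, which is what makes a scan possible.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_recurrence(a: List[float], b: List[float]) -> List[float]:
    """Reference implementation: one step at a time, O(L) sequential steps."""
    h = 0.0
    out = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def scan_recurrence(a: List[float], b: List[float]) -> List[float]:
    """Inclusive scan (Hillis-Steele style) over the affine maps.

    Each pass of the while loop touches every position independently,
    so on parallel hardware it needs only O(log L) sequential passes.
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        elems = [
            elems[i] if i < step else combine(elems[i - step], elems[i])
            for i in range(n)
        ]
        step *= 2
    # With h_0 = 0, the state h_t is the offset of the accumulated map.
    return [b_acc for _, b_acc in elems]
```

Both functions return the same states. The scan version reorders the work so that positions can be processed concurrently, which is the property that matters for long sequences.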

The model is composed of several S6 blocks, each followed by an activation function and a linear layer, as in the S4 model.
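The layer ordering described above can be sketched in PyTorch as follows. This is an assumption-laden illustration, not the library's code: `PlaceholderBlock` is a hypothetical stand-in for the real S6 block, `StackSketch` is an invented name, the placement of the normalization layer is a guess, and the input layout `(batch, length, model_dim)` is assumed. Only the argument names `model_dim`, `hid_dim`, `n_layers`, `activation`, and `normalization` come from the signature above.

```python
import torch
from torch import nn


class PlaceholderBlock(nn.Module):
    """Hypothetical stand-in for an S6 block; NOT the library's layer."""

    def __init__(self, model_dim: int, hid_dim: int):
        super().__init__()
        self.proj_in = nn.Linear(model_dim, hid_dim)
        self.proj_out = nn.Linear(hid_dim, model_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj_out(torch.tanh(self.proj_in(x)))


class StackSketch(nn.Module):
    """Block -> activation -> linear, repeated n_layers times."""

    def __init__(self, model_dim, hid_dim=16, n_layers=2,
                 activation=nn.GELU, normalization=True):
        super().__init__()
        self.layers = nn.ModuleList()
        for _ in range(n_layers):
            self.layers.append(nn.ModuleDict({
                "block": PlaceholderBlock(model_dim, hid_dim),
                "act": activation(),  # the class is instantiated per layer
                "linear": nn.Linear(model_dim, model_dim),
                # Assumption: normalization is applied after the linear layer.
                "norm": nn.LayerNorm(model_dim) if normalization else nn.Identity(),
            }))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumed input shape: (batch, length, model_dim).
        for layer in self.layers:
            x = layer["norm"](layer["linear"](layer["act"](layer["block"](x))))
        return x
```

Passing the activation as a class rather than an instance, as the signature's default `GELU` suggests, lets each layer own its own activation module.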