“Our model, MoE-Mamba, outperforms both Mamba and Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in 2.2x less training steps while preserving the inference performance gains of Mamba against the Transformer,” write IDEAS NCBR researchers.

“The preliminary results indicate a very promising research direction that may allow scaling SSMs to tens of billions of parameters.”

A team of IDEAS NCBR researchers has unveiled MoE-Mamba, a combination of Mixture of Experts and State Space Models. This is the joint work of Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski and Sebastian Jaszczur, members of the research teams of Piotr Sankowski and Piotr Miłoś.

“By interleaving Mamba with efficient MoE layers we get the best of both worlds – lots of parameters, fast training, and linear time inference,” says Sebastian Jaszczur. “MoE and Mamba seems like a match made in heaven.”
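The interleaving described in the quote can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: `MambaBlock` and `MoELayer` are hypothetical stand-ins for a selective state-space block and a sparse Mixture-of-Experts feed-forward layer, and the stack alternates the two, one MoE layer after each Mamba block.

```python
# Illustrative sketch only -- MambaBlock and MoELayer are placeholders,
# not real implementations of the MoE-Mamba architecture.

class MambaBlock:
    """Stand-in for a Mamba (selective state-space) block."""
    def __call__(self, x):
        return x  # a real block would apply selective SSM token mixing


class MoELayer:
    """Stand-in for a sparse Mixture-of-Experts feed-forward layer."""
    def __init__(self, num_experts=8):
        self.num_experts = num_experts

    def __call__(self, x):
        return x  # a real layer would route each token to top-k experts


def build_moe_mamba(depth=4, num_experts=8):
    """Interleave Mamba blocks with MoE layers, one MoE after each block."""
    layers = []
    for _ in range(depth):
        layers.append(MambaBlock())
        layers.append(MoELayer(num_experts))
    return layers


model = build_moe_mamba(depth=2)
print([type(layer).__name__ for layer in model])
# ['MambaBlock', 'MoELayer', 'MambaBlock', 'MoELayer']
```

The point of the pattern is that the MoE layers add many parameters without adding much compute per token (only the routed experts run), while the Mamba blocks keep training fast and inference linear in sequence length.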

Check out the paper on arXiv:

And the blog:
