“Our model, MoE-Mamba, outperforms both Mamba and Transformer-MoE. In particular, MoE-Mamba reaches the same performance as Mamba in 2.2x less training steps while preserving the inference performance gains of Mamba against the Transformer,” write IDEAS NCBR researchers.

“The preliminary results indicate a very promising research direction that may allow scaling SSMs to tens of billions of parameters.”

A team of IDEAS NCBR researchers has unveiled MoE-Mamba, a combination of Mixture of Experts and State Space Models. This is the joint work of Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski and Sebastian Jaszczur, members of the research teams of Piotr Sankowski and Piotr Miłoś.

“By interleaving Mamba with efficient MoE layers we get the best of both worlds – lots of parameters, fast training, and linear time inference,” says Sebastian Jaszczur. “MoE and Mamba seems like a match made in heaven.”
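The interleaving idea described above can be illustrated with a minimal sketch (this is not the authors' code; the Mamba block is replaced by a simple token-mixing stub, and the top-1 "Switch"-style router, dimensions, and layer count are illustrative assumptions):

```python
# Minimal sketch of the MoE-Mamba layer pattern: sequence-mixing blocks
# (standing in for Mamba SSM blocks) interleaved with top-1 routed
# Mixture-of-Experts feed-forward blocks. All shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_ff = 8, 4, 16

def mamba_block_stub(x):
    # Stand-in for a Mamba SSM block: any causal token-mixing map works
    # for the sketch. Here: a causal cumulative average over the sequence.
    csum = np.cumsum(x, axis=0)
    counts = np.arange(1, x.shape[0] + 1)[:, None]
    return csum / counts

class MoEFeedForward:
    # Top-1 routed MoE: each token is sent to exactly one expert MLP, so
    # active parameters per token stay constant as experts are added --
    # this is what gives "lots of parameters" without slower inference.
    def __init__(self):
        self.router = rng.normal(size=(d_model, n_experts))
        self.w_in = rng.normal(size=(n_experts, d_model, d_ff)) * 0.1
        self.w_out = rng.normal(size=(n_experts, d_ff, d_model)) * 0.1

    def __call__(self, x):
        logits = x @ self.router                 # (seq, n_experts)
        choice = logits.argmax(axis=-1)          # top-1 expert per token
        out = np.empty_like(x)
        for e in range(n_experts):
            idx = np.where(choice == e)[0]
            if idx.size:
                h = np.maximum(x[idx] @ self.w_in[e], 0.0)  # ReLU MLP
                out[idx] = h @ self.w_out[e]
        return out, choice

def moe_mamba_forward(x, n_layers=2):
    # Interleave: residual Mamba block, then residual MoE block, repeated.
    choices = []
    for _ in range(n_layers):
        x = x + mamba_block_stub(x)
        y, c = MoEFeedForward()(x)
        x = x + y
        choices.append(c)
    return x, choices

tokens = rng.normal(size=(5, d_model))   # (sequence length, d_model)
out, choices = moe_mamba_forward(tokens)
print(out.shape, choices[0].shape)       # (5, 8) (5,)
```

Because the SSM part keeps Mamba's linear-time sequence mixing and the MoE part activates only one expert per token, the per-token compute stays flat while total parameter count grows with the number of experts.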

Check out the paper on arXiv and the accompanying blog post.
