“The preliminary results indicate a very promising research direction that may allow scaling SSMs to tens of billions of parameters.”
A team of IDEAS NCBR researchers has unveiled MoE-Mamba, a combination of Mixture of Experts (MoE) and State Space Models (SSMs). This is joint work by Maciej Pióro, Kamil Ciebiera, Krystian Król, Jan Ludziejewski and Sebastian Jaszczur, members of the research teams of Piotr Sankowski and Piotr Miłoś.
“By interleaving Mamba with efficient MoE layers we get the best of both worlds – lots of parameters, fast training, and linear-time inference,” says Sebastian Jaszczur. “MoE and Mamba seem like a match made in heaven.”
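The interleaving pattern described in the quote can be sketched as follows. This is a minimal NumPy illustration, not the actual MoE-Mamba architecture: the `seq_mix` function is a toy causal sequence-mixing stand-in for a real Mamba (SSM) block, and the MoE layer uses simple hard top-1 routing. All function names and dimensions here are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def seq_mix(x, w):
    # Toy stand-in for a Mamba block: a causal cumulative average
    # followed by a linear map (NOT a real SSM; illustration only).
    cums = np.cumsum(x, axis=0)
    counts = np.arange(1, x.shape[0] + 1)[:, None]
    return (cums / counts) @ w

def moe(x, router_w, experts):
    # Toy top-1 Mixture of Experts: each token is routed to exactly
    # one expert MLP, so only a fraction of parameters is active
    # per token even though total parameter count is large.
    logits = x @ router_w            # (seq, n_experts) router scores
    choice = logits.argmax(axis=-1)  # hard top-1 routing per token
    out = np.empty_like(x)
    for e, (w_in, w_out) in enumerate(experts):
        mask = choice == e
        out[mask] = np.tanh(x[mask] @ w_in) @ w_out
    return out

seq, d, n_experts = 8, 16, 4
x = rng.normal(size=(seq, d))
mix_w = rng.normal(size=(d, d)) * 0.1
router_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, 4 * d)) * 0.1,
            rng.normal(size=(4 * d, d)) * 0.1)
           for _ in range(n_experts)]

# The MoE-Mamba pattern: alternate sequence-mixing ("Mamba") layers
# with MoE feed-forward layers, each wrapped in a residual connection.
h = x
for _ in range(2):
    h = h + seq_mix(h, mix_w)           # Mamba-style layer (stand-in)
    h = h + moe(h, router_w, experts)   # MoE layer

print(h.shape)  # the sequence shape is preserved through the stack
```

The point of the pattern is that the sequence-mixing layers give linear-time processing over the sequence, while the MoE layers add parameter capacity without a proportional compute cost per token.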
Read the paper on arXiv: https://arxiv.org/abs/2401.04081
And the blog post: https://llm-random.github.io/posts/moe_mamba/