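Flamingo
Posted on 2025-02-12 Edited on 2025-11-26 In notes, LLM

Flamingo multiplies the output of each newly inserted cross-attention layer by tanh(α), where α is a learnable scalar initialized to 0. Because tanh(0) = 0, the new layers contribute nothing at the start of training, and the model initially behaves exactly like the frozen pretrained language model. This improves the stability of the training process.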
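The gating idea can be illustrated with a minimal sketch in plain Python. It assumes a single attention head, the same hidden size d for text and visual tokens, and random projection matrices. The names `cross_attention` and `gated_xattn_residual` are hypothetical and do not come from Flamingo's code. The real model is multi-head and much larger, but the residual update y = x + tanh(α) · xattn(x, media) is the part that matters here.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query_tokens, media_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: text tokens (queries) attend to media tokens (keys/values)."""
    d = len(Wq)
    keys = [matvec(Wk, m) for m in media_tokens]
    values = [matvec(Wv, m) for m in media_tokens]
    out = []
    for x in query_tokens:
        q = matvec(Wq, x)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out

def gated_xattn_residual(text_tokens, media_tokens, params, alpha):
    """Hypothetical gated block: y = x + tanh(alpha) * xattn(x, media)."""
    attn = cross_attention(text_tokens, media_tokens,
                           params["Wq"], params["Wk"], params["Wv"])
    gate = math.tanh(alpha)  # bounded in (-1, 1); exactly 0.0 when alpha == 0
    return [[xi + gate * ai for xi, ai in zip(x, a)]
            for x, a in zip(text_tokens, attn)]

def rand_matrix(rows, cols):
    """Random Gaussian matrix used as illustrative weights / activations."""
    return [[random.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d = 4
params = {"Wq": rand_matrix(d, d), "Wk": rand_matrix(d, d), "Wv": rand_matrix(d, d)}
text = rand_matrix(3, d)   # 3 text hidden states from the (frozen) language model
media = rand_matrix(5, d)  # 5 visual tokens

# alpha = 0 at initialization: the block is an identity map on the text stream.
y0 = gated_xattn_residual(text, media, params, alpha=0.0)
print(y0 == text)  # True

# Once alpha is learned away from 0, visual information starts to flow in.
y1 = gated_xattn_residual(text, media, params, alpha=0.5)
print(max(abs(a - b) for ra, rb in zip(y1, text) for a, b in zip(ra, rb)))
```

The gate does not block learning: at α = 0 the derivative of tanh(α) is 1 − tanh²(0) = 1, so α itself receives a gradient from the start, while the cross-attention weights only begin to receive gradient once tanh(α) moves away from 0.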