Discussion about this post

Daniel Popescu / ⧉ Pluralisk

That breakdown of data parallelism was super clear. Makes you think about the network overhead for all-reduce with massive models, no?
