DistilBERT Benchmark: Distributed Training Trains the Model over 13 Times Faster Using 8 Times the Resources
September 01, 2020 | Matthias Reso, Ricky Datta, Dan Waters, Patrick Bangert

This article shows how the distillation process can be scaled up using a distributed training scheme and other training techniques.