
In recent years, the size of large language models (LLMs) has grown exponentially, and LLM pretraining has become extremely time-consuming. Understanding the scaling behavior of LLMs is therefore critical for training efficiency and responsible resource allocation. Existing scaling laws have demonstrated that proportionally increasing model capacity along with data size and compute budget decreases training loss and improves model performance. In this work, we revisit the existing empirical scaling laws for dense and sparse LLMs, and aim to generalize them across architectures using a single convenient representation for both dense and sparse LLMs.
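For illustration, empirical scaling laws of this kind are often written in a Chinchilla-style parametric form relating loss to parameter count and training tokens. The sketch below assumes that form with placeholder coefficients; it is not the specific formulation or fitted constants presented in this talk.

```python
# Illustrative sketch of a Chinchilla-style parametric scaling law,
#   L(N, D) = E + A / N**alpha + B / D**beta,
# where N is the number of model parameters and D is the number of
# training tokens. The coefficient values below are placeholders for
# illustration only, not fitted constants from any particular study.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.7, A: float = 400.0, B: float = 1800.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss as a function of model and data size."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Example: predicted loss decreases as model size and data grow together.
for n, d in [(1e8, 2e9), (1e9, 2e10), (1e10, 2e11)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.3f}")
```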
Md Arafat Hossain is a PhD student in computer science at Iowa State University, advised by Dr. Ali Jannesari and conducting research in the SWAPP lab. He is currently serving as a research aide-technical at Argonne National Lab, hosted by Dr. Xingfu Wu. His research focuses on automated performance tuning and algorithm configuration for optimizing computing workloads, as well as advancing the understanding of the scaling behavior of foundation models.
See upcoming and previous presentations at CS Seminar Series.