We switched our training from the existing FP8 method to BF16, and as a result, reproducibility is significantly better than it was with the previous FP8 setup.