⚠️ Building flash-attn v2.7.4 with CUDA 12.8 on Windows cannot be completed within GitHub Actions' per-job time limit. In the future, I plan to add a self-hosted Windows runner to resolve this.
I checked the code and found that the error is raised by PyTorch itself. This flash-attention issue, Dao-AILab/flash-attention#782, suggests using a lower version of PyTorch to avoid the problem. Python ...
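As a rough sketch of that workaround (the exact PyTorch version and CUDA index below are only illustrative assumptions, not values taken from the issue thread; pick a combination that matches your setup), the install could look like:

```bash
# Illustrative workaround based on Dao-AILab/flash-attention#782:
# pin an older PyTorch before building flash-attn from source.
# torch==2.3.1 and the cu121 index are assumptions for this example.
pip install "torch==2.3.1" --index-url https://download.pytorch.org/whl/cu121

# Build flash-attn against the torch already installed in the environment
# (--no-build-isolation is the flag documented in the flash-attention README).
pip install flash-attn==2.7.4 --no-build-isolation
```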