Flatten cache and add flashattention #2676

grimoire · 2024-10-29T04:27:55Z

No description provided.

tests/pytorch/kernel/test_flatten_kv_cache.py

RunningLeon

Something is wrong, as shown in the oc results for model internlm2_5-7b-chat: race-high accuracy drops from 87.36 on main to 14.38 with this PR.

| branch | dataset | version | metric | mode | internlm2.5-7b-chat-pytorch |
|---|---|---|---|---|---|
| This PR | race-high | bd3f33 | accuracy | gen | 14.38 |
| main | race-high | bd3f33 | accuracy | gen | 87.36 |
| This PR | gsm8k | a58960 | accuracy | gen | 84.31 |
| main | gsm8k | a58960 | accuracy | gen | 86.43 |

RunningLeon

Results look normal after the fix in commit ef24e85 ("fill last block"):

| dataset | version | metric | mode | internlm2.5-7b-chat-pytorch |
|---|---|---|---|---|
| race-high | bd3f33 | accuracy | gen | 87.34 |
| gsm8k | a58960 | accuracy | gen | 85.44 |

RunningLeon

LGTM

grimoire and others added 7 commits October 25, 2024 18:18

add flash attention

334c8f3

add flash attention

485f701

fix

b446d2a

remove paged attention prefill

67c6bfb

Merge branch 'main' into flatten_prefill_attention

cc2cf45

remove auto tuning

44ee29f

fix triton2

debfca0

lvhan028 requested review from AllentDan and RunningLeon October 31, 2024 06:37

lvhan028 added the improvement label Oct 31, 2024

AllentDan reviewed Nov 4, 2024

tests/pytorch/kernel/test_flatten_kv_cache.py (outdated, resolved)

grimoire added 2 commits November 4, 2024 16:30

Merge branch 'main' into flatten_prefill_attention

095a257

fix ut

74db002

grimoire marked this pull request as draft November 5, 2024 03:34

fix sliding window

34dd841

grimoire marked this pull request as ready for review November 5, 2024 05:36

RunningLeon reviewed Nov 6, 2024

grimoire added 2 commits November 6, 2024 19:01

Merge branch 'main' into flatten_prefill_attention

be975f5

fill last block

ef24e85

RunningLeon reviewed Nov 7, 2024

AllentDan approved these changes Nov 7, 2024

RunningLeon approved these changes Nov 8, 2024

lvhan028 merged commit 2bed018 into InternLM:main Nov 8, 2024
5 checks passed
