Results for the different exam types and stages, and for clinical case consultations, are shown below.
Submissions of model test results to CMB are always welcome. For submission guidelines, please refer to the link.
The accuracy listed for each model is the best result across four generation strategies: zero-shot and few-shot, each with and without chain-of-thought (CoT).
For detailed information on generation and evaluation, please refer to the link.
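As a minimal sketch of the best-of-four selection described above, the snippet below picks the highest accuracy among the four strategies. The strategy labels, the `best_strategy` helper, and all numbers are hypothetical and for illustration only. The text does not say whether the maximum is taken per exam or per model, so the sketch simply handles one set of four values.

```python
# Hypothetical per-strategy accuracies (%) for a single model.
# These numbers are illustrative and are NOT taken from the leaderboard.
accuracy_by_strategy = {
    "zero-shot": 61.2,
    "zero-shot + CoT": 58.9,
    "few-shot": 63.5,
    "few-shot + CoT": 60.1,
}


def best_strategy(scores):
    """Return the (strategy, accuracy) pair with the highest accuracy."""
    name = max(scores, key=scores.get)
    return name, scores[name]


best_name, best_acc = best_strategy(accuracy_by_strategy)
print(f"reported accuracy: {best_acc:.2f} (from {best_name})")
```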
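Model | Institution | Avg. | Avg.: Physician Exam | Avg.: Nursing Exam | Avg.: Pharmacist Exam | Avg.: Medical Technician Exam | Avg.: Professional Knowledge Exam | Avg.: Medical Postgraduate Entrance Exam | Physician: Resident Physician | Physician: Licensed Assistant Physician | Physician: Licensed Physician | Physician: Intermediate Title | Physician: Senior Title | Nursing: Practicing Nurse | Nursing: Licensed Practical Nurse | Nursing: Charge Nurse | Nursing: Advanced Practice Nurse | Pharmacist: Licensed Pharmacist (Western Medicine) | Pharmacist: Licensed Pharmacist (TCM) | Pharmacist: Junior Pharmacy Assistant | Pharmacist: Junior Pharmacist | Pharmacist: Junior TCM Pharmacy Assistant | Pharmacist: Junior TCM Pharmacist | Pharmacist: Pharmacist-in-Charge | Pharmacist: TCM Pharmacist-in-Charge | Technician: Medical Technician | Technician: Medical Technologist | Technician: Technologist-in-Charge | Knowledge: Basic Medicine | Knowledge: Clinical Medicine | Knowledge: Preventive Medicine and Public Health | Knowledge: TCM and Chinese Materia Medica | Knowledge: Nursing | Postgraduate: Politics | Postgraduate: Comprehensive Western Medicine | Postgraduate: Comprehensive TCM |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|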
微医医疗大模型 | 微医 | 90.13 | 90.45 | 91.94 | 90.66 | 91.25 | 85.38 | 91.11 | 92.50 | 91.75 | 90.75 | 90.75 | 86.50 | 96.00 | 91.25 | 92.75 | 87.75 | 84.00 | 86.25 | 94.75 | 94.50 | 90.50 | 93.50 | 89.75 | 92.00 | 92.00 | 92.25 | 89.50 | 85.75 | 84.75 | 88.25 | 82.75 | 85.50 | 95.75 | 92.00 | 91.18 |
WiseDiag-v1 | 杭州智诊科技有限公司 | 87.48 | 87.05 | 92.38 | 89.84 | 88.42 | 83.75 | 83.44 | 87.25 | 91.50 | 91.75 | 84.75 | 80.00 | 95.50 | 96.25 | 93.25 | 84.50 | 83.25 | 87.50 | 96.50 | 95.00 | 84.50 | 90.25 | 89.00 | 92.75 | 89.75 | 87.75 | 87.75 | 82.25 | 83.00 | 87.00 | 82.75 | 83.25 | 89.50 | 82.50 | 78.50 |
墨融AI | 苏州墨融科技有限公司 | 87.21 | 86.05 | 92.12 | 86.46 | 87.33 | 83.56 | 87.75 | 87.00 | 89.50 | 89.75 | 83.50 | 80.50 | 96.00 | 95.50 | 92.25 | 84.75 | 80.75 | 81.00 | 92.25 | 90.50 | 82.25 | 86.75 | 88.50 | 89.75 | 89.75 | 87.25 | 85.00 | 84.25 | 83.25 | 85.75 | 81.00 | 83.00 | 96.25 | 89.00 | 82.75 |
RobotGPT-30B | 达闼机器人(成都), 解放军总医院医学创新研究部 | 86.80 | 86.45 | 91.69 | 86.09 | 89.08 | 82.69 | 84.81 | 86.50 | 92.00 | 91.50 | 83.50 | 78.75 | 96.00 | 94.75 | 94.25 | 81.75 | 78.75 | 81.75 | 92.75 | 90.25 | 81.50 | 87.50 | 86.75 | 89.50 | 92.25 | 87.00 | 88.00 | 84.00 | 79.50 | 86.75 | 80.50 | 81.50 | 96.75 | 86.75 | 74.25 |
素问 | 中国电子科技南湖研究院(南湖研究院) | 85.77 | 85.70 | 91.75 | 87.81 | 87.08 | 78.00 | 84.25 | 86.25 | 90.50 | 89.50 | 82.75 | 79.50 | 96.25 | 95.50 | 93.25 | 82.00 | 81.25 | 82.75 | 95.00 | 94.50 | 83.50 | 87.50 | 88.25 | 89.75 | 88.50 | 85.75 | 87.00 | 79.00 | 76.50 | 83.25 | 73.25 | 80.00 | 96.25 | 86.75 | 74.00 |
中国电子云医疗大模型 | 深圳陆兮科技有限公司, 中国电子云-中国信创云 | 83.09 | 81.95 | 89.13 | 84.38 | 84.08 | 77.63 | 81.37 | 90.75 | 90.00 | 82.00 | 72.75 | 74.25 | 95.25 | 94.00 | 88.75 | 78.50 | 75.25 | 81.75 | 92.00 | 88.00 | 84.00 | 83.00 | 84.00 | 87.00 | 83.00 | 85.75 | 83.50 | 79.75 | 76.75 | 82.75 | 71.25 | 78.75 | 92.50 | 84.50 | 69.75 |
砭石 | 智慧眼科技股份有限公司 | 82.56 | 81.85 | 88.44 | 81.97 | 78.33 | 78.13 | 84.69 | 82.75 | 85.50 | 85.25 | 80.50 | 75.25 | 94.50 | 91.25 | 89.25 | 78.75 | 75.75 | 78.25 | 87.25 | 84.50 | 79.25 | 83.25 | 81.50 | 86.00 | 80.25 | 76.25 | 78.50 | 73.75 | 77.25 | 83.25 | 78.25 | 81.25 | 96.25 | 83.00 | 78.25 |
jianpeiGPT | 健培科技 | 81.78 | 81.70 | 86.38 | 81.72 | 81.42 | 77.44 | 82.00 | 80.50 | 87.75 | 88.50 | 78.75 | 73.00 | 92.00 | 92.00 | 87.50 | 74.00 | 74.00 | 79.25 | 89.00 | 87.75 | 76.50 | 82.25 | 83.00 | 82.00 | 81.50 | 80.00 | 82.75 | 78.50 | 75.50 | 83.50 | 72.25 | 77.00 | 94.00 | 80.50 | 76.50 |
CollectiveSFT | CAS·SIAT-NLP | 77.05 | 80.15 | 84.00 | 75.44 | 74.75 | 69.00 | 78.94 | 80.25 | 83.00 | 85.75 | 79.25 | 72.50 | 90.25 | 91.25 | 86.25 | 68.25 | 63.25 | 70.25 | 83.50 | 81.00 | 75.75 | 74.00 | 79.00 | 76.75 | 73.75 | 74.00 | 76.50 | 68.50 | 70.00 | 72.75 | 64.75 | 74.75 | 91.50 | 80.25 | 69.25 |
HuatuoGPTII-34B (华佗) | CUHKSZ-NLP | 76.80 | 75.65 | 82.31 | 76.81 | 76.17 | 74.38 | 75.56 | 73.75 | 77.75 | 82.50 | 75.50 | 68.75 | 87.75 | 87.50 | 81.50 | 72.50 | 68.25 | 73.50 | 85.25 | 82.25 | 73.25 | 77.00 | 74.75 | 80.25 | 77.25 | 74.00 | 77.25 | 72.00 | 74.25 | 78.75 | 72.50 | 73.25 | 79.75 | 77.00 | 72.25 |
Qwen-72B-Chat | Qwen | 74.38 | 78.55 | 83.56 | 79.78 | 77.92 | 68.25 | 58.19 | 78.00 | 86.00 | 88.00 | 75.00 | 65.75 | 94.25 | 92.50 | 87.75 | 59.75 | 69.50 | 73.75 | 88.25 | 87.75 | 77.00 | 79.50 | 78.25 | 84.25 | 77.50 | 79.00 | 77.25 | 66.25 | 65.00 | 77.50 | 64.25 | 70.25 | 50.00 | 65.50 | 47.00 |
BrainAuGPT | 脑动极光 | 71.12 | 70.30 | 81.31 | 70.31 | 66.50 | 68.50 | 69.81 | 70.75 | 74.25 | 81.00 | 64.75 | 60.75 | 87.50 | 87.25 | 82.50 | 68.00 | 61.50 | 68.00 | 79.00 | 77.00 | 65.00 | 68.75 | 71.75 | 71.50 | 66.75 | 64.50 | 68.25 | 69.50 | 67.00 | 76.75 | 60.75 | 69.00 | 91.00 | 65.50 | 53.75 |
Yi-34B-Chat | Yi | 69.17 | 71.10 | 77.56 | 73.16 | 73.67 | 66.56 | 52.94 | 68.00 | 78.75 | 78.75 | 69.50 | 60.50 | 87.00 | 84.00 | 82.25 | 57.00 | 59.00 | 66.00 | 84.00 | 80.75 | 69.75 | 71.50 | 75.00 | 79.25 | 73.25 | 74.25 | 73.50 | 66.00 | 64.75 | 73.00 | 62.50 | 63.75 | 45.25 | 56.50 | 46.25 |
HuatuoGPTII-13B (华佗) | CUHKSZ-NLP | 67.85 | 62.75 | 66.13 | 64.91 | 62.00 | 61.94 | 53.69 | 66.25 | 71.00 | 74.50 | 65.75 | 61.75 | 71.50 | 71.75 | 65.75 | 55.50 | 61.50 | 63.25 | 70.00 | 64.75 | 59.75 | 68.00 | 62.50 | 69.50 | 63.75 | 60.75 | 61.50 | 58.75 | 62.00 | 64.00 | 63.00 | 59.75 | 43.00 | 55.75 | 56.25 |
Yi-6B-Chat | Yi | 65.87 | 67.25 | 76.38 | 68.50 | 67.83 | 61.75 | 53.50 | 64.50 | 74.00 | 78.75 | 63.25 | 55.75 | 86.50 | 86.50 | 80.00 | 52.50 | 56.75 | 65.75 | 78.25 | 74.75 | 63.50 | 71.00 | 68.75 | 79.25 | 65.50 | 68.50 | 69.50 | 62.75 | 59.00 | 68.50 | 56.75 | 64.00 | 45.00 | 58.25 | 46.75 |
ShuKunGPT | 数坤科技 | 64.44 | 68.65 | 71.44 | 70.78 | 61.92 | 62.81 | 51.06 | 63.00 | 76.50 | 81.00 | 64.50 | 58.25 | 77.50 | 78.50 | 74.75 | 55.00 | 73.75 | 70.50 | 76.00 | 73.75 | 66.25 | 65.50 | 67.25 | 73.25 | 59.25 | 61.25 | 65.25 | 58.50 | 62.25 | 68.75 | 61.75 | 56.25 | 57.00 | 49.75 | 41.25 |
AntGLM-Med-10B | AntGroup | 64.09 | 66.85 | 71.75 | 69.44 | 55.92 | 61.19 | 59.38 | 63.25 | 71.00 | 81.75 | 62.00 | 56.25 | 82.50 | 74.75 | 70.50 | 59.25 | 70.50 | 63.50 | 76.50 | 75.75 | 63.75 | 71.50 | 65.50 | 68.50 | 53.00 | 53.00 | 61.75 | 58.50 | 61.50 | 63.50 | 61.25 | 60.25 | 57.75 | 64.50 | 55.00 |
GPT-4 | OpenAI | 59.46 | 59.90 | 69.31 | 52.19 | 61.50 | 59.69 | 54.19 | 59.75 | 58.50 | 64.50 | 60.75 | 56.00 | 77.50 | 72.50 | 68.75 | 58.50 | 54.75 | 47.00 | 60.00 | 63.25 | 39.50 | 47.25 | 59.50 | 46.25 | 58.50 | 60.75 | 65.25 | 63.00 | 65.50 | 68.25 | 42.00 | 57.00 | 61.00 | 61.25 | 37.50 |
HuatuoGPTII-7B (华佗) | CUHKSZ-NLP | 59.00 | 64.55 | 63.75 | 61.06 | 56.25 | 56.63 | 51.81 | 64.75 | 67.00 | 70.75 | 64.75 | 55.50 | 70.25 | 68.50 | 64.75 | 51.50 | 53.75 | 56.75 | 66.25 | 62.25 | 60.00 | 63.50 | 59.50 | 66.50 | 55.75 | 53.75 | 59.25 | 55.00 | 56.50 | 59.50 | 55.50 | 57.75 | 43.75 | 53.75 | 52.00 |
Qwen-14B-Chat | Qwen | 57.64 | 60.40 | 65.63 | 60.94 | 58.83 | 54.50 | 45.56 | 58.25 | 65.00 | 69.00 | 60.50 | 49.25 | 73.00 | 74.00 | 68.25 | 47.25 | 61.00 | 63.00 | 71.00 | 66.75 | 51.25 | 57.00 | 58.25 | 59.25 | 58.25 | 57.75 | 60.50 | 51.50 | 54.25 | 65.25 | 46.00 | 51.75 | 43.00 | 50.00 | 37.50 |
Deepseek-llm-67B-Chat | Deepseek-llm | 51.99 | 52.90 | 61.50 | 54.28 | 51.42 | 51.19 | 40.62 | 49.75 | 57.25 | 60.25 | 54.50 | 42.75 | 71.25 | 67.00 | 65.50 | 42.25 | 48.75 | 50.50 | 63.25 | 56.25 | 49.75 | 57.25 | 52.75 | 55.75 | 51.25 | 52.75 | 50.25 | 47.00 | 48.75 | 59.25 | 49.75 | 47.25 | 35.50 | 43.50 | 36.25 |
Baichuan2-13B-Chat | Baichuan-inc | 48.87 | 49.55 | 56.75 | 49.41 | 50.08 | 48.25 | 39.19 | 51.25 | 51.25 | 56.50 | 47.75 | 41.00 | 63.25 | 63.00 | 59.75 | 41.00 | 41.25 | 46.75 | 56.50 | 52.25 | 44.50 | 53.50 | 45.75 | 54.75 | 46.00 | 51.00 | 53.25 | 44.25 | 50.00 | 48.75 | 50.00 | 45.50 | 36.00 | 39.25 | 36.00 |
Qwen-7B-Chat | Qwen | 46.58 | 48.00 | 54.25 | 48.34 | 48.08 | 44.88 | 35.94 | 50.25 | 48.50 | 56.25 | 46.00 | 39.00 | 63.50 | 60.00 | 54.00 | 39.50 | 46.00 | 45.50 | 53.50 | 55.75 | 42.00 | 46.50 | 48.50 | 49.00 | 47.50 | 47.50 | 49.25 | 43.75 | 43.25 | 53.50 | 39.00 | 37.25 | 36.25 | 39.50 | 30.75 |
ChatGLM2-6B | THUDM | 45.05 | 44.40 | 53.88 | 43.41 | 41.58 | 42.13 | 44.88 | 43.50 | 43.75 | 48.25 | 47.25 | 39.25 | 54.25 | 63.50 | 51.50 | 46.25 | 36.50 | 38.25 | 49.50 | 48.50 | 43.75 | 40.75 | 43.75 | 46.25 | 43.50 | 38.50 | 42.75 | 39.25 | 43.00 | 50.25 | 36.00 | 43.00 | 55.75 | 42.25 | 38.50 |
Baichuan2-7B-Chat | Baichuan-inc | 43.33 | 42.55 | 51.75 | 44.59 | 45.50 | 43.00 | 32.56 | 40.25 | 44.50 | 50.75 | 44.25 | 33.00 | 59.50 | 54.50 | 53.75 | 39.25 | 39.75 | 40.75 | 49.75 | 46.25 | 45.50 | 46.75 | 41.00 | 47.00 | 43.50 | 48.75 | 44.25 | 42.00 | 40.00 | 49.00 | 41.00 | 37.00 | 32.75 | 31.50 | 29.00 |
Baichuan-13B-chat | Baichuan-inc | 41.40 | 39.00 | 48.31 | 42.94 | 38.50 | 41.06 | 38.56 | 38.50 | 39.00 | 43.75 | 38.75 | 35.00 | 53.75 | 54.75 | 46.25 | 38.50 | 37.50 | 37.75 | 53.00 | 43.25 | 41.25 | 44.25 | 40.75 | 45.75 | 37.75 | 38.00 | 39.75 | 39.25 | 37.75 | 47.00 | 40.25 | 39.75 | 40.75 | 35.00 | 38.75 |
Qwen-1.8B | Qwen | 40.00 | 44.15 | 50.63 | 39.78 | 39.25 | 36.56 | 33.75 | 43.25 | 45.50 | 49.00 | 45.50 | 37.50 | 62.75 | 52.25 | 50.25 | 37.25 | 34.50 | 33.50 | 48.25 | 46.75 | 35.25 | 41.75 | 36.00 | 42.25 | 38.50 | 37.25 | 42.00 | 33.75 | 36.50 | 45.75 | 30.25 | 38.50 | 34.50 | 36.50 | 25.50 |
ChatGLM3-6B | THUDM | 40.00 | 42.55 | 47.31 | 39.56 | 41.08 | 37.44 | 32.06 | 40.25 | 43.25 | 49.00 | 44.75 | 35.50 | 52.25 | 54.25 | 46.25 | 36.50 | 35.25 | 35.50 | 46.00 | 45.00 | 34.25 | 41.75 | 40.50 | 38.25 | 40.75 | 38.25 | 44.25 | 35.75 | 38.50 | 44.00 | 31.50 | 37.25 | 30.25 | 33.00 | 27.75 |
DISC-MedLLM-13B | FudanDISC | 39.76 | 42.25 | 46.88 | 38.44 | 38.83 | 40.75 | 31.44 | 42.25 | 42.00 | 47.25 | 44.25 | 35.50 | 49.75 | 54.25 | 49.75 | 33.75 | 31.25 | 35.50 | 44.00 | 44.25 | 34.75 | 40.00 | 36.50 | 41.25 | 37.25 | 39.75 | 39.50 | 38.50 | 40.25 | 47.25 | 37.00 | 37.25 | 26.75 | 34.25 | 27.50 |
IvyGPT | Macao Polytechnic University | 38.54 | 37.70 | 43.56 | 40.47 | 38.08 | 35.31 | 36.13 | 36.75 | 37.75 | 42.00 | 42.00 | 30.00 | 51.00 | 47.00 | 46.25 | 30.00 | 36.75 | 40.50 | 47.25 | 38.75 | 35.00 | 43.50 | 39.50 | 42.50 | 37.00 | 37.00 | 40.25 | 34.50 | 34.75 | 35.75 | 36.25 | 32.25 | 40.25 | 39.75 | 32.25 |
Sunsimiao | X-D Lab | 38.57 | 38.75 | 44.37 | 38.81 | 38.33 | 37.50 | 33.31 | 37.00 | 37.25 | 46.00 | 39.75 | 33.75 | 53.25 | 47.75 | 44.00 | 32.50 | 33.75 | 40.25 | 43.75 | 42.25 | 32.50 | 37.25 | 39.50 | 41.25 | 35.50 | 38.50 | 41.00 | 33.50 | 34.00 | 45.25 | 37.25 | 30.00 | 39.75 | 36.25 | 27.25 |
Internlm-Chat-20B | Internlm | 38.17 | 39.35 | 45.44 | 38.53 | 37.92 | 38.13 | 29.63 | 36.75 | 40.25 | 44.75 | 38.75 | 36.25 | 54.25 | 50.50 | 45.75 | 31.25 | 34.25 | 38.00 | 44.25 | 43.75 | 33.75 | 39.00 | 39.00 | 36.25 | 38.00 | 38.00 | 37.75 | 33.50 | 36.25 | 51.00 | 31.75 | 31.75 | 31.75 | 31.25 | 23.75 |
ChatGPT | OpenAI | 38.09 | 40.75 | 45.69 | 36.59 | 40.08 | 37.94 | 28.81 | 42.75 | 39.50 | 43.25 | 41.00 | 37.25 | 53.25 | 46.75 | 46.25 | 36.50 | 35.75 | 33.00 | 43.00 | 44.75 | 26.50 | 37.50 | 39.25 | 33.00 | 38.50 | 41.25 | 40.50 | 35.25 | 40.00 | 50.25 | 26.25 | 36.50 | 20.50 | 39.25 | 19.00 |
Mixtral-8x7B | Mistral | 36.37 | 39.00 | 41.87 | 33.12 | 39.50 | 36.44 | 28.25 | 40.00 | 35.50 | 43.50 | 41.00 | 35.00 | 46.50 | 42.00 | 45.25 | 33.75 | 30.50 | 32.75 | 37.50 | 41.25 | 28.25 | 32.50 | 32.75 | 29.50 | 36.00 | 39.25 | 43.25 | 35.25 | 36.75 | 47.50 | 26.25 | 31.25 | 24.25 | 35.00 | 22.50 |
Internlm-Chat-7B | Internlm | 34.91 | 34.45 | 42.13 | 33.69 | 37.50 | 33.75 | 27.94 | 34.50 | 33.50 | 40.25 | 34.50 | 29.50 | 47.00 | 46.25 | 40.00 | 35.25 | 32.75 | 30.25 | 38.50 | 38.75 | 26.50 | 33.75 | 34.75 | 34.25 | 34.00 | 39.00 | 39.50 | 28.00 | 35.25 | 46.25 | 25.50 | 33.75 | 29.50 | 27.00 | 22.50 |
HuatuoGPT (华佗) | CUHK(SZ)-NLP | 31.38 | 31.85 | 35.00 | 30.56 | 31.38 | 35.00 | 28.25 | 32.00 | 34.50 | 33.25 | 31.25 | 28.25 | 37.00 | 38.00 | 36.00 | 29.00 | 28.00 | 28.75 | 36.50 | 31.50 | 26.00 | 31.75 | 31.50 | 30.50 | 30.75 | 29.50 | 36.25 | 29.50 | 29.00 | 38.25 | 28.50 | 28.25 | 28.50 | 30.25 | 26.00 |
MedicalGPT | Xu Ming | 26.15 | 26.56 | 30.94 | 24.72 | 27.17 | 25.44 | 21.50 | 25.00 | 25.75 | 29.25 | 29.50 | 25.75 | 34.75 | 35.75 | 32.25 | 21.25 | 23.75 | 24.00 | 27.50 | 25.75 | 24.25 | 26.25 | 26.50 | 27.00 | 28.00 | 25.75 | 27.75 | 22.00 | 29.50 | 32.00 | 20.75 | 22.50 | 22.25 | 24.50 | 21.25 |
Mistral-7B | Mistral | 22.26 | 23.75 | 22.19 | 20.97 | 25.83 | 21.94 | 18.88 | 24.75 | 22.50 | 27.25 | 25.75 | 18.50 | 22.50 | 23.50 | 23.25 | 19.50 | 21.00 | 19.25 | 20.25 | 22.75 | 17.75 | 20.75 | 21.25 | 24.75 | 24.00 | 24.25 | 29.25 | 21.50 | 23.00 | 23.75 | 19.50 | 25.25 | 17.75 | 16.25 | 16.25 |
ChatMed-Consult | 华东师范大学 | 21.58 | 21.41 | 23.48 | 21.58 | 23.55 | 21.36 | 18.08 | 20.50 | 24.00 | 21.25 | 22.50 | 18.75 | 24.75 | 25.50 | 23.00 | 17.25 | 19.25 | 23.75 | 24.25 | 20.25 | 19.50 | 21.25 | 20.75 | 22.50 | 23.75 | 22.50 | 24.75 | 20.75 | 18.00 | 23.50 | 19.25 | 20.00 | 24.50 | 21.25 | 20.75 |
Bentsao (本草) | SCIR-HI | 20.39 | 21.67 | 19.99 | 21.07 | 22.85 | 19.83 | 16.93 | 20.39 | 23.00 | 24.25 | 20.50 | 18.25 | 24.00 | 23.25 | 22.25 | 15.25 | 18.25 | 20.75 | 25.00 | 22.00 | 18.25 | 22.00 | 19.50 | 24.75 | 23.50 | 22.75 | 26.00 | 18.25 | 20.00 | 21.00 | 19.25 | 20.50 | 16.75 | 21.50 | 19.75 |
ChatGLM-Med | SCIR-HI | 21.56 | 23.59 | 23.37 | 22.67 | 21.85 | 19.83 | 16.93 | 21.50 | 22.50 | 23.75 | 21.50 | 19.50 | 23.50 | 25.75 | 24.00 | 15.00 | 19.25 | 19.75 | 20.75 | 18.50 | 24.50 | 24.25 | 21.75 | 26.25 | 18.75 | 23.75 | 20.50 | 17.00 | 18.75 | 21.50 | 16.50 | 20.50 | 14.25 | 19.75 | 15.50 |
DoctorGLM | 上海科技大学 | 7.63 | 6.95 | 7.31 | 8.28 | 9.75 | 7.50 | 6.06 | 6.00 | 8.00 | 6.00 | 9.00 | 5.75 | 6.00 | 11.50 | 6.75 | 5.00 | 7.00 | 6.00 | 9.50 | 10.25 | 10.50 | 6.75 | 9.25 | 7.00 | 8.00 | 12.50 | 8.75 | 8.25 | 7.75 | 6.75 | 7.25 | 5.00 | 6.25 | 7.50 | 5.50 |
BianQue-2 (扁鹊) | 华东师范大学 | 7.38 | 6.95 | 7.31 | 7.25 | 9.75 | 6.94 | 6.06 | 7.50 | 9.00 | 8.00 | 8.75 | 6.00 | 6.75 | 7.00 | 7.50 | 5.25 | 7.75 | 6.75 | 6.75 | 6.00 | 8.25 | 8.00 | 7.25 | 9.00 | 6.00 | 9.00 | 10.00 | 6.50 | 5.75 | 6.25 | 9.75 | 5.25 | 4.00 | 6.25 | 9.00 |
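In the rows above, the first numeric column appears to be the unweighted mean of the six category averages that follow it (physician, nursing, pharmacist, medical technician, professional knowledge, and postgraduate entrance exams). This is an observation from the table values, not a documented rule. The sketch below checks it on the GPT-4 row; the dictionary keys and the `overall_average` helper are illustrative names.

```python
# Six category averages copied from the GPT-4 row (numeric columns 2 to 7).
gpt4_category_avgs = {
    "physician": 59.90,
    "nursing": 69.31,
    "pharmacist": 52.19,
    "medical_technician": 61.50,
    "professional_knowledge": 59.69,
    "postgraduate_entrance": 54.19,
}


def overall_average(category_avgs):
    """Unweighted mean of the category averages, rounded to two decimals."""
    values = list(category_avgs.values())
    return round(sum(values) / len(values), 2)


# Compare with the overall Avg. listed for GPT-4 in the table (59.46).
print(overall_average(gpt4_category_avgs))
```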
The table below shows the scores assigned by the AI grading model against the ground truth, evaluated on four dimensions: fluency, relevance, completeness, and proficiency. Scores range from 1 to 5.
Model | Fluency | Relevance | Completeness | Proficiency | Avg. |
---|---|---|---|---|---|
GPT-4 | 4.95 | 4.71 | 4.35 | 4.66 | 4.67 |
Yi-34B-Chat | 4.99 | 4.69 | 4.34 | 4.64 | 4.67 |
HuatuoGPTII-34B | 4.96 | 4.61 | 4.31 | 4.53 | 4.60 |
Qwen-72B-Chat | 4.96 | 4.58 | 4.12 | 4.55 | 4.55 |
ChatGPT | 4.97 | 4.49 | 4.17 | 4.53 | 4.53 |
Baichuan2-13B-chat | 4.93 | 4.41 | 4.03 | 4.36 | 4.43 |
Baichuan-13B-chat | 4.96 | 4.19 | 3.97 | 4.23 | 4.34 |
ChatGLM3-6B | 4.92 | 4.11 | 3.74 | 4.23 | 4.25 |
Internlm-Chat-20B | 4.90 | 3.91 | 3.25 | 4.14 | 4.05 |
ChatGLM2-6B | 4.86 | 3.76 | 3.51 | 4.00 | 4.03 |
HuatuoGPT | 4.89 | 3.75 | 3.38 | 3.86 | 3.97 |
Deepseek-llm-67B-Chat | 4.78 | 4.04 | 2.62 | 4.16 | 3.90 |
BianQue-2 | 4.86 | 3.52 | 3.02 | 3.60 | 3.75 |
DISC-MedLLM | 4.86 | 3.08 | 2.67 | 3.30 | 3.48 |
ChatMed-Consult | 4.88 | 3.08 | 2.67 | 3.30 | 3.48 |
MedicalGPT | 4.48 | 2.64 | 2.19 | 2.89 | 3.05 |
DoctorGLM | 4.74 | 2.00 | 1.65 | 2.30 | 2.67 |
Bentsao | 3.88 | 2.05 | 1.71 | 2.58 | 2.55 |
ChatGLM-Med | 3.55 | 1.97 | 1.61 | 2.37 | 2.38 |
Mixtral-8x7B | 2.53 | 2.28 | 1.54 | 3.04 | 2.35 |
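The Avg. column in this table appears to be the unweighted mean of the four dimension scores, up to rounding. As a small sketch under that assumption, the snippet below recomputes it for the GPT-4 row and checks that each score lies in the stated 1 to 5 range. The `mean_score` helper is a hypothetical name, not part of the CMB tooling.

```python
# Dimension scores copied from the GPT-4 row of the table above.
gpt4_scores = {
    "fluency": 4.95,
    "relevance": 4.71,
    "completeness": 4.35,
    "proficiency": 4.66,
}


def mean_score(scores, low=1.0, high=5.0):
    """Average the dimension scores after checking each is within [low, high]."""
    for name, value in scores.items():
        if not low <= value <= high:
            raise ValueError(f"{name} score {value} is outside [{low}, {high}]")
    return round(sum(scores.values()) / len(scores), 2)


# Compare with the Avg. listed for GPT-4 in the table (4.67).
print(mean_score(gpt4_scores))
```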