A Comprehensive Medical Benchmark in Chinese (2023)

Welcome to the CMB auto-evaluation platform! After you submit, a breakdown of accuracy per category and subcategory is shown directly below. A JSON file is also available for download.

Submission Format

Here is a sample for submission.
You can validate the format by running the following Python code snippet:
import json
fp = 'submission.json'  # placeholder: path to your local submission file
samples = json.load(open(fp))
assert type(samples) == list
assert len(samples) == 11200
assert all([type(s) == dict for s in samples])
# "id" and "model_answer" are required keys for each sample.
# Redundant keys have no effect on evaluation.
assert all(['id' in s for s in samples]), "'id' must be a key of every sample"
assert all(['model_answer' in s for s in samples]), "'model_answer' must be a key of every sample"
assert sorted([s['id'] for s in samples]) == list(range(1, 11200+1)), 'ids must start from 1 and end at 11200'
print('good to go!')
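
The checks above imply a file layout: a JSON list of 11200 dicts, each with an "id" from 1 to 11200 and a "model_answer". A minimal sketch of producing such a file might look like the following. The helper names build_submission and write_submission are hypothetical, and the expected form of each "model_answer" value is not specified here, so the "A" used in the usage note is only a placeholder.

```python
import json

NUM_SAMPLES = 11200  # number of samples required by the format checks


def build_submission(answers):
    """Convert a {id: answer} mapping into the list-of-dicts layout.

    `answers` is a hypothetical dict keyed by integer ids 1..NUM_SAMPLES.
    Raises ValueError if any id in that range has no answer.
    """
    missing = [i for i in range(1, NUM_SAMPLES + 1) if i not in answers]
    if missing:
        raise ValueError(
            f"{len(missing)} ids have no answer, e.g. {missing[:5]}"
        )
    # Emit samples in id order; keys outside 1..NUM_SAMPLES are ignored.
    return [
        {"id": i, "model_answer": answers[i]}
        for i in range(1, NUM_SAMPLES + 1)
    ]


def write_submission(answers, path):
    """Build the submission list and write it to `path` as JSON."""
    samples = build_submission(answers)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any Chinese text readable in the file.
        json.dump(samples, f, ensure_ascii=False)
    return samples
```

For example, write_submission({i: "A" for i in range(1, 11201)}, "submission.json") writes a file that should pass the format checks, with "A" standing in for real model output.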

Submit

1. Choose a local .json file by clicking the "Choose File" button.

2. Click the "Upload & Process" button. Scoring the answers usually takes less than one minute. Once processed, the results are displayed in the "Result" section below.

3. Send your result score to cmedbenchmark@163.com to have it publicized.