alci.dev/en
Benchmarking
1 term
E
Evaluation Harness
An evaluation harness is a standardized testing framework that runs language models against a fixed set of tasks and metrics, so that benchmark results are comparable across models. Learn how tools like lm-eval-harness, HELM, and custom harnesses work.
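To make the idea concrete, here is a minimal sketch of what a custom harness might look like in Python. It is not the API of lm-eval-harness or HELM; every name in it (`Example`, `exact_match`, `run_harness`, `toy_model`, `TASKS`) is hypothetical and the data is a toy. It only illustrates the core pattern: one shared loop that feeds each task's prompts to a model and scores the answers with the same metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """One benchmark item: a prompt and its reference answer (hypothetical type)."""
    prompt: str
    expected: str


def exact_match(prediction: str, expected: str) -> float:
    """Return 1.0 when the normalized strings are identical, else 0.0."""
    return float(prediction.strip().lower() == expected.strip().lower())


def run_harness(
    model: Callable[[str], str],
    tasks: Dict[str, List[Example]],
    metric: Callable[[str, str], float] = exact_match,
) -> Dict[str, float]:
    """Run every task through the same loop and return the mean score per task."""
    results: Dict[str, float] = {}
    for name, examples in tasks.items():
        scores = [metric(model(ex.prompt), ex.expected) for ex in examples]
        results[name] = sum(scores) / len(scores) if scores else 0.0
    return results


# Toy tasks, assumed purely for illustration.
TASKS = {
    "arithmetic": [Example("2+2=", "4"), Example("3*3=", "9")],
    "capitals": [
        Example("Capital of France?", "Paris"),
        Example("Capital of Japan?", "Tokyo"),
    ],
}


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call: a lookup table with one wrong answer."""
    answers = {
        "2+2=": "4",
        "3*3=": "6",  # deliberately wrong
        "Capital of France?": "paris",
        "Capital of Japan?": "Tokyo",
    }
    return answers.get(prompt, "")


if __name__ == "__main__":
    print(run_harness(toy_model, TASKS))
```

Because the model is passed in as a plain callable, the same tasks and metric can be reused unchanged for any model, which is what makes the resulting scores comparable.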