🔍 FLUKE Model Comparison Viewer
Interactive analysis of model performance across text modifications
Model:
All Models
Task:
All Tasks
NER (Named Entity Recognition)
Dialogue Classification
Sentiment Analysis
Coreference Resolution
GSM (Grade School Math)
Instruction Following (IFEval)
Performance:
All Performance Types
Original Better
Modified Better
Both Correct
Both Wrong
Modification:
All Modifications
Active to Passive
Capitalization
Casual Language
Compound Words
Concept Replacement
Coordinating Conjunction
Derivation
Dialectal
Discourse
Geographical Bias
Grammatical Role
Length Bias
Negation
Negation Change
Punctuation
Sentiment
Temporal Bias
Typo Bias