Abstract

Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for fine-tuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve its intermediate arguments. Empirical evaluations of REFINER on three diverse reasoning tasks show significant improvements over baseline LMs of comparable scale. Furthermore, when using GPT-3.5 as the reasoner, the trained critic significantly improves reasoning without fine-tuning the reasoner. Finally, our critic model is trained without expensive human-in-the-loop data but can be substituted with humans at inference time.
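The abstract describes an iterative generator-critic loop: the reasoning LM proposes intermediate steps, the critic returns structured feedback (rather than a scalar reward), and the generator revises its steps conditioned on that feedback. The Python sketch below illustrates this interaction pattern only; the function names (generate_steps, critique, refine), the Feedback format, and the stopping rule are illustrative assumptions, not the paper's actual interfaces.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feedback:
    """Structured critic feedback (assumed format): whether the reasoning
    is acceptable, plus a hint identifying what to fix."""
    acceptable: bool
    hint: str


def generate_steps(context: str, feedback: Optional[Feedback]) -> List[str]:
    """Stand-in for the reasoning LM: propose intermediate reasoning steps,
    optionally conditioned on the critic's previous feedback."""
    if feedback is None:
        return ["step 1: parse the problem", "step 2: draft a deduction"]
    return ["step 1: parse the problem", f"step 2 (revised): {feedback.hint}"]


def critique(context: str, steps: List[str]) -> Feedback:
    """Stand-in for the trained critic: inspect the intermediate steps and
    return structured feedback instead of a single score."""
    if any("revised" in step for step in steps):
        return Feedback(acceptable=True, hint="")
    return Feedback(acceptable=False, hint="use the given premise in step 2")


def refine(context: str, max_iterations: int = 3) -> List[str]:
    """Alternate generation and critique until the critic accepts the
    reasoning or the refinement budget is exhausted."""
    feedback: Optional[Feedback] = None
    steps: List[str] = []
    for _ in range(max_iterations):
        steps = generate_steps(context, feedback)
        feedback = critique(context, steps)
        if feedback.acceptable:
            break
    return steps


if __name__ == "__main__":
    print(refine("If all birds fly and Tweety is a bird, does Tweety fly?"))

Because the feedback is structured, the generator can make a targeted revision of the flawed step instead of resampling blindly; this also makes it possible, as the abstract notes, to substitute a human for the critic at inference time.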
