Scaling Multi-Objective Optimization: Meta & FAIR’s CGPO Advances General-purpose LLMs | Synced
In a new paper The Perfect Blend: Redefining RLHF with Mixture of Judges, a research team from Meta GenAI and FAIR developed Constrained Generative Policy Optimization (CGPO), which offers a more s...
Source: Synced | AI Technology & Industry Review
In a new paper The Perfect Blend: Redefining RLHF with Mixture of Judges, a research team from Meta GenAI and FAIR developed Constrained Generative Policy Optimization (CGPO), which offers a more structured approach to RLHF, advancing the performance of general-purpose LLMs.