Abstract: Large language models (LLMs) have demonstrated impressive capabilities in code generation, achieving high scores on benchmarks such as HumanEval and MBPP. However, these benchmarks primarily ...