Introduction

While running experiments recently, I saw firsthand how much prompt design affects model performance. This post records what I observed.

Is Longer Always Better?

When talking with newcomers to AI, I often run into the same misconception: the longer the prompt, the better. In practice, that is not the case.

If you run experiments across many datasets, you quickly notice that a short, terse prompt often outperforms a long, detailed one. The gain ranges from 0.1 to 30 points.

This is because an overly long prompt makes it harder for the model to pick out what matters, which in turn hurts the final result.

It is like giving someone a spoken instruction: the clearer and simpler, the better.
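To make the unit concrete, here is a minimal sketch of how a gap between two prompt variants can be measured. It assumes that "points" means percentage points of accuracy, which the text does not state explicitly; the function names `accuracy_points` and `gap_in_points` are hypothetical, and the values in the usage lines are toy inputs, not experimental results.

```python
def accuracy_points(predictions, labels):
    """Return accuracy as a percentage between 0 and 100."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def gap_in_points(short_preds, long_preds, labels):
    """Accuracy of the short prompt minus accuracy of the long prompt."""
    return accuracy_points(short_preds, labels) - accuracy_points(long_preds, labels)


# Toy inputs for illustration only (not taken from the experiments).
toy_labels = [True, False, True, True]
toy_short = [True, False, True, True]
toy_long = [True, True, True, True]
toy_gap = gap_in_points(toy_short, toy_long, toy_labels)
```

A positive gap means the short prompt did better on that dataset; a negative one means the long prompt did.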

The Effect of Line Breaks

In my experiments, I tested the following two prompts on Llama 3.2 and Qwen3 8B Base. The first is a cleanly structured prompt:

question = f"Question: {data_frame['prompt_question'][idx]}\nAnswer: {opt_question}\nIs this answer true or false for this question? You must choose either True or False, without any explanation.\nPlease answer in a JSON format.\n"

An example of this prompt:

Question: What is the capital of China?

Answer: Beijing

Is this answer true or false for this question? You must choose either True or False, without any explanation.

Please answer in a JSON format.

The second prompt is identical except that the newline after the answer is replaced with a space:

question = f"Question: {data_frame['prompt_question'][idx]}\nAnswer: {opt_question} Is this answer true or false for this question? You must choose either True or False, without any explanation.\nPlease answer in a JSON format.\n"

For example:

Question: What is the capital of China?

Answer: Beijing Is this answer true or false for this question? You must choose either True or False, without any explanation.

Please answer in a JSON format.

The result: the second prompt scored about 5 points lower than the first.

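My guess is that the dataset plays some part in this, but the bigger factor is that, in the second prompt, the model cannot tell where the answer ends and the instruction begins.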
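The two variants differ by a single character, which is easy to lose track of when prompts are written as long one-line strings. The sketch below builds both from the same fields so that the separator is the only thing that changes. The function name `build_prompt` and its `separate_answer` flag are hypothetical and are not part of the original experiment code.

```python
INSTRUCTION = (
    "Is this answer true or false for this question? "
    "You must choose either True or False, without any explanation."
)


def build_prompt(question: str, answer: str, separate_answer: bool = True) -> str:
    """Build the true/false judging prompt.

    separate_answer=True puts the instruction on its own line (first variant);
    separate_answer=False joins it to the answer with a space (second variant).
    """
    sep = "\n" if separate_answer else " "
    return (
        f"Question: {question}\n"
        f"Answer: {answer}{sep}{INSTRUCTION}\n"
        "Please answer in a JSON format.\n"
    )


with_newline = build_prompt("What is the capital of China?", "Beijing")
without_newline = build_prompt(
    "What is the capital of China?", "Beijing", separate_answer=False
)
```

Generating both variants from one function keeps the comparison controlled: any difference in score can then be attributed to the separator rather than to an accidental wording change.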

Negative Examples Matter Too

Again using Llama 3.2 and Qwen3 8B Base, I compared two prompts. The relevant part of the first prompt is:

question += "\nWithout any explanation, choose only one from the given alphabet choices(e.g., A, B, C, D). Please answer in a JSON format {\"answer\":\"\"}.\n"

An example:

Question: What is circled by the red square in the picture?

Options: A. A keyboard B. A motherboard C. A Laptop D. A piece of paper

Without any explanation, choose only one from the given alphabet choices(e.g., A, B, C, D). Please answer in a JSON format {"answer":""}.

The relevant part of the second prompt is:

question += "\nPlease answer in a JSON format {\"answer\":\"\"}.\n"

An example:

Question: What is circled by the red square in the picture?

Options: A. A keyboard B. A motherboard C. A Laptop D. A piece of paper

Please answer in a JSON format {"answer":""}.

With the second prompt, the models (Llama 3.2 in particular) produced a large number of outputs that did not follow the requested format.

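This case is easier to explain. The first prompt pairs a negative instruction, "Without any explanation" (what not to do), with a positive one, "choose only one from the given alphabet choices(e.g., A, B, C, D)" (what to do). Together they act as something like a one-shot contrastive demonstration, which strengthens the model's instruction following.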
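One way to quantify "unformatted output" is to try parsing each response as the requested JSON object and count the failures. The sketch below is a minimal version under stated assumptions: the helper names `parse_choice` and `format_failure_rate` are hypothetical, the valid letters are assumed to be A through D as in the example, and a response counts as well formed only if the whole string is a JSON object with a valid `answer` field.

```python
import json

VALID_CHOICES = {"A", "B", "C", "D"}  # assumed from the example options


def parse_choice(raw_output: str):
    """Return the chosen letter, or None if the output is not the expected JSON."""
    try:
        parsed = json.loads(raw_output.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(parsed, dict):
        return None
    answer = parsed.get("answer")
    if isinstance(answer, str) and answer.strip().upper() in VALID_CHOICES:
        return answer.strip().upper()
    return None


def format_failure_rate(outputs):
    """Fraction of outputs that could not be parsed into a valid choice."""
    if not outputs:
        return 0.0
    failures = sum(parse_choice(o) is None for o in outputs)
    return failures / len(outputs)
```

Tracking this rate separately from accuracy shows whether a prompt change hurt the model's answers or only its adherence to the output format.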

Grammatical Errors

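This point is mainly for the benefit of paper reviewers. When designing a prompt, do not focus only on getting the meaning across; the syntax should be correct too. For example: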

Answer the followed questions in format of JSON

This prompt does get the message across, but it is ungrammatical and harder for reviewers to read. It can be rewritten as:

Please answer the following questions in JSON format

It is worth noting that, in my experiments, grammatical errors had little effect on model performance.