Building a fast, self‑hosted research agent with OpenAI models + SerpAPI
Modern language models are effective at synthesis but do not inherently provide fresh, verifiable information. Connecting a model to web search creates an autonomous AI agent that closes that gap: it gains access to current sources, systematic coverage of a topic, and traceable answers. This post describes a compact research agent that plans its searches, executes them concurrently via SerpAPI, and produces a cited synthesis, designed to be readable, auditable, and simple to run locally.
Full code available here: https://github.com/serpapi/web-research-agent
Use cases for AI research agents
The AI agent targets questions where recency and attribution matter. It streamlines research and can automate many repeatable workflows and complex tasks: market scans, literature overviews, competitive comparisons, and quick technical surveys. Instead of incremental browsing, the model first enumerates all the searches it needs, the system executes those queries in parallel, and the model then writes a grounded answer using the returned snippets. Because the agent queries the web on the fly, its results are close to real time.
Among other use cases, web research agents lend themselves to scalable collection of online data. The agent's outputs can be streamed into natural language processing pipelines to generate further insights, which can then be stored in a knowledge base or refined by more complex systems, such as multi-agent setups where some agents collect data and others review it.
Organizations can also apply web research agents internally, for example to research customer data in order to enhance customer experience and improve customer support.
How it works (overview)
- Planning: the large language model emits a batch of structured tool calls, each containing a Google query.
- Execution: The agent runs those queries concurrently through SerpAPI and returns concise title:snippet pairs as tool messages.
- Synthesis: With the results in context, the model produces the final answer, including inline citations derived from the snippets.
This pattern keeps latency low (parallel requests), improves coverage (the plan precedes the data), and preserves a step-by-step trace for auditing.
Minimal setup
- Python 3.9+ (3.10+ recommended)
- Environment variables: OPENAI_API_KEY and SERPAPI_API_KEY
You can obtain an OpenAI API key at https://platform.openai.com/ and register for a SerpAPI key at https://serpapi.com/; a free plan is available, so you can test the agent first. Then clone the repository, install the dependencies, and run the agent:
git clone https://github.com/vladm-serpapi/web-research-agent
cd web-research-agent
pip install -r requirements.txt
# set keys in shell (recommended)
export OPENAI_API_KEY="sk-..."
export SERPAPI_API_KEY="..."
python research_agent.py -q "What are the latest approaches to retrieval‑augmented generation in 2025?"
Implementation
The agent is implemented as a single class with a run method. The constructor initializes the model configuration, API clients, the tool schema, and the system prompt. The model configuration accepts different AI models, primarily those supported by OpenAI; the code could be extended to allow provider-agnostic inference (e.g. via OpenRouter). Once the agent is built, the run method executes the inference loop until a final answer is produced.
# research_agent.py
import json
import os
import typing as t
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI
from serpapi import GoogleSearch


class ResearchAgent:
    """LLM‑powered researcher that combines an OpenAI o‑series model with SerpAPI."""

    def __init__(
        self,
        model: str = "o3",
        topn: int = 10,
        debug: bool = False,
        openai_key: t.Optional[str] = None,
        serpapi_key: t.Optional[str] = None,
    ) -> None:
        self.model = model
        self.topn = topn
        self.debug = debug
        self.openai_key = openai_key or os.getenv("OPENAI_API_KEY")
        self.serp_key = serpapi_key or os.getenv("SERPAPI_API_KEY")
        if not self.openai_key or not self.serp_key:
            raise RuntimeError("OPENAI_API_KEY and SERPAPI_API_KEY must be set.")
        self.client = OpenAI(api_key=self.openai_key)
        # tools + prompt initialized below ...
To actually connect to the web, we need to give the model a tool for it. The self.tools field is initialized with a tool schema describing the function the model can call when it needs web data. The schema includes the function name, a description, and the parameters we want the model to provide; here we ask the model for a single query: string parameter for Google Search. Code:
self.tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search Google and return the top result snippets.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Google search string"}
                },
                "required": ["query"],
            },
        },
    }
]
The system prompt is defined so that it encourages the model to research the user's question with the search_web tool. It requires the model to generate all necessary tool calls in a batch and return them together. This batching matters for total processing latency: if the model needs to make, say, 10 requests and each takes 1 second, issuing them sequentially would take 10 seconds, whereas running them concurrently brings the total down to roughly 1 second (in the ideal case).
self.sys_prompt = (
    "You are a meticulous research assistant.\n"
    "When outside knowledge is needed, you must emit ALL `search_web` tool calls "
    "in a SINGLE assistant message before reading any results.\n\n"
    "You must return them in the exact JSON structure the API expects for `tool_calls`,\n"
    "with each having its own `id`, `type`, and `function` fields.\n"
    "Do not write explanations, just the tool calls.\n\n"
    "Always batch between 2 and 50 calls in a single turn if you need external data.\n"
    "Only after all tool outputs are returned should you write your final, well-cited answer."
)
In practice, this nudges the model to enumerate queries that span the topic (overview, specifics, recent updates) before seeing any retrieved snippets.
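For illustration, a batched assistant turn has roughly the following shape; the ids and queries here are invented for this example:
# Illustrative shape of a batched assistant message with two `search_web` calls.
# The ids and queries are invented for this example.
{
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "search_web",
                "arguments": '{"query": "retrieval augmented generation overview 2025"}',
            },
        },
        {
            "id": "call_2",
            "type": "function",
            "function": {
                "name": "search_web",
                "arguments": '{"query": "latest RAG benchmarks 2025"}',
            },
        },
    ],
}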
The retrieval backend (SerpAPI integration)
Each tool call is executed against SerpAPI’s Google endpoint. The agent returns compact title:snippet pairs, which are sufficient for grounding and efficient on tokens. We use the SerpAPI Python SDK to make the requests and obtain the result snippets.
def _search_web(self, query: str) -> str:
    if self.debug:
        print(f"[DEBUG] → SerpAPI query: '{query}'")
    search = GoogleSearch({"q": query, "api_key": self.serp_key, "num": self.topn})
    org = search.get_dict().get("organic_results", [])[: self.topn]
    # one line per result: "- title: snippet"
    return "\n".join(
        f"- {r.get('title', '(untitled)')}: {r.get('snippet', '(no snippet)')}"
        for r in org
    ) or "No results found."
Returning concise strings instead of full pages keeps interactions predictable and reduces overhead. The implementation could later be extended with a scraper tool that fetches complete pages, which would give the model more context and let it generate deeper insights.
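As a sketch of that extension, a second tool schema could look like the following; fetch_page is a hypothetical tool name and is not part of the current agent:
# Hypothetical extension: a second tool that fetches the full text of a page.
# Not part of the current agent; shown only to illustrate the idea.
fetch_page_tool = {
    "type": "function",
    "function": {
        "name": "fetch_page",
        "description": "Download a web page and return its main text content.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Absolute URL of the page to fetch"}
            },
            "required": ["url"],
        },
    },
}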
Agentic inference loop
At the top level, run(question: str) builds the message context (system + user), calls the chat completions API, and branches depending on whether the model returned tool calls or a final answer.
def run(self, question: str) -> dict[str, t.Any]:
    messages = [
        {"role": "system", "content": self.sys_prompt},
        {"role": "user", "content": question},
    ]
    steps: list[dict[str, t.Any]] = []

    while True:
        if self.debug:
            print("[DEBUG] → OpenAI chat.completions.create request …")
        resp = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            tools=self.tools,
            tool_choice="auto",
        )
        msg = resp.choices[0].message

        if msg.tool_calls:
            # append assistant message FIRST (per API contract)
            messages.append(msg)

            # fetch all tool results concurrently
            def fetch(call):
                args = json.loads(call.function.arguments)
                q = args["query"]
                steps.append({"type": "tool_call", "query": q})
                return call.id, q, self._search_web(q)

            with ThreadPoolExecutor() as pool:
                results = list(pool.map(fetch, msg.tool_calls))

            # append tool results in the same order as tool_calls
            for call_id, q, result in results:
                steps.append({"type": "tool_result", "content": result})
                messages.append({
                    "role": "tool",
                    "tool_call_id": call_id,
                    "content": result,
                })
            continue  # next iteration → model now has snippets; produce final answer

        # no tool calls → final answer
        answer = msg.content.strip()
        steps.append({"type": "assistant_answer", "content": answer})
        return {"question": question, "answer": answer, "steps": steps}
Key points in the loop:
- The agent always appends the assistant’s tool_calls message before returning tool outputs (API contract).
- Tool execution is concurrent via ThreadPoolExecutor to reduce latency.
- Tool outputs are appended in order and associated with tool_call_id; the next model call then has everything needed to synthesize the answer.
Putting it all together
Finally, a small wrapper exposes the agent for use in the terminal. It parses flags, instantiates ResearchAgent, runs it, prints the final answer, and optionally writes a JSON trace (steps + answer) for auditing. The standard argparse library handles the argument parsing. The wrapper exposes the necessary options for the client (a sketch follows below):
- -q: the question to research
- -m: the model name
- --outfile: the file to write results to
- --debug: enables debug logging to trace intermediate execution steps
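A minimal sketch of what that wrapper can look like; the flag names follow the ones used in this post, and the exact code in the repository may differ slightly:
# Minimal CLI wrapper sketch; the exact wrapper in the repository may differ slightly.
if __name__ == "__main__":
    import argparse
    import json

    parser = argparse.ArgumentParser(description="Web research agent")
    parser.add_argument("-q", "--question", required=True, help="Question to research")
    parser.add_argument("-m", "--model", default="o3", help="OpenAI model name")
    parser.add_argument("-n", "--topn", type=int, default=10, help="Results per search")
    parser.add_argument("--outfile", help="Write a JSON trace (steps + answer) to this file")
    parser.add_argument("--debug", action="store_true", help="Print intermediate execution steps")
    args = parser.parse_args()

    agent = ResearchAgent(model=args.model, topn=args.topn, debug=args.debug)
    result = agent.run(args.question)
    print(result["answer"])
    if args.outfile:
        with open(args.outfile, "w") as f:
            json.dump(result, f, indent=2)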
The result is a compact, auditable pipeline: the model does the planning and generates search requests, the requests are executed in parallel via SerpAPI, and the model writes an answer with citations, all within a few clear components.
Usage
The code can be used either from the terminal or imported and used directly as a Python module. Examples:
# basic
python research_agent.py -q "State of LLM reasoning benchmarks in 2025"
# save a JSON trace (tool calls, results, final answer)
python research_agent.py -q "Compare FAISS vs. Milvus vs. Qdrant for RAG (2025)" --outfile trace.json
# control model and results per search
python research_agent.py -q "Airline industry trends in 2025" -m gpt-4o -n 8
Using it in Python directly:
from research_agent import ResearchAgent
agent = ResearchAgent(model="gpt-4o", topn=10, debug=False)
result = agent.run("Summarize the most cited papers on RAG.")
print(result["answer"]) # final, cited summary
print(len(result["steps"])) # trace length
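If you also want a JSON trace when using the agent as a library, you can dump the returned dictionary yourself, for example:
import json

# result comes from agent.run(...) above; persist the full trace for auditing
with open("trace.json", "w") as f:
    json.dump(result, f, indent=2)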
Notes on the usage:
- Keys: ensure both OPENAI_API_KEY and SERPAPI_API_KEY are available in the environment before running the scripts.
- Model behavior: o3 / o4‑mini may prefer fewer tool calls per turn; gpt‑4o often batches more queries when broad coverage is required.
- Model output: as with all AI models, hallucinations are possible; human oversight is required to ensure output quality.
Conclusion and what's next
In this blog post, we showed how to design an agent capable of running multiple complex research workflows: news scans, literature reviews, vendor/technology comparisons, and quick technical surveys where recency and traceability matter. It runs locally via a CLI or as a library, with optional JSON traces to audit tool calls and outputs.
In future blog posts, we plan to expand the agent's functionality with multiple tools (maps, flights, hotels, domain APIs, etc.) to support more complex, goal-directed workflows. We will also experiment with different AI tools and generative AI frameworks to build an agent capable of more complex decision making.