Building an MCP Server for Food Nutrient Information search
Automating Nutrition Data Annotation with LLMs, Docker, and the MCP Standard
Sometimes, you need tools, and sometimes, your tools need tools. In a recent personal project, while doing data annotation to build my eval dataset, I realized I needed to build a tool for my LLM Agent to make the annotation process less painful. I figured this would be an excellent opportunity to learn about MCP and build a server myself. After all, it has become the agreed-upon protocol for connecting resources to LLMs, and it looked like a good fit for my needs.
The project at hand is a nutrition-related application (more on that in future posts), and the annotation process consists of adding nutritional information, like calories and macronutrients, to a list of ingredients that compose different dishes. This can be a slow process if you fetch that information manually from a food search website. I wasn't comfortable asking an LLM to do it from memory, either, due to the risk of hallucination - and building ground truth is exactly the stage where you want reliable data the most. So, the alternative is to combine both options - provide a tool to my LLM Agent so it can fetch nutritional information about foods itself.
This is a post about my process of building an MCP server to connect USDA’s Food Data Central API to my LLM Agent and what I learned along the way. If you’re interested in running the server yourself for your own purposes, check the project’s repository and follow the instructions there (you will need a free FDC API Key, Docker, and uv).
The Goal
As mentioned before, the process I wanted to simplify consists of providing a list of multiple dishes and foods, decomposed by their main ingredients, along with the quantity for each ingredient. So, let’s take a single dish as an example:
Olive oil - 10g
Garlic - 10g
Cooked Pasta - 193g
Pork chop - 107g
I want to enrich this data with a set of nutritional information, like this:
[
  {
    "food_description": "olive oil",
    "total_calories_kcal": 72,
    "serving_size_g": 8,
    "total_carbohydrates_g": 0,
    "proteins_g": 0,
    "total_fats_g": 8,
    "saturated_fats_g": 1.24,
    "sodium_g": 0.000016,
    "fiber_g": 0,
    "added_sugars_g": 0
  },
  {
    "food_description": "garlic raw",
    ...
  }
]
Getting all that nutritional information manually for each ingredient would be very cumbersome, especially if we want to do it for hundreds or thousands of dishes with multiple ingredients each. So, the idea here is to leverage an LLM agent to fetch the nutritional info for me, and I need to provide the agent with the proper tool to do that. In the end, I'd like to ask my AI Agent (in my case, Cursor) something along these lines:
<ingredients>
Olive oil - 10g
Garlic - 10g
Cooked Pasta - 193g
Pork chop - 107g
</ingredients>
<output_reference>
[
  {
    "food_description": "white rice",
    "total_calories_kcal": 200,
    "serving_size_g": 150,
    "total_carbohydrates_g": 45,
    "proteins_g": 4,
    "total_fats_g": 0.5,
    "saturated_fats_g": 0.1,
    "sodium_g": 0.005,
    "fiber_g": 0.6,
    "added_sugars_g": 0
  },
  ...
]
</output_reference>
Can you search for nutrient information for each of the provided ingredients using the food-data-central tool? Please provide your output based on the given reference example. When you get the nutrient information per serving size, do the calculation to reach the nutrient information for the provided serving size.
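That last instruction is just linear scaling: divide the target quantity by the reported serving size and multiply every nutrient by that factor. As a sketch (the function and variable names here are illustrative, not code from the project), the arithmetic the agent is being asked to perform looks like this:

```python
def scale_nutrients(per_serving: dict, serving_size_g: float, target_g: float) -> dict:
    """Scale nutrient values reported for one serving size to a target quantity.

    Assumes every numeric field scales linearly with mass.
    """
    factor = target_g / serving_size_g
    scaled = {}
    for key, value in per_serving.items():
        if isinstance(value, (int, float)):
            scaled[key] = round(value * factor, 6)
        else:
            scaled[key] = value  # e.g., food_description stays as-is
    return scaled

# Example: olive oil reported per 8 g serving, but the dish used 10 g.
olive_oil = {
    "food_description": "olive oil",
    "total_calories_kcal": 72,
    "serving_size_g": 8,
    "total_fats_g": 8,
}
print(scale_nutrients(olive_oil, serving_size_g=8, target_g=10))
# total_calories_kcal becomes 90.0, total_fats_g becomes 10.0
```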
What is MCP?
MCP is a protocol developed by Anthropic that provides a standard way for LLMs to connect with different tools and resources. In our case, we need the LLM to interact with Food Data Central’s API so it can get nutritional information about different foods and ingredients. To do so, the LLM needs context to know how to use the API, and MCP is the protocol we’ll use to provide that context.
Here’s an example of how we can signal that a function is an MCP tool, inform the function’s required parameters, and give an overall description of the tool:
from typing import List, Optional

@mcp.tool()
async def search_foods(
    query: str,
    data_type: Optional[List[str]] = None,
    page_size: int = 50,
) -> str:
    """Search for foods in the USDA Food Data Central database.

    This tool searches for foods using keywords and returns a list of matching food items
    with their basic information including FDC ID, description, and key nutrients.

    Args:
        query: Search terms to find foods (e.g., "cheddar cheese", "apple")
        data_type: Optional filter on data type (e.g., ["Branded", "Foundation", "Survey (FNDDS)", "SR Legacy"])
        page_size: Maximum number of results to return (default: 50, max: 200)
    """
The docstring is more than documentation for yourself and other humans. It is the key context available to the LLM, enabling it to understand what the tool is for and how to use it.
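To make the snippet above concrete, here is a sketch of what the body of `search_foods` might do. The helper names are mine, and I'm using only the standard library here for simplicity, though an async server would typically use an async HTTP client. The endpoint and parameter names come from FDC's public API, which is authenticated via an `api_key` query parameter:

```python
import os
import urllib.parse
import urllib.request
from typing import List, Optional

FDC_SEARCH_URL = "https://api.nal.usda.gov/fdc/v1/foods/search"

def build_search_params(
    query: str,
    data_type: Optional[List[str]] = None,
    page_size: int = 50,
) -> dict:
    """Build the query parameters for FDC's /v1/foods/search endpoint."""
    params = {
        "api_key": os.environ.get("FDC_API_KEY", ""),
        "query": query,
        "pageSize": min(page_size, 200),  # FDC caps results at 200 per page
    }
    if data_type:
        # FDC accepts a comma-separated list, e.g. "Foundation,SR Legacy"
        params["dataType"] = ",".join(data_type)
    return params

def search_foods_sync(query: str, **kwargs) -> str:
    """Synchronous stand-in for the tool body: call FDC and return the raw JSON."""
    url = FDC_SEARCH_URL + "?" + urllib.parse.urlencode(build_search_params(query, **kwargs))
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

The tool then returns that response (usually trimmed down to the fields the LLM actually needs) as a string, which the client feeds back into the model's context.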
This is a very short introduction focused on this specific use case. If you want a more in-depth explanation about MCP, I recommend checking the Official MCP Documentation.
How it Works
MCP follows a client-server architecture, where a host application can connect to multiple servers. For my particular use case, the host is my IDE (Cursor), which runs the client that maintains the connection with the MCP server.
Transports
Communication between the client and server can take different forms. Two standard mechanisms are currently used: stdio and streamable HTTP.
stdio—Communicates through standard input and output streams. In this mode, the client is responsible for starting the server process. As the official documents mention, it’s a simple communication process that can be useful for local integration.
streamable HTTP—This transport uses HTTP POST requests for client-to-server communication and optional Server-Sent Events (SSE) streams for server-to-client communication. It replaces the deprecated SSE transport, which is now incorporated into streamable HTTP.
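Whichever transport you pick, the messages themselves are the same: MCP is built on JSON-RPC 2.0, and the transport only changes how the bytes move. A tool invocation from the client looks roughly like this on the wire (the shape follows the MCP spec; the `id` and `arguments` values are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "search_foods",
    "arguments": { "query": "apple", "page_size": 1 }
  }
}
```

With stdio, this is a line of JSON written to the server process's stdin; with streamable HTTP, it is the body of an HTTP POST to the server's MCP endpoint.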
The tools
We will build three different tools:
search_foods - Keyword search for foods in the USDA Food Data Central database.
get_food_details - Get detailed information about a specific food item by its FDC ID.
get_multiple_foods - Get detailed information about multiple food items using their FDC IDs.
Each of these will, in turn, make requests to the corresponding FDC API endpoint.
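For reference, the tools map onto FDC's REST endpoints roughly as follows. The paths come from FDC's API guide; the dict itself is just my own summary, not code from the server:

```python
# Tool name -> FDC endpoint it wraps (methods/paths per FDC's API guide)
TOOL_TO_ENDPOINT = {
    "search_foods": "GET /v1/foods/search",      # keyword search
    "get_food_details": "GET /v1/food/{fdcId}",  # one food by its FDC ID
    "get_multiple_foods": "POST /v1/foods",      # several foods by FDC ID
}

for tool, endpoint in TOOL_TO_ENDPOINT.items():
    print(f"{tool:20s} -> {endpoint}")
```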
Building the Server
You can use LLMs to speed up the creation of your MCP Server. The official documentation even has a dedicated page instructing you how to do so. There is very useful information in there, including an LLM-friendly link to the complete MCP documentation that you can include in your prompt when creating your server.
For this project, though, I used two primary sources as references.
For the overall project layout and initial setup, I used Cole Medin's excellent MCP-Mem0 project. I basically copied the project and told Cursor to treat it as a reference to be edited into our server.
We also need to provide context about our specific use case. For that, I used Jamie Wong's Food Data Central MCP Server project. This server is built with TypeScript and implements stdio as a mode of transport, which I couldn't get to work for some reason. I'm using Cursor in WSL, so my best guess is that there was some confusion between operating systems during the server startup or connection. Of course, you won't always have another version of the server you want to build at hand to use as a reference - in that case, we could have used FDC's API docs directly as context.
We could also have created a project from scratch and passed MCP-Mem0’s main.py and FDC MCP Server’s index.ts as context, which would probably work better than my approach.
From there on, I built and ran the server with Docker and tested it by asking Cursor to use the tools (you can check the exact step-by-step on Github), which didn't work at first. To debug it, I'd check the tool call's response directly in Cursor's chat and feed the error back into my chat on the MCP Server project. The issues were mostly Pydantic validation errors.
The last detail I had to tweak was the transport used. My reference project implements both stdio and SSE, but SSE has now been deprecated and replaced with streamable-http. This change also requires changing our mcp.json configuration in Cursor, so now we have something like:
{
  "mcpServers": {
    "food-data-central": {
      "transport": "streamable-http",
      "url": "http://localhost:8050/mcp"
    }
  }
}
Great, now once you build the Docker image with:
docker build -t food-data-central-mcp --build-arg PORT=8050 .
And run the server with:
docker run --env-file .env -p 8050:8050 food-data-central-mcp
You should be able to put the mcp.json shown above under .cursor/mcp.json, for example, and use the tool. Let’s give it a test:
**User**
Can you get me nutritional information for an apple using the food-data-central tool?
---
**Cursor**
Called search_foods
Parameters:
```
{
  "query": "apple",
  "page_size": 1
}
```
Results:
```
{
  "foods": [
    {
      "fdcId": 454004,
      "description": "APPLE",
      ...
```
Here is the nutritional information for an apple (serving size: 154g, branded as "TREECRISP 2 GO"):
- Calories: 52 kcal
- Protein: 0.0 g
- Total Fat: 0.65 g
...
Looks good!
I have found that LLMs can be a bit erratic with tool usage, needlessly calling the tool multiple times and sometimes choosing the wrong tool, but overall, with the proper prompt, I got it to work well for my data annotation needs.
Conclusion
I needed to create a tool for my LLM agent, and MCP seemed the perfect way to do that. This is a very anecdotal post, and it’s not meant to be a definitive guide on how to build MCP Servers. These are just some notes on what worked best for my particular case. But I do like using Docker for this purpose, and thought that using Docker with streamable-http as transport would get me closer to being production-ready than using stdio.
I hope it’s useful for anyone interested in building their own MCP servers. If you’re interested in the FDC MCP server, check the instructions on running it in the project’s repo, and let me know if you run into any issues!