
Building an AI Analyst for Business Users

How to create an AI agent that answers business questions by connecting to a data warehouse and Metabase dashboards.

On one of our projects, business users kept asking questions like:

  • “What are our sales for channel X?”
  • “What are our sales for product X?”
  • “What is our top-selling product for last week/month/quarter/year?”

Ironically, all of this information was already available on our Metabase dashboards. But users were often hesitant to explore those dashboards themselves. 🤷‍♂️

While that’s a topic for another day, I decided to have a little fun—I built a simple AI agent that could answer these questions by connecting to our data warehouse and Metabase.


🔧 High-Level Architecture

Here’s how I did it, step by step:

AI Analyst Architecture


1. Setting Up the OpenAI API

I set up the LLM client using Azure OpenAI Service. Since GPT-4 was already available through Azure in our cloud environment, this was straightforward.

```python
from langchain_openai import AzureChatOpenAI  # langchain-openai package

# `settings` holds the Azure OpenAI credentials (e.g. a pydantic BaseSettings object).
chat_llm = AzureChatOpenAI(
    azure_deployment=settings.AZURE_DEPLOYMENT,
    azure_endpoint=settings.AZURE_ENDPOINT,
    api_version=settings.API_VERSION,
    model=settings.MODEL,
    openai_api_key=settings.OPENAI_AZURE_API_KEY,
    temperature=0,
)
```

I set the temperature to 0 to keep responses as deterministic as possible, so the model returns essentially the same answer for the same question every time.


2. Agent 1: SQL Generator

This agent was responsible for generating SQL based on the metadata of the data warehouse's golden layer. I used role-based prompting to create the agent, and I also provided it with an example.

```python
from langchain_core.prompts import PromptTemplate

# This lives inside the SQL-generator agent class, hence the `self.` references.
ROLE_OBJECTIVE_CONSTRAINTS = (
    "You are *InsightBot*, a senior data-analyst assistant who speaks in a clear, business-friendly tone.\n"
    "*Golden rules*\n"
    "1. *Only* answer with information that comes *directly* from (a) the metadata provided below or (b) the SQL results returned to you.\n"
    "2. If the answer cannot be derived, reply with `I'm afraid I don't have that data.` - *no speculation, no invented columns*.\n"
    "3. When you show SQL, wrap it in a single ```sql``` block.\n"
    "4. When you show explanations, use at most *6 bullets* or 3 short paragraphs.\n"
    "5. Format everything for Slack:\n"
    "* bold headings with `*…*`\n"
    "* bullets with `•`\n"
    "* code fences for SQL / error messages\n"
    "6. Never expose secrets, connection strings or internal stack traces.\n"
    "Remember: *Accuracy beats completeness* - it is better to admit “I don't know” than to hallucinate.\n"
)

STRUCTURED_OUTPUT_SPEC = """
    Return a JSON object exactly in this format:
    {
    "sql_query": "<SQL here or empty string>",
    "analysis":  "<short plain-English summary of what the query does>"
    }
    Think step-by-step silently before answering, but DO NOT show your reasoning.
"""

SQL_GEN_TEMPLATE = PromptTemplate(
    template="""{ROLE_OBJECTIVE_CONSTRAINTS}
    *Task*: Write ONE valid SQL query that answers the business question below *using only the tables and columns listed in the metadata*.

    ### METADATA SCHEMA
    {metadata_content}

    ### ADDITIONAL INFO
    {additional_info}

    ### EXAMPLES
    User : "Total orders received on 16 April 2025"
    Assistant : sql_query

    ### USER QUESTION
    {question}

    {STRUCTURED_OUTPUT_SPEC}
    """
)

prompt = SQL_GEN_TEMPLATE.format(
    ROLE_OBJECTIVE_CONSTRAINTS=ROLE_OBJECTIVE_CONSTRAINTS,
    metadata_content=self.metadata_content,
    additional_info=self.additional_info,
    question=self.question,
    STRUCTURED_OUTPUT_SPEC=STRUCTURED_OUTPUT_SPEC,
)
```

This agent takes the metadata of the data warehouse and the user’s question, and generates a SQL query that can be executed against the data warehouse. It also provides a short analysis of what the query does. This generates a structured output that looks like this:

```json
{
  "sql_query": "SELECT COUNT(order_id) AS num_orders FROM analytics.order WHERE DATE(created_at) = '2025-04-16' AND cohort_name IN ('ABC', 'XYZ');",
  "analysis": "Total number of orders on 16 April 2025"
}
```
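
The post doesn't show how this JSON is parsed into typed fields. One option, consistent with how Agent 3 does it later, is LangChain's `with_structured_output` together with a small Pydantic schema; the `SQLGenOutput` model below is illustrative, not part of the original code:

```python
from pydantic import BaseModel, Field

class SQLGenOutput(BaseModel):
    """Illustrative schema for Agent 1's structured response."""
    sql_query: str = Field(description="The generated SQL, or an empty string")
    analysis: str = Field(description="Short plain-English summary of what the query does")

# Ask the model for typed fields instead of free-form text.
structured_sql_llm = chat_llm.with_structured_output(SQLGenOutput)
sql_generation = structured_sql_llm.invoke(prompt)
print(sql_generation.sql_query)
print(sql_generation.analysis)
```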

3. Executing the SQL

I used the SQL generated by Agent 1 to fetch results from the data warehouse; the database connection was set up with SQLAlchemy. Here's how I executed the query:

```python
import pandas as pd

def execute_sql(sql_query: str) -> pd.DataFrame:
    """
    Execute the SQL query and return the results as a DataFrame.
    """
    try:
        # `engine` is the SQLAlchemy engine connected to the data warehouse.
        with engine.connect() as connection:
            result = pd.read_sql(sql_query, connection)
        return result
    except Exception as e:
        print(f"Error executing SQL: {e}")
        return pd.DataFrame()
```

This function connects to the data warehouse and executes the SQL query generated by Agent 1. If the query fails, it returns an empty DataFrame.
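
The `engine` referenced inside `execute_sql` is a plain SQLAlchemy engine created once at startup. The connection string below is a placeholder, not the real warehouse URL:

```python
from sqlalchemy import create_engine

# Placeholder DSN; the real credentials come from settings / a secrets store.
engine = create_engine("postgresql+psycopg2://user:password@warehouse-host:5432/analytics")
```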


4. Agent 2: Business-Friendly Explanation

This agent turned SQL results into concise business explanations. Again, I used role-based prompting to ensure a consistent tone and format.

```python
# Method of the same agent class; `self.question` and `self.additional_info`
# are set when the agent is created.
def explain_result(self, result, sql_query, explanation):
    ROLE_OBJECTIVE_CONSTRAINTS = (
        "You are *InsightBot*, a senior data-analyst assistant who speaks in a clear, business-friendly tone.\n"
        "*Golden rules*\n"
        "1. *Only* answer with information that comes *directly* from (a) the metadata provided below or (b) the SQL results returned to you.\n"
        "2. If the answer cannot be derived, reply with `I'm afraid I don't have that data.` - *no speculation, no invented columns*.\n"
        "3. When you show SQL, wrap it in a single ```sql``` block.\n"
        "4. When you show explanations, use at most *6 bullets* or 3 short paragraphs.\n"
        "5. Format everything for Slack:\n"
        "* bold headings with `*…*`\n"
        "* bullets with `•`\n"
        "* code fences for SQL / error messages\n"
        "6. Never expose secrets, connection strings or internal stack traces.\n"
        "Remember: *Accuracy beats completeness* - it is better to admit “I don't know” than to hallucinate.\n"
    )

    # If the query failed, let the LLM phrase a polite error message instead of raw details.
    if result == "SQL-ERROR":
        explanation_prompt = "Database Error\n"
        logger.info("PROMPT | for LLM: %s", explanation_prompt)
        messages = [
            {"role": "system", "content": ROLE_OBJECTIVE_CONSTRAINTS},
            {"role": "user", "content": explanation_prompt},
        ]
        response = chat_llm.invoke(messages)
        return response.content

    explanation_prompt = (
        "Below is a SQL result set (rendered as JSON) and the query that produced it.\n"
        "*QUESTION*\n"
        "{question}\n"
        "*SQL*\n"
        "{sql_query}\n"
        "*SQL Explanation*\n"
        "{explanation}\n"
        "*RESULT*\n"
        "{result}\n"
        "*ADDITIONAL INFO*\n"
        "{additional_info}\n"
        "Write a Slack message that:\n"
        "1. Summarises the key findings in clear business language (max 3 paragraphs).\n"
        "2. Calls out any obvious trends or anomalies.\n"
        "Use *bold* and - bullets where it helps readability, but keep it short."
    ).format(
        result=result,
        question=self.question,
        additional_info=self.additional_info,
        sql_query=sql_query,
        explanation=explanation,
    )
    logger.info("PROMPT | for LLM: %s", explanation_prompt)
    messages = [
        {"role": "system", "content": ROLE_OBJECTIVE_CONSTRAINTS},
        {"role": "user", "content": explanation_prompt},
    ]

    response = chat_llm.invoke(messages)
    logger.info("Generated explanation for the SQL query.")
    # Slack uses single asterisks for bold, so normalise any Markdown-style bold.
    return response.content.replace("**", "*")
```

This agent takes the SQL results, the SQL query, and a brief explanation of what the query does, and generates a business-friendly explanation. The output is formatted for Slack, making it easy for users to read and understand.
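
Wiring the three steps together looks roughly like this. The orchestration below is a sketch (the `analyst` instance and the `"SQL-ERROR"` fallback for an empty DataFrame are assumptions), not the exact production code:

```python
# 1. Agent 1 generates the SQL and a short analysis.
sql_generation = structured_sql_llm.invoke(prompt)

# 2. Run the query against the warehouse.
df = execute_sql(sql_generation.sql_query)

# 3. Render the result as JSON so the LLM sees plain data, then let Agent 2 explain it.
result = df.to_json(orient="records") if not df.empty else "SQL-ERROR"
slack_message = analyst.explain_result(
    result=result,
    sql_query=sql_generation.sql_query,
    explanation=sql_generation.analysis,
)
```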


5. Agent 3: Metabase Dashboard Router

This agent helps users find relevant Metabase charts based on their questions. It uses the metadata of the Metabase questions to identify the most relevant charts.
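
The chart metadata itself can be exported from Metabase's REST API. The snippet below is an illustrative sketch: the host, `METABASE_API_KEY`, and the exact card fields depend on your Metabase version and setup:

```python
import requests

METABASE_HOST = "https://abc.metabaseapp.com"  # placeholder host

# List all saved questions ("cards"); their id, collection and name feed the router prompt.
resp = requests.get(
    f"{METABASE_HOST}/api/card",
    headers={"x-api-key": METABASE_API_KEY},  # assumed to be set elsewhere
    timeout=30,
)
resp.raise_for_status()

metabase_questions = [
    {"Entity ID": card["id"], "Collection ID": card.get("collection_id"), "Name": card["name"]}
    for card in resp.json()
]
```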

```python
# Also a method of the agent class: `self.metabase_questions` holds the exported
# Metabase chart metadata, `self.question` the user's question.
ROLE_OBJECTIVE_CONSTRAINTS = (
    "You are “Dashboard Router”, an analytics helper.\n"
    "Goal: find the Entity ID of a Metabase Chart that answers the business question below, use Name field for search.\n"
    "Constraints: use only the metadata provided below and additional information.\n"
    "Never invent new Entity ID or Collection ID\n"
)

metabase_charts_array = self.metabase_questions
finding_chart_id_prompt = (
    "{ROLE_OBJECTIVE_CONSTRAINTS}\n"
    "### Your Task:\n"
    "Identify the **three charts** whose titles are most semantically relevant to `{question}`.\n"
    "### Metadata:\n"
    "{metabase_charts_array}\n"
    "### Output:\n"
    "A JSON object with the following keys:\n"
    "1. `Entity ID`: the Entity ID of the chart\n"
    "2. `Collection ID`: the Collection ID of the chart\n"
    "3. `Name`: the name of the chart\n"
).format(
    ROLE_OBJECTIVE_CONSTRAINTS=ROLE_OBJECTIVE_CONSTRAINTS,
    metabase_charts_array=metabase_charts_array,
    question=self.question,
)
logger.info("PROMPT | for LLM: %s", finding_chart_id_prompt)

# Ask the LLM for structured output matching the MetabaseQuestion schema.
structured_llm_response = chat_llm.with_structured_output(MetabaseQuestion)
response = structured_llm_response.invoke(finding_chart_id_prompt)
logger.info("Metabase question found: %s", response)
if not response:
    logger.error("No Metabase question found.")
    return None

# Build Slack-formatted links for each suggested chart.
formatted_questions = []
for question in response.relevant_questions:
    chart_link = f"<https://abc.metabaseapp.com/question/{question.entity_id}|{question.name}>"
    collection_link = f"<https://abc.metabaseapp.com/collection/{question.collection_id}|Collection>"
    formatted_questions.append(f"{chart_link} inside {collection_link}")

# Join all formatted questions with newlines.
response = "*You may find the following charts useful:*\n" + "\n".join(formatted_questions)
logger.info("Formatted Metabase question response: %s", response)
return response
```
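
The `MetabaseQuestion` schema passed to `with_structured_output` isn't shown above; a plausible shape, matching the fields the code reads (`relevant_questions`, `entity_id`, `collection_id`, `name`), would be:

```python
from typing import List
from pydantic import BaseModel, Field

class MetabaseChart(BaseModel):
    """One suggested chart."""
    entity_id: str = Field(description="Entity ID of the chart")
    collection_id: str = Field(description="Collection ID containing the chart")
    name: str = Field(description="Chart title")

class MetabaseQuestion(BaseModel):
    """The three charts most relevant to the user's question."""
    relevant_questions: List[MetabaseChart]
```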


6. Wrapping It All Up

I wrapped everything into a FastAPI service and deployed it on Azure App Service—which was then connected to Slack.
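
The FastAPI layer is thin. A stripped-down sketch of the endpoint Slack ends up calling might look like this (the route name, request model, and `InsightBot` class are illustrative):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

@app.post("/ask")
def ask(request: AskRequest):
    # Hypothetical orchestration: generate SQL, run it, explain it,
    # and suggest related Metabase charts.
    analyst = InsightBot(question=request.question)
    return {"answer": analyst.run()}
```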

✨ Magic

And that’s it. A simple, elegant AI Analyst—powered by LLMs, structured prompting, SQL, and a touch of automation.

This post is licensed under CC BY 4.0 by the author.