Solving Automated Data Analysis

A living record of vision, principles, gaps, and solutions

Published

June 5, 2026

This documents my attempts and failures at solving fully automated data analysis.

0. Vision

Build a system that can do automated data analysis leveraging AI. Afterwards, combine it with works done by others to have a system that can do research autonomously. If there are other components of doing autonomous research that don’t work well yet, go and solve them.

1. Principles

Get out of the AI model’s way: current models are smart enough to do data analysis. They are just terrible at coordinating their thoughts and thinking like a human analyst would do. Aid them only in that aspect. More accurately, help them aid themselves.
No margin for error: errors on the data analysis phase of a research propagate everywhere, discrediting the work done by other phases.
Close to zero prompts: the ideal scenario is the system gets the data. It analyzes it so well that the researchers or users usually don’t need to prompt for revision or further directions.
Generalisability: A system claiming to perform fully automated data analysis should be capable of handling a wide range of analytical tasks rather than being limited to a narrow subset. Evaluation should not focus solely on tasks with clear, objective feedback signals, such as the iterative improvement of machine learning models, where performance metrics provide straightforward validation. It should also include more open-ended tasks, such as exploratory geospatial analysis, where there may be no single correct answer and success depends on producing expert-quality insights and interpretations.

2. Gaps

2.1. Validations / Diagnosis

Proper statistical analysis is based on thorough revision and diagnosis of previous steps.

A simple example for this is what a good analyst would do after building a simple regression model: they diagnose the model. Does it fulfill assumptions of linear regression (normality of errors, linearity in parameters, no auto-correlation, no multi-collinearity, etc.)? If the aim of the model was for prediction rather than inference, how is the goodness-of-fit?

For any good automated data analysis system this should be a piece of cake.

Another angle of validation is the prevention of hallucination. Did the AI actually find what it is talking about in the data? Did it execute any code to find it? This points to traceability of actions done by the system.

2.2. Zooming Out

Data analysis requires having a certain goal in mind the whole time, and stepping back at times to think about that goal, whether that goal is specifically stated or blind exploration.

2.3. Best Next Step / Stay Safe

A good analyst knows what the next best step is. Is it EDA? Or, modelling? If so, what type of modelling? More data cleaning?

AI models stop whenever they feel like they have done enough work. They may have only done initial exploration. Unless prompted, they don’t tend to continue to a different stage of the process. Even if they do, they are not as good as human analysts at deciding what the best next step is.

AI models also tend to stay safe, even when explicitly told to explore and analyze the dataset to the absolute limits. They default to some descriptive statistics and visualizations, and that usually is it. Even if the combination of those two might be enough for the daily needs of most companies, they are not sufficient for most research work. And which company wouldn’t want research-quality analysis?

2.4. Graphs

Graphs can reveal patterns that would otherwise be near impossible to find.

The best example of the necessity of data visualization to be part of data analysis comes from the anscombe-quartet[1], four datasets that look the same after modeling or descriptive statistics, and only reveal their underlying pattern when plotted. Any AI system has to have eyes to be a good data analyst.

2.5 Soft Bullying

The data says what it says and nothing else. AI models, however, say what the user wants to hear.

AI models tend to agree with users more often than not, even in cases where the user might be absolutely wrong. Almost anyone can bully most models into agreeing with them. This is unacceptable behavior from an automated data analysis system. It should stay true to the data.

3. Reviews of Other Solutions

3.1 DS-Agent [2]

Guo et al. (2024) propose DS-Agent, a data science agent that uses a case-based reasoning (CBR) architecture. The general idea is as follows. There is a case database where past solution cases from data science problems are recorded. These are sourced from Kaggle. When the agent needs to do a data science task it retrieves from this case database. The retrieved cases are ranked and the top one is used. Based on the feedback gained from the use of this case, the ranking and retrievals are updated. This loop goes on until the best ML model is fitted or the best solution for the task is achieved. Except for this feedback loop, it is somewhat similar to RAG systems. The best solution is recorded in the case database. In deployment scenarios, usage of the base case solution makes deployment faster.

While DS-Agent [2] shows promising results for the ML-specific approach it took, there are big red flags that don’t align with my principles. One, the case-based database isn’t easy to make for all data analysis tasks. CBR ultimately limits the generalisability of the agent to other unrelated data analysis tasks. Two, the cases taken from the database actually get in the way of the LLMs. Most LLMs have probably seen those Kaggle approaches during their pre- or post-training as well, and they can probably choose better approaches if given the right context and asked the right questions. I see the CBR database as an unnecessary addition. That information is already in the LLMs. We should be focusing on how to extract that information accurately for a broader set of tasks automatically.

The interesting thing I saw in this paper was the recording of the best solution in order to make deployment faster. I can see the point there.

3.2 Data Interpreter [3]

Hong et al. (2024) present Data Interpreter, an LLM agent for data science that focuses on two things: hierarchical graph modeling for tasks and programmable node generation. The idea in a nutshell is to classify data science tasks into subprocesses in a hierarchical graph model where each node is a task and their dependencies are edges. The programmable node is used for real-time code generation, refinement, and verification. They point out that a plan generated by a single LLM is not enough to cover the complexity of these tasks, which agrees with our gaps section. They focus on the exchange of intermediate results and parameters between subprocesses according to their inter-dependencies (edges). They argue this avoids “the need to retrieve the entire context at once while maintaining the relevance of the input context, offering flexibility, and scalability”. So each task node will have metadata with task type, status, execution feedback, and dependency attributes. This allows for the adjustment of the task graph as the task environment changes. It also uses multiple tools, with the agent falling back to code generation in the absence of appropriate tools.

While the design of Data Interpreter [3] may sound plausible, I have problems with it. First is the usage of graphs and dependencies. I don’t think graphs are the way to go for automating data analysis that aims for generalisation. I don’t see the need for it. First, graphs limit the exploration space to some extent as they are focused on growing deeper rather than broader. Second, I think they are unnecessary. LLMs themselves can reason in a much more flexible and non-linear way than graphs, even with the dependency tracing they are proposing. In the case of unexpected findings, I can’t see nodes that should have been linked together as dependencies being linked together. I am also very much skeptical of how much horizontal linking Data Interpreter would do. Will it link a node to some node five steps back that was on a very different path? This may be handled by the rewiring of the task graph, but I am not sure how often it happens, when it triggers, and if it generalizes for other data science tasks. The essence of my argument is as follows: unexpected relationships are hard to register as dependencies, but they can be reasoned about.

I consider the usage of tools primarily, rather than just letting the LLM generate code, a result of the paper being from 2024, before LLMs showed the best of their coding capabilities. For similar reasons, I don’t see the point in the usage of programmable nodes. Verification should be handled in a better way.

Most of the ideas in Data Interpreter are good. Yes, the task space should change as a result of findings. Yes, we should have a way of starting from a major goal and doing simpler tasks that lead to the execution or complete transformation of that goal. Yes, verification of code and results is important. I just think Data Interpreter, while it uses LLMs in a better way than DS-Agent, still gets in the way of the LLMs rather than trying to extract the massive capabilities of the LLM for such tasks.

4. Failures and Why

5. Current Solutions

Axum, a locally hosted web-based, data analysis harness that decides the fate of the data at hand.

Agents of Fate

The biggest gaps I see at the moment are gap 2 (zooming out) and gap 3 (best next step). I am thinking of solving them at the same time as deciding the next best step needs some aspects of zooming out. Axum already has a verification agent checking claims and interpretations. While I don’t think this solved gap 1 (verification) yet, it is something. Axum already has a separate ask_image_analyst tool where the main agent can use it to ask a multimodal model questions about the images it has generated and get answers. This solves gap 4 (graphs).

Gap 2 and 3, I feel, are architectural problems. I have used frontier and weaker open-source LLMs for asking questions at different stages of a data analysis process. They have the knowledge of what to do. The task is to get this knowledge out of them. More accurately, we need to use LLMs to get the best out of LLMs as per our zero-prompts principle.

The solution I have at hand is a multi-agent architecture. The most important thing in designing multiagent system is to keep true to our main principle: get out of the LLMs’ way. I am going to lay out what the agents are, what they are named, why they are needed, how they interact, and how they still stay out of their own ways.

Introducing the agents of fate.

a. Moira (Fate)

This is the agent that will decide the fate of the whole analysis. As we know LLMs are not that good at planning and they default to the safest path. So the solution might seem like to let them plan iteratively, or may be write a skill or documentation of what the steps should look like. Moira is somewhat like that, but we don’t want to get in the way. This agent will directly pull from a document known as COMMANDMENTS.md. This is where general principles for data analysis will be listed. To be clear we don’t want to write “First step explore and then do this and then …”. What we would instead write is something like “a good analysis makes use of both visualization and computational exploration”.

So why do we need it to be an agent? Why not just COMMANDMENTS.md as a file agents read.

There are many reasons for Moira to be an agent. One is not all principles are applicable to all data analysis tasks. An LLM can adapt to such tasks while keeping true to the principles. So while the COMMANDMENTS.md provide the general principles, the domain-specific principles will be created by Moira. Second, as more is revealed about the data, even the fate of the analysis should change. We need something that is alive and thinking for that session. This also contributes to the idea of staying out of the LLMs way and letting them do what they think is best. We only provide principles, and we give them opportunities to twist and reshape those principles.

This is exactly the role of this agent. Moira holds the end goal of the analysis and the principles (general and domain-specific) that govern how good analysis should be conducted. She has no knowledge of the data, no knowledge of what has been found, no interest in the details of how the work is done, except when something ground-shaking is found. She holds only the thread of destiny (the reason the analysis exists) and confirms or denies whether that thread is being honored. She is the only agent that can signal completion. Nothing in the system escapes her final judgment.

I repeat. Only Moira can say an analysis is done.

In Greek mythology Moira is the goddess of fate, the force that holds the thread of destiny and determines the ultimate purpose of all things. No god, no mortal, and no event escapes Moira’s thread. She does not intervene in the details of how things unfold. She holds only the final purpose and the laws that govern whether that purpose has been served.

b. Kratos (The Orchestrator)

Kratos is the heart of AXUM. It runs everything. Its main work is to spin up sub-agents that do nuggets of work. It doesn’t write code, it makes others write code. It receives results, summaries, and claims from these sub-agents. Based on this information it decides where the analysis should go.

Why sub-agents?

Personally, I don’t use sub-agents that much when I am coding with Pi, Opencode, or Codex. I don’t like them that much. But for data analysis I think they make a lot of sense. They serve the following purposes: - They start with a fresh context. This is good. They see the data as it is and from an a somewhat new angle. In coding tasks, this may be a terrible idea as longer contexts that keep even smaller details about work previously done may be beneficial (you can argue against this). In data analysis, however, the data is there. It is going nowhere. You want that detail again? Just see what the data says. - They allow Kratos to keep focus on higher-level things. This creates a pipeline of compressed information flowing into the orchestrator. Kratos won’t have to do the coding, visualization, decision making, etc. It just has to keep an eye for what to do next and obey Moira. - Further down the line, model switching between different tasks is a potential way of saving a lot of money in token costs. Exploration agents and visualization agents may be done by different tiers of models entirely. Giving tasks to different agents will make heterogeneous model usage for AXUM easier down the line.

How do we still stay out of Kratos’s way? We don’t tell Kratos what agent it should spin up. We don’t say “here is an exploration agent and here is a visualization agent”. We don’t tell kratos how to use these agents. It is entirely up to Kratos to create its own warriors and shape them the way he wants to.

This is exactly the role of this agent. Kratos is the central brain and sole decision-maker of the entire system. All information flows through him. He does not do analytical work himself. He governs. He reads the journal, scopes and launches Specialists, interprets findings, makes every next-step decision, decides when a stage has been completed and what the next stage is, checks in with Moira to confirm he is honoring the thread, and consults Delphi whenever he needs methodological wisdom before committing to a direction. He owns the decision layer of the journal. Everything that moves in the system moves because Kratos directed it.

In Greek mythology Kratos is the god of strength, power, and rule… the force that governs, directs, and holds dominion over all things in motion. Kratos does not create the purpose. That belongs to Moira. He is the active governing intelligence that takes that purpose and drives everything toward it. He commands, he directs, he decides.

c. Delphi (Senior Advisor)

Kratos doesn’t make next-step decisions alone. Even if we have tried to minimize the amount of work it does and purify the context quality it gets, that would be foolish knowing what we know about LLMs choosing the safest path. Most of the times I worked with LLMs to do my own data analysis, they came up with great ideas when I started a new chat (fresh context). We want Kratos to get that kind of unbiased, expert feedback in its decision making process. That is where Delphi steps in.

Delphi is an agent that answers questions Kratos has. It is the data analysis/science master. questions like “What are the best goodness of fit tests for a linear regression model?” go to Delphi. The idea is that this agent, with no context of the underlying data at hand, give an unbiased purely methodologically correct answers from its knowledge bank. Each question starts anew, using a fresh context window. Kratos asks these questions everytime it needs to decide the what to do next or how to do something. Delphi answers. This way, LLMs will try to get the best out of LLMs (zero prompting).

This is exactly the role of this agent. Delphi knows nothing about the data, nothing about what has been found, nothing about what has already been tried. She knows only analytical methodology… the deep wisdom of how analysis should be conducted across any domain and any dataset. Kratos consults her whenever he needs methodological confidence; before committing to a direction, when uncertain about next steps, when a Specialist flags that a finding may warrant expert perspective, at stage transitions with non-obvious next directions, or simply when he wants wisdom before proceeding. She gives scenarios with trade-offs (when there are some). She never decides. Kratos always chooses.

In Greek tradition the Oracle at Delphi was the most consulted source of wisdom in the ancient world. Gods and kings sought her counsel before every major decision. She did not command. She did not decide. She spoke in possibilities, in scenarios, in paths. And the one who consulted her made the final choice. She had no stake in the outcome. Her gift was pure methodological clarity, unclouded by involvement in the situation itself.

d. Daemons (Work Horses)

These are the sub-agents Kratos brings to existence. The Daemons are execution agents spun up on demand, temporary and task-specific. Each one is an expert in its domain — exploration, cleaning, modeling, validation, feature engineering, whatever the task requires. They can even be asked to do a specific type of exploration or a unique style of visualization while another sub-agent does another style of visualization. Kratos can do whatever it pleases with them. They do the actual analytical work. They write raw findings to a journal throughout their work. When their work produces a visualization they have direct access to Iris mid-task to get visual interpretations they need to continue working. On completion they return a structured report to Kratos. They never decide what happens next. They are summoned, they execute with skill, they return, and vanish (not really… we recorded what they did, their steps, and reasoning… they can be spun up and continue working where they left off if needed).

In Greek tradition Daemons were not evil creatures. That is a later mistranslation. They were skilled divine worker spirits, each expert in a specific domain, summoned to carry out specific tasks. They were not gods, they did not govern or decide. They were not mere mortals, they had genuine expertise and capability beyond the ordinary. They existed in the space between, summoned for a purpose, executing with skill, returning when their task was complete.

Daemons register claims. Kratos registers interpretations.

e. Iris (See and Believe)

This is the image agent, based on a multimodal model. She receives an image and a set of questions about the image from Kratos or the Daemons. Its job is to see the image and give answers to the questions.

It is 100% possible to use a multimodal models for both Kratos and the Daemons and not have a separate image analyst. I am choosing this approach as it will allow the use of different models for image specific tasks. In my experience, for noticing patterns in images for data analysis tasks doesn’t require a frontier model. Local models can do it pretty well.

Principles for using Iris: - An image should not be sent to the Iris again and again for different questions. An image is seen once and questions are asked. This aims to minimize token costs. - Iris will have no idea what the rest of the analysis is. Her job is to see and answer questions about the image only.

This is exactly the role of this agent. Iris receives images and visualizations from both Kratos and the Daemons, answers specific questions about them, and returns structured visual interpretations. She does not decide what questions to ask… whoever routes the image to her defines those questions. She sees the image, reads it with precision, and translates it into analytical meaning. She documents every exchange (the image context, the questions asked, and the interpretation returned) to the journal immediately and in real time, tagged by who initiated the exchange and at what point in the analysis. Her gift is perception. She sees so that others can decide.

In Greek mythology Iris is the goddess of the rainbow, the divine messenger who carries meaning across the threshold between the visible and the invisible, between what can be seen and what it means. Her specific gift is not strength or wisdom or fate. It is perception and the translation of visual phenomena into understanding. She sees what others cannot read and returns its meaning.

f. (mist)

……….. Potential Verification Agent Coming Soon…………

I have applied some aspects of it into Axum already. But it is nowhere near close to solving gap 1 (verification) or gap 2 (soft bullying). I need to think about its execution a little bit more. The idea is a mist in my head.

Fate’s Journal

The journal is the memory of the entire system. Two distinct layers or more:

Raw Findings Layer (claims): owned by the Daemons and Iris. Every Daemon writes what it found, what it tried, what was significant, what was anomalous, throughout its work and not just on completion. Iris writes every visual interpretation in real time the moment it occurs, tagged as either a Kratos-initiated or Daemon-initiated exchange. No interpretation of meaning, no recommendations. Only findings. This is already in place in AXUM

Interpretation Layer: owned by Kratos. After receiving any return, Kratos writes what he understood from the findings, what Moira confirmed if consulted, and what Delphi suggested when consulted. This layer is the analytical narrative of the pipeline.

Moira never reads the journal. Delphi never reads the journal. Only Kratos, the Daemons, and Iris interact with it. Kratos is the only one who has access to read all of the journal.

The journal is a database. I am assuming you got that part.

The General Principle for Launching a Daemon

Every Daemon launch by Kratos, without exception, includes exactly three things:

The Task: what to do, specific enough to be actionable but not so narrow it constrains what can be found.
The Motivation: a short idea of what Kratos is trying to find out by completing this task. This shouldn’t include other things Kratos has decided on or previous steps it has done or decisions it has made or is about to make. We want the context of the Daemons to be a neutral to the ongoing data analysis as possible.
The Return Condition: findings-based, never process-based. The Daemon does not return when it has tried something. It returns when it has achieved some answers towards the motivation outlined by Kratos:

“Return when you can characterize the full distributional structure and identify any outlier patterns”
“Return when you have a model that meaningfully beats the baseline or have exhausted the primary modeling approaches”
“Return when you can describe the nature, pattern, and likely cause of the missing data”

A Daemon cannot declare itself done by process. It must satisfy a findings-based condition.

The Daemon Completion Report

When a Daemon satisfies its return condition it returns a structured four-part report to Kratos:

What it did: the steps taken, the methods applied, the approaches tried. A process account of how the work was conducted.
How it did it: the specific analytical or technical choices made and why. If it chose one method over another it states that. If it transformed the data in a particular way it explains the rationale.
What it found: the actual findings. The significant results, patterns, anomalies, and outputs that satisfy the return condition. This is the primary content Kratos uses to update the decision layer. Level of detail should be decided by the Daemons
What it could not resolve: anything the Daemon encountered that was outside its scope, genuinely ambiguous, or that it was unable to complete within the defined task. Kratos will decide what to do in such cases. Some examples might be… scheduling it as a future task, deciding it is out of scope, consulting Delphi, or escalating it to Moira.

The Daemon also explicitly flags in its report whether any mid-task Iris exchanges occurred, so Kratos knows to read those journal entries before writing his interpretation to the decision layer.

Stages and How They Work

There are no predefined stages imposed on the analysis. Kratos decides when a stage has been completed and what the next stage should be, based entirely on what has been found, what Delphi recommends, and what the analysis needs. A stage ends when Kratos judges that a coherent body of work has been completed and the analysis is ready to move in a new direction. A new stage begins when Kratos defines what that new direction is.

The stage structure is emergent and data-driven. An analysis might move from exploration directly to a second round of cleaning before any modeling begins. It might cycle back from modeling to exploration if the models reveal something unexpected. Kratos does not follow a fixed pipeline. He follows the data.

When Kratos decides a stage transition has occurred he does three things:

Checks in with Moira to confirm the transition is aligned with the goal and principles. This can be a gold mine for what to do next.
Continues the first task of the new stage and launches the appropriate Daemon

Moira does not define when stages transition. She confirms whether a proposed transition makes sense given the goal and principles. The decision belongs entirely to Kratos.

Delphi Triggering

Delphi is consulted at Kratos’s discretion. There is no fixed list of trigger conditions. Kratos consults Delphi whenever any of the following are true:

He is about to make a methodological decision and wants to pressure-test his reasoning before committing
He is uncertain about the best next step and wants to see options he might not have considered
A Daemon’s completion reports something that may warrant Delphi consultation.
He is at a stage transition and the direction of the new stage is not obvious
He simply wants an expert methodological perspective before proceeding
When he feels like it.

Delphi is not a last resort. She is a readily available methodological sounding board that Kratos uses actively whenever analytical confidence is not high. Consulting her frequently is not a sign of weakness, it is the system working as intended.

When Kratos consults Delphi he gives a situation summary: what stage the analysis is currently in, what was just found, and what the apparent options or decision point are. Delphi returns scenarios with analytical rationale and tradeoffs. Kratos then explicitly cross-checks each scenario against what it already knows and uses the ones he thinks is the best way, or uses all of them, or drops all, or ….. just does what he wants.

Who Talks to Who, When, and How Often

Kratos ↔︎ Daemons (the primary loop, runs continuously)

Kratos launches a Daemon with the three-part task package. The Daemon works, uses Iris mid-task if it produces visualizations it needs to interpret, writes raw findings to the journal throughout, and on satisfying the return condition returns the structured report to Kratos. Kratos reads the report, writes his interpretation, explicitly accounts for any unresolved items, consults Delphi, uses Iris when needed and decides what comes next. This loop is the heartbeat of the system.

Kratos ↔︎ Iris (runs at Kratos’s own initiative)

Separate from Daemon-produced images, Kratos has direct access to Iris. When reading the journal or receiving a Daemon report, if a visualization exists that would inform his next-step decision, Kratos queries Iris directly. Iris interprets, writes the exchange to the raw findings layer tagged as Kratos-initiated, and returns to Kratos. Kratos incorporates this into its decision making.

Daemons ↔︎ Iris (runs mid-task, direct, logged in real time)

When a Daemon produces a visualization mid-task and needs to read it to continue its work, it queries Iris directly without routing through Kratos. Iris interprets, returns the interpretation to the Daemon, and writes the full exchange to the journal immediately tagged as Daemon-initiated. The Daemon continues its work and flags the exchange in its final completion report.

Kratos ↔︎ Moira (runs at critical moments)

At the start of the analysis, to confirm goal understanding and internalize principles before any Daemon is launched
At every stage transition Kratos decides to make, to confirm the transition is aligned with the goal and principles
When a Daemon’s completion report contains a finding significant enough to potentially shift the direction of the analysis
When Kratos believes the analysis may be complete

At each check-in Kratos gives a current state summary and asks one of two questions: “Is this direction aligned with the goal and principles?” or “Is the goal satisfied?”. Kratos will also have direct access to the general principles, domain specific principles, and their changes created by Moira. Moira responds with alignment confirmation, a correction, or a completion signal. She never gives instructions. She only confirms, corrects, or closes.

Kratos ↔︎ Delphi (runs at Kratos’s discretion, proactively)

Kratos consults Delphi whenever he needs methodological confidence. Before committing to a direction, when uncertain about next steps, when a Daemon flags an Oracle-worthy finding, at stage transitions with non-obvious next directions, simply when he wants methodological wisdom before proceeding, or simply when he wants. Delphi receives a situation summary and returns scenarios with tradeoffs. Kratos makes decisions based on Delphi’s consultations.

Daemons ↔︎ Kratos (bidirectional, mid-task questions allowed)

If mid-task a Daemon encounters genuine analytical ambiguity it can surface a question to Kratos before completing its return. Kratos responds with a clarification or scope adjustment. This allowance is reserved for analytical ambiguity. Not for image routing, which has its own direct channel through Iris.

Fate’s Loop (The Operational Loop)

Moira gets COMMANDMENTS.md with its general principles. She also makes domain-specific addition to the principles that is specific to the task at hand. She passes this onto Kratos.

Initialization: Kratos receives the goal and Moira’s principles. He immediately checks in with Moira to confirm goal understanding and internalize the principles. Moira confirms.

First Daemon Launch: Kratos decides the first task. He scopes the task, writes a minimal context summary, sets a findings-based return condition, and launches the first Daemon.

The Working Loop:

The Daemon works. It writes raw findings to the journal as it goes. If it produces visualizations it queries Iris directly, receives interpretations, and continues. If it hits genuine analytical ambiguity it surfaces questions to Kratos (this should theoretically happen in rare cases). On satisfying the return condition it writes its final raw findings and returns the report to Kratos.

Kratos receives the report. If he needs to inspect any visualization himself he queries Iris directly. He then processes the report in order:

Reads what was done and how, updates his understanding of the analytical record
Reads what was found, primary input for the decision layer
Reads what could not be resolved, explicitly schedules, descopes, or escalates each item

Kratos then writes his interpretation (if there are any) to the interpretation layer and asks himself. This continues. The following are cases where Kratos does something different:

Do I need methodological confidence before deciding next step? → consult Delphi, cross-check scenarios against what has happened so far and what himself thinks should be the future, then decide
Have I reached a natural stage transition? → write stage summary, check in with Moira, define new stage, launch first Daemon of new stage
Neither? → decide next step directly or what it wants to know and launch the next Daemon

Completion: Kratos believes the analysis goal has been met. He writes a completion summary and checks in with Moira: “Is the goal satisfied?” Moira reads the goal against the summary and either confirms completion or identifies what remains unmet. If unmet Kratos re-enters the working loop targeting the gap. If confirmed the pipeline closes.

Visual Summary

                           MOIRA
                    (fate — goal + principles)
                              ↕
                [stage transitions / significant
                 findings / completion check]
                              ↕
DELPHI ←—————————————— KRATOS —————————————→ ANALYTICAL JOURNAL
(methodological          (rule —                Raw layer:
 wisdom, consulted        central brain)          Daemons
 proactively at                ↕                  Iris
 Kratos's discretion)   [3-part task]             (real-time,
                               ↕                   tagged)
                          DAEMONS              Decision layer:
                         ↕       ↕               Kratos
                       work   produces           (stage summaries
                               image              at transitions)
                                 ↕
                               IRIS
                            ↕       ↕
                       writes    returns
                       journal   to whoever
                       real-time  queried
                       (tagged)

Access Summary

Agent	Talks To	Journal Access
Kratos	Moira, Delphi, Daemons, Iris	Full read + write (journal layer + stage summaries)
Daemons	Kratos (primary) + Iris (mid-task, direct)	Write raw findings (claims) throughout work
Iris	Kratos + Daemons	Write raw findings (visual, real-time, tagged) only
Delphi	Kratos only	None
Moira	Kratos only	None

How Gaps are solved

Gap 2 ( zooming out): This is the entire idea of Moira and her existence.

Gap 3 (next best step): I try to extract the best next step decision from LLM in the many ways. First, Kratos’s context is full of only influential information so that it can make decisions as an analyst would do. Moira keeps things in check. Delphi, who has no context about the details of the data analysis so that they give only factual answers, are consulted by Kratos for decision.

Gap 4 (graphs): this is Iris’s purpose.

I try to utilize fresh context and LLM to LLM interaction (prompting) to reduce errors and amount of prompting done by human analysts. For the sake of generalizability, no direction towards a specific type of analysis is given to the LLMs. The only information that limits what the LLMs can do are the COMMANDMENTS.md and the task prompt given by the analyst to start the analysis, therefore we stay out of the LLMs’ way as much as we can.

References

[1]

F. J. Anscombe, “Graphs in statistical analysis,” The American Statistician, vol. 27, no. 1, pp. 17–21, 1973, doi: 10.1080/00031305.1973.10478966.

[2]

S. Guo, C. Deng, Y. Wen, H. Chen, Y. Chang, and J. Wang, “DS-Agent: Automated data science by empowering large language models with case-based reasoning,” arXiv preprint arXiv:2402.17453, 2024, Available: https://arxiv.org/abs/2402.17453

[3]

S. Hong et al., “Data interpreter: An LLM agent for data science,” arXiv preprint arXiv:2402.18679, 2024, Available: https://arxiv.org/abs/2402.18679