There's a particular shape of bug in modern AI products that's tolerable in some domains and disqualifying in others.

In marketing copy: the AI confidently invents a customer testimonial. Fine — you edit it out, you ship the page, no real harm done.

In finance: the AI confidently invents an outstanding amount, or a GST figure, or a P&L line. Now you've made a decision — paid a vendor, raised an invoice, signed a return — on a number that doesn't exist in your books. That's not a UX problem. That's a liability.

The polite word for this failure mode is "hallucination." The honest one is that the model is making things up.

Why generic AI tools hallucinate over Tally data

Generic AI tools — the kind you might paste a screenshot of your books into — hallucinate in two predictable ways when applied to finance:

They invent precision. Asked "what was our net profit last quarter?", they'll cheerfully return a number to two decimals, even if the data you gave them doesn't support it.
They smooth over missing context. Asked about "all our suppliers in Tamil Nadu," they'll return an answer without telling you which suppliers were excluded because the ledger metadata didn't make state membership unambiguous.

The first failure makes you wrong with confidence. The second makes you wrong without realizing it.
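One way to avoid the second failure is to partition the data before answering, so the rows that could not be classified become part of the answer. What follows is a minimal sketch in Python, not a description of how any particular product works. The `Supplier` record, its `state` field, and the sample names are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Supplier:
    """Hypothetical ledger master record; field names are illustrative."""
    name: str
    state: Optional[str]  # None when the master data does not record a state


def suppliers_in_state(suppliers, state):
    """Split suppliers into confirmed matches and ones that cannot be classified.

    Returns (matched, unknown). Suppliers with a different, known state are
    dropped. Suppliers with no recorded state are returned separately
    instead of being silently excluded.
    """
    wanted = state.strip().lower()
    matched, unknown = [], []
    for s in suppliers:
        if s.state is None or not s.state.strip():
            unknown.append(s.name)
        elif s.state.strip().lower() == wanted:
            matched.append(s.name)
    return matched, unknown


# Made-up sample data, for illustration only.
ledgers = [
    Supplier("Sri Lakshmi Traders", "Tamil Nadu"),
    Supplier("Apex Metals", "Gujarat"),
    Supplier("K. R. Enterprises", None),
]
matched, unknown = suppliers_in_state(ledgers, "Tamil Nadu")
print("In Tamil Nadu:", matched)
print("State not recorded, excluded from the answer:", unknown)
```

The point of returning two lists is that the caller cannot present the matches without also holding the names it left out.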

What "grounded" actually means

A grounded AI answer in a finance context has three properties:

Provenance. The system shows you which ledgers, voucher numbers, and posting dates contributed to the answer. You can spot-check any number by drilling into its source.
Honest scope. When the system can't answer cleanly — because of an ambiguous ledger, a missing GSTIN, a stale voucher — it tells you the scope it actually answered, not the scope you asked about.
Calibrated confidence. Some answers are pure arithmetic: summing a column is deterministic once the right rows are selected, so confidence is effectively 100%. Others involve inference: grouping vendors by industry is only as reliable as the master data behind it. A good system distinguishes between the two.

Build those three properties in, and you have a tool a finance team can actually use. Skip them, and you have a tool that's useful only for tasks where being wrong is cheap.
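The three properties can be made concrete as fields on the answer itself. The sketch below is an assumption about one possible shape, written in Python. `GroundedAnswer`, `Voucher`, `Basis`, and `ledger_total` are hypothetical names, the vouchers are made-up sample data, and the function computes a simple total of posted amounts, not a full outstanding balance.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum


class Basis(Enum):
    ARITHMETIC = "arithmetic"  # deterministic sum over the cited vouchers
    INFERENCE = "inference"    # depends on master data quality


@dataclass(frozen=True)
class Voucher:
    number: str
    ledger: str
    posted_on: date
    amount: Decimal


@dataclass(frozen=True)
class GroundedAnswer:
    value: Decimal
    sources: tuple   # provenance: the vouchers behind the value
    scope_note: str  # honest scope: what was actually answered
    basis: Basis     # calibrated confidence: how the value was produced


def ledger_total(vouchers, ledger, as_of):
    """Sum amounts for one ledger up to a date and keep the evidence."""
    used = tuple(
        v for v in vouchers if v.ledger == ledger and v.posted_on <= as_of
    )
    total = sum((v.amount for v in used), Decimal("0"))
    note = (
        f"{len(used)} voucher(s) on ledger '{ledger}' "
        f"posted on or before {as_of.isoformat()}"
    )
    return GroundedAnswer(total, used, note, Basis.ARITHMETIC)


# Made-up sample data, for illustration only.
book = [
    Voucher("PV-101", "Apex Metals", date(2024, 3, 1), Decimal("12500.00")),
    Voucher("PV-117", "Apex Metals", date(2024, 3, 20), Decimal("4000.00")),
    Voucher("PV-118", "Orbit Logistics", date(2024, 3, 10), Decimal("900.00")),
]
answer = ledger_total(book, "Apex Metals", date(2024, 3, 15))
print(answer.value, [v.number for v in answer.sources])
print(answer.scope_note)
```

Because the value and its sources travel together, a reader can spot-check the number by opening the cited vouchers. `Decimal` is used instead of floats so that currency amounts are not subject to binary rounding.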

A short test you can run on any AI tool you're evaluating

We use this as a quick filter:

Ask the tool a specific factual question about your books that you can verify in 30 seconds (e.g. "what's the outstanding for ledger X as of March 15?"). Then ask it the same question with a typo in the ledger name.

A grounded system will:

Answer the first question with the correct number and show you which vouchers it came from.
Either resolve the typo confidently (and tell you it did) or ask which of the matching ledgers you meant.

A non-grounded system will:

Answer the first question with a plausible-looking number, no source attribution.
Confidently answer the typo'd version too, often with a different number.

If your tool behaves like the non-grounded system, its answers can be wrong with high confidence, and you have no way to know. That's worse than no tool at all.
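The typo behavior described for a grounded system can be sketched with Python's standard `difflib` module. This is an illustration under stated assumptions, not the matching logic of any specific tool. `resolve_ledger`, the status strings, the 0.8 cutoff, and the ledger names are all hypothetical choices.

```python
import difflib


def resolve_ledger(query, ledger_names, cutoff=0.8):
    """Return (status, candidates) for a user-typed ledger name.

    status is one of:
      "exact"     - the name matched as typed (ignoring case)
      "resolved"  - one close match; the caller should disclose the correction
      "ambiguous" - several close matches; the caller should ask the user
      "not_found" - nothing close enough; the caller should say so
    """
    by_key = {name.lower(): name for name in ledger_names}
    key = query.strip().lower()
    if key in by_key:
        return "exact", [by_key[key]]
    close = difflib.get_close_matches(key, list(by_key), n=5, cutoff=cutoff)
    names = [by_key[k] for k in close]
    if len(names) == 1:
        return "resolved", names
    if names:
        return "ambiguous", names
    return "not_found", []


# Made-up ledger names, for illustration only.
ledgers = ["Apex Metals", "Apex Metal Co", "Orbit Logistics"]

for typed in ["Orbit Logistics", "Orbit Logistcs", "Apex Metal", "Zenith Paper"]:
    status, candidates = resolve_ledger(typed, ledgers)
    print(f"{typed!r}: {status} {candidates}")
```

The design choice that matters is that every path returns a status. A single close match is allowed to proceed, but the caller has what it needs to tell the user which ledger was substituted, and more than one close match stops and asks.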

Where we draw the line at Koshio

A short version of the rules we hold ourselves to:

No invented data. Every answer cites its source vouchers, or it's not shown.
No smoothed-over gaps. If we can only answer part of a question, we say so and answer that part precisely.
Confidence is visible. Each answer carries a confidence marker, and we err toward "uncertain" when the master data isn't clean.

These aren't research goals. They're the minimum bar for a tool that's allowed to touch finance data.
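As an illustration only, the three rules can be read as a gate that sits between a drafted answer and the screen. The Python sketch below is a hypothetical reading of the rules, not the product's actual code. `DraftAnswer`, `render`, and every field name are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DraftAnswer:
    """Hypothetical draft produced before anything is shown to the user."""
    text: str
    cited_vouchers: list = field(default_factory=list)
    unanswered_parts: list = field(default_factory=list)
    master_data_clean: bool = True


def render(draft):
    """Apply the three rules; return display text, or None to show nothing."""
    # Rule 1: no citations means the answer is not shown at all.
    if not draft.cited_vouchers:
        return None
    # Rule 3: err toward "uncertain" when the master data is not clean.
    confidence = "confident" if draft.master_data_clean else "uncertain"
    lines = [
        draft.text,
        "Sources: " + ", ".join(draft.cited_vouchers),
        "Confidence: " + confidence,
    ]
    # Rule 2: state which part of the question was not answered.
    if draft.unanswered_parts:
        lines.append("Not answered: " + "; ".join(draft.unanswered_parts))
    return "\n".join(lines)


# Made-up drafts, for illustration only.
uncited = DraftAnswer("Net profit last quarter: 1,00,000.00")
partial = DraftAnswer(
    "Total posted to Apex Metals: 12,500.00",
    cited_vouchers=["PV-101"],
    unanswered_parts=["ledger 'Apex' matched more than one name"],
    master_data_clean=False,
)
print(render(uncited))
print(render(partial))
```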

The honest summary

You can build AI products that look impressive on a demo and fail on a Tuesday morning when someone actually relies on them. Or you can build slower-feeling products that hold their shape under audit. In finance, only the second category is worth shipping.

If you want a longer read on why the bar is higher in finance, the RBI's guidance on AI in financial services is the right starting point. The operational version of that guidance, for the person who actually closes the books, is the set of rules above.

Grounded AI for finance — why hallucinations are unacceptable in MSME books
