AI Disclosure in Systematic Reviews and Meta-Analyses
A practical guide for researchers who use AI tools in systematic reviews and meta-analyses and need clear disclosure language for transparent reporting.
AI in a systematic review changes more than the wording
A systematic review lives or dies on its audit trail.
Readers need to know how you searched, screened, extracted, judged, and synthesized evidence. If AI touched any of those steps, you need to say so. Otherwise, nobody can tell where the tool shaped the review, where errors may have entered, or how your team caught them.
That does not mean you should avoid AI.
It means you should treat AI like any other method choice that can affect reproducibility. PRISMA 2020 already asks authors to report review methods with enough detail for readers to assess what happened. AI use belongs inside that same logic. (prisma-statement.org)
If you want one clean record of that use, generate an AI Usage Card for the review and adapt it for your manuscript, supplement, protocol, or repository.
Why evidence synthesis needs stricter disclosure
Systematic reviews already run on traceability.
You predefine the question. You specify databases. You set inclusion criteria. You document screening decisions. You explain extraction rules. You justify synthesis methods. That culture makes AI disclosure a natural extension, not an extra burden.
AI can influence a review long before the writing stage.
A tool might suggest PICO terms, expand search strings, rank abstracts, recommend exclusions, extract effect sizes, summarize outcomes, draft risk of bias notes, or write chunks of the manuscript. Each step can change the final review. A small change in screening or extraction can move the pooled estimate or alter the narrative around benefit and harm.
That is why a vague line like "we used AI for writing assistance" often fails in this context.
Readers need to know where AI entered the workflow, what tool or model you used, what kind of input you gave it, what output it produced, and how humans checked that output. Cochrane's current guidance takes the same line. It allows AI use, but it expects disclosure, human oversight, and author accountability for the final content. (cochrane.org)
If you want the broader case for this, read Why AI Transparency Matters in Research and AI Ethics and Documentation in Academic Research.
The real question is where AI entered the pipeline
Many researchers still think disclosure starts and ends with manuscript text.
That is too narrow for evidence synthesis.
In a systematic review or meta-analysis, AI can shape decisions at five points.
Planning the review
Some teams use chatbots to refine the review question, define PICO elements, suggest outcomes, or brainstorm search terms and databases.
That use deserves disclosure because it can shape scope before the first search ever runs.
Searching the literature
AI may expand keywords, translate terms, rewrite Boolean strings, or help identify related records.
Search transparency sits at the center of review quality. If AI changed the search strategy, say so.
Screening and eligibility
This is one of the most sensitive stages.
AI tools may prioritize records, label likely exclusions, or recommend inclusion decisions. If humans did not check every decision, readers need to know that. If humans did check every decision, say that too.
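If you want concrete wording for this stage, here is a minimal LaTeX methods sentence for AI-assisted screening. It is a sketch, not required language: the tool, reviewer setup, and checking procedure are placeholders to replace with your own details.
\paragraph{AI-assisted screening.}
% Illustrative placeholder wording; substitute your actual tool, version, and procedure.
We used ASReview to prioritize records for title and abstract screening.
Two reviewers independently assessed every record, including those the tool
ranked as unlikely to be relevant, and the tool made no exclusion decisions on its own.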
Extraction, appraisal, and synthesis
Tools can pull sample sizes, outcomes, intervention details, confidence intervals, and study characteristics from PDFs. They can also draft evidence tables or summary paragraphs.
This is where speed tempts people into trust. Don't do that. Extraction errors, hallucinated values, and lost nuance can slip in fast.
Writing and editing
AI may draft methods text, shorten abstracts, rewrite prose, or clean up language.
This still needs disclosure, but it should not overshadow earlier uses that carry more methodological weight.
For related field-specific advice, see AI Disclosure for Qualitative Research and AI Disclosure for Social Science Research.
What readers need you to disclose
Your disclosure should let another researcher understand the role of AI without guessing.
In practice, six facts do most of the work.
First, name the tool and version if known. "ChatGPT" is thin. "GPT-4.1 accessed through the ChatGPT web interface in January 2026" gives readers something they can actually interpret.
Second, state the task. Did the tool help with search term generation, title and abstract screening, data extraction, coding study characteristics, narrative synthesis, or writing support?
Third, describe the inputs in plain language. You do not need to dump every prompt into the main paper, but you should summarize what you gave the model.
Fourth, explain human oversight. Did two reviewers check all AI-assisted screening suggestions? Did one reviewer verify every extracted number against the source PDF? This point matters more than vendor claims.
Fifth, state the limits you imposed. For example, did you bar AI from final inclusion decisions or final effect estimates? Did you avoid uploading copyrighted or sensitive full texts to a public system?
Sixth, say where the full record lives. That may be the Methods section, a supplement, a repository, a protocol appendix, or an AI Usage Card.
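If you keep those six facts together in a supplement, a simple LaTeX list is enough. The sketch below reuses placeholder details from the examples in this guide; replace every value with your own record.
\begin{itemize}
  \item \textbf{Tool and version:} GPT-4.1 via the ChatGPT web interface, accessed January 2026.
  \item \textbf{Task:} search term generation and draft study summaries.
  \item \textbf{Inputs:} predefined PICO concepts; no full texts or sensitive material uploaded.
  \item \textbf{Human oversight:} two authors reviewed all suggested terms; one author verified each summary against the source article.
  \item \textbf{Limits:} no final inclusion, exclusion, risk-of-bias, or effect-estimate decisions.
  \item \textbf{Full record:} AI Usage Card and prompt log in the supplementary materials.
\end{itemize}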
If you are still unsure whether your use crossed the line into reportable assistance, read Do I Need to Disclose AI Usage in My Paper? and How to Disclose ChatGPT Usage in Academic Papers.
PRISMA does not replace AI disclosure, and AI disclosure does not replace PRISMA
PRISMA 2020 remains the baseline reporting framework for systematic reviews. It asks authors to report the search strategy, selection process, data collection process, and synthesis methods with enough detail for readers to assess the review. (prisma-statement.org)
AI disclosure fits inside those same items.
If AI helped build search strings, report it with the search methods.
If AI ranked records for screening, report it with the selection process.
If AI extracted data, report it with the data collection process.
If AI drafted text, report it in the methods, acknowledgments, or a dedicated disclosure note, depending on the journal's policy.
That placement makes the paper easier to audit because readers can see the tool beside the method it affected.
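As a sketch of that placement, the LaTeX fragment below embeds AI mentions in PRISMA-aligned methods subsections. The tools and procedures are illustrative, drawn from the examples in this guide, not prescribed wording.
\subsection*{Search strategy}
% Illustrative: report your actual tool and review step beside the method it affected.
Candidate search terms were expanded with GPT-4.1; two authors reviewed all
suggestions before the final Boolean strings were run in each database.

\subsection*{Selection process}
Records were prioritized with ASReview; two reviewers independently assessed
every record against the predefined eligibility criteria.

\subsection*{Data collection process}
AI-assisted extraction was used for study characteristics; one reviewer
verified every extracted value against the source article.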
New guidance now asks for more detail on prompts, versions, and validation
This area moved fast in 2025 and early 2026.
In December 2025, Research Synthesis Methods published editorial guidance for manuscripts that test generative AI in systematic review and meta-analysis workflows. The guidance asks authors to report research design, prompts, model behavior across conditions, validation methods, and task-specific performance metrics. It also notes that generative AI outputs can vary with prompt wording, model version, and random variation. (cambridge.org)
That editorial targets evaluation studies, not every ordinary review manuscript. Still, the logic carries over. If your review used AI for screening, extraction, or synthesis support, you should keep records that let others understand what happened.
Cochrane's current public guidance points in the same direction. Cochrane says authors may use AI if they disclose it, keep human oversight, justify the use, and retain responsibility for the final review. Cochrane also notes that there is still no settled consensus on what level of AI error is acceptable in evidence synthesis. (cochrane.org)
So the safe rule is simple: if AI touched judgment-heavy review steps, document more than you think you need.
A disclosure structure that works in real papers
You do not need a long statement.
You need a statement that answers five questions:
- What tool did you use?
- What task did it perform?
- What input did you provide?
- Who verified the output?
- What decisions did humans keep for themselves?
Here is a short example:
We used GPT-4.1 through the ChatGPT web interface to suggest search term variants for predefined PICO concepts and to draft initial summaries of included studies. Two authors reviewed all suggested search terms before database submission. One author checked all AI-generated study summaries against the source articles and corrected errors before synthesis. AI outputs did not determine inclusion, exclusion, risk-of-bias judgments, or final effect estimates.
That tells readers what they need to know.
For more wording patterns, see AI Usage Cards Examples and Templates and AI Transparency Requirements for Journal Submissions.
Where teams get this wrong
The first mistake is vague disclosure.
If you write "AI was used during manuscript preparation" but the tool also touched screening or extraction, you hid the part that matters most.
The second mistake is trusting suggestions because they look tidy.
AI tools can miss negation, flatten complex interventions, confuse outcome measures, and invent numbers that look plausible on first read. In a meta-analysis, one bad extraction can flow straight into the effect estimate.
The third mistake is keeping no record.
If AI helped with screening, extraction, or synthesis, save prompts, outputs, access dates, model names, and notes on reviewer corrections. That record helps during peer review, protocol updates, and team handoffs.
The fourth mistake is blurring the boundary between suggestion and decision.
Readers should know what the model proposed and what the review team decided.
The fifth mistake is ignoring confidentiality and licensing.
Many evidence synthesis teams work with subscription PDFs, unpublished data, or sensitive material. Before you paste text into a public AI system, check copyright terms, institutional rules, and any data protection limits.
A workflow you can actually follow
Start after you lock the protocol.
That gives you a reference point. You can compare AI-assisted actions against methods you planned in advance.
Then log each AI use as it happens.
A plain spreadsheet works. Record the date, tool, version, task, input type, output type, reviewer who checked it, and final action taken. If the tool changed across time, log that too.
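If you later publish the log as a supplement, a compact LaTeX table can carry it. This is a sketch with one illustrative row and condensed columns; the date, initials, and outcome are invented placeholders.
\begin{table}[ht]
\centering
\small
\begin{tabular}{p{1.6cm}p{2cm}p{2.2cm}p{2.8cm}p{1.6cm}p{2.6cm}}
\hline
Date & Tool (version) & Task & Input / output & Checked by & Final action \\
\hline
2026-01-12 & GPT-4.1 (web) & Search term variants & PICO concepts / candidate term list & AB, CD & Authors kept a reviewed subset of terms \\
\hline
\end{tabular}
\caption{Excerpt from the AI-use log (illustrative values only).}
\end{table}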
Next, verify every output against source material.
Do not let AI make final inclusion decisions on its own. Do not copy extracted values into analysis files without human checking. Do not treat polished prose as proof of accuracy.
Then convert your notes into a disclosure package.
That package may include a short methods paragraph, a supplement table, and an AI Usage Card. The card gives you a reusable summary that you can attach to a manuscript, thesis chapter, grant appendix, or repository readme.
If you work in a team, ask one person to own the log. Shared responsibility sounds good until nobody remembers who saved the prompt history.
Example wording for methods, supplement, and appendix
You can adapt the language below to your own review.
Methods paragraph in LaTeX
\paragraph{Use of AI tools.}
The review team used GPT-4.1 via the ChatGPT web interface for two limited tasks:
(1) generating candidate search term variants based on predefined PICO concepts and
(2) drafting preliminary narrative summaries of included studies.
All search strings were reviewed and finalized by the authors before submission to databases.
AI outputs were not used to make final screening, eligibility, risk-of-bias, or meta-analytic decisions.
One reviewer checked each AI-assisted summary against the original article, and a second reviewer spot-checked the final evidence table before synthesis.
Supplement table in LaTeX
\begin{table}[ht]
\centering
\begin{tabular}{p{2.5cm}p{3cm}p{3cm}p{4cm}}
\hline
Stage & Tool & Human check & Final use \\
\hline
Search planning & GPT-4.1 & Two authors reviewed all term suggestions & Selected terms entered databases manually \\
Screening triage & ASReview & Reviewers checked all records marked for exclusion or inclusion & AI used for prioritization only \\
Study summaries & GPT-4.1 & One author verified each summary against source PDFs & Corrected summaries used in draft narrative synthesis \\
\hline
\end{tabular}
\caption{Summary of AI-assisted steps in the review workflow.}
\end{table}
Short appendix note in LaTeX
\begin{quote}
AI assistance disclosure: The team used GPT-4.1 for search term brainstorming and draft study summaries, and ASReview for screening prioritization.
Human reviewers verified all outputs against the protocol and source articles.
The tools did not make final inclusion, exclusion, risk-of-bias, or effect-size decisions.
\end{quote}
If you write in LaTeX, our LaTeX tutorial for AI Usage Cards and How to Use AI Usage Cards in Overleaf show how to format this cleanly.
What editors and reviewers will look for
Editors want specificity.
Reviewers want to know whether AI touched judgment-heavy steps and whether humans checked every output that entered the review. That scrutiny gets sharper in health, education, policy, and law, where downstream decisions can affect real people.
Some journal groups already require disclosure of AI use in manuscript preparation. JAMA Network guidance, for example, requires authors to report AI use in manuscript content creation, review, revision, or editing in the Acknowledgment section, with details on the tools used. (jamanetwork.com)
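If your target journal follows that pattern, the acknowledgment entry can stay short. A hedged LaTeX sketch, with the tool details as placeholders and the exact requirements left to the journal's instructions:
\section*{Acknowledgment}
% Illustrative wording; check the journal's current policy before submission.
During preparation of this manuscript, the authors used GPT-4.1 (ChatGPT web
interface, January 2026) to revise the abstract and discussion text for clarity.
The authors reviewed and edited all AI-assisted text and take full
responsibility for the content of the published article.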
Systematic review journals may ask even sharper questions than general journals because the whole genre depends on method transparency.
For journal-specific context, see AI Disclosure Policies by Major Journals and How to disclose AI use for NeurIPS, ICML, and ACL submissions.
Document it while you still remember it
Do not wait until submission week.
By then, you will have forgotten where AI entered the workflow, which model version you used, and how much human correction the output needed. Good disclosure starts during the review, not after it.
Systematic reviews already demand careful records. AI use belongs in that record.
If you used AI to brainstorm search terms, prioritize records, extract study details, draft evidence tables, summarize findings, or polish prose, write it down now. Then turn that record into a short methods statement and an AI Usage Card that readers, editors, and your future self can trust.
Generate your AI Usage Card now. Use it as a supplement, paste parts into your methods section, or adapt it for acknowledgments. A review without a paper trail is hard to trust. An AI-assisted review without one is harder still.