Technical & Regulatory FAQ
Built for the people accountable for the data.
Detailed answers for biostatisticians and clinical operations leaders — on 21 CFR Part 11 and ALCOA+, CDISC SDTM/ADaM and Define-XML, data provenance, and the human oversight behind every AI-generated table, figure, and listing.
Technical & Regulatory
Compliance, standards, and oversight — in depth.
The questions statistical programming, biometrics, and regulatory teams ask when they evaluate running submission-grade work on Astraea.
21 CFR Part 11 & Data Integrity
How does Astraea satisfy 21 CFR Part 11 for the records it generates?
Astraea treats Part 11 as a design constraint on every record and signature FDA relies on. For the closed system your team operates, that means computer-generated, time-stamped audit trails; role-based access and authority checks; validated system behavior scaled to risk; secure record retention and retrieval; and electronic signatures linked to their records. Because Astraea runs inside your environment, the predicate-rule records it produces (SDTM/ADaM, TFLs, Define-XML, submission documents) inherit those controls rather than requiring a separate compliance layer bolted on afterward.
Do Astraea's outputs meet ALCOA+ data integrity expectations?
Yes — ALCOA+ is the lens we design to. Every derived value is Attributable (tied to a user or agent action), Legible and permanent, Contemporaneous (logged at the moment of execution), Original (source lineage preserved), and Accurate (checked against edit checks and human review). The 'plus' attributes — Complete, Consistent, Enduring, and Available — follow from a versioned, retained audit trail that you can extract on demand during monitoring or inspection.
What exactly does the audit trail capture, and can it be altered?
The audit trail captures the who, what, when, and prior value for the creation, modification, and deletion of critical data, including AI-proposed actions and the human decision that accepted, corrected, or rejected them. Entries are time-stamped against a controlled system clock, attributable to an identified individual or agent, and write-protected so they cannot be edited or overwritten after the fact. The trail is designed to be reviewed and exported directly, not reconstructed manually.
How are electronic signatures implemented?
Signed electronic records carry the signer's printed name, the date and time of signing, and the meaning of the signature (authorship, review, or approval), per 21 CFR 11.50. Signatures are cryptographically bound to their records under 11.70 so they cannot be excised, copied, or transferred to falsify another record, and non-biometric signatures use at least two distinct identification components consistent with 11.100/11.200.
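To make the 11.50 signature manifestation and the 11.70 record linking concrete, here is a minimal sketch that signs a manifest of printed name, UTC signing time, meaning, and a SHA-256 digest of the record. It uses HMAC from the Python standard library only to stay self-contained. That is an assumption for illustration: a production system would more typically use asymmetric (public-key) signatures with managed keys. The function names and the demo key are hypothetical, and the two-component identification step is out of scope because it happens at authentication time.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass
from datetime import datetime, timezone

MEANINGS = {"authorship", "review", "approval"}  # signature meanings named above


@dataclass(frozen=True)
class Signature:
    printed_name: str
    signed_at: str       # UTC timestamp of signing
    meaning: str
    record_sha256: str   # binds the signature to one specific record
    mac: str


def _manifest(printed_name: str, signed_at: str, meaning: str, record_sha256: str) -> bytes:
    payload = {"name": printed_name, "at": signed_at, "meaning": meaning, "record": record_sha256}
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def sign(record: bytes, printed_name: str, meaning: str, key: bytes) -> Signature:
    if meaning not in MEANINGS:
        raise ValueError(f"unsupported signature meaning: {meaning}")
    record_sha256 = hashlib.sha256(record).hexdigest()
    signed_at = datetime.now(timezone.utc).isoformat()
    manifest = _manifest(printed_name, signed_at, meaning, record_sha256)
    mac = hmac.new(key, manifest, hashlib.sha256).hexdigest()
    return Signature(printed_name, signed_at, meaning, record_sha256, mac)


def verify(record: bytes, sig: Signature, key: bytes) -> bool:
    # Fails if the record changed, or if the signature is copied onto another record.
    if hashlib.sha256(record).hexdigest() != sig.record_sha256:
        return False
    manifest = _manifest(sig.printed_name, sig.signed_at, sig.meaning, sig.record_sha256)
    expected = hmac.new(key, manifest, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig.mac)  # also fails if any manifest field is altered
```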
Does FDA's enforcement-discretion and risk-based stance change your obligations?
It changes how controls are scaled, not whether the underlying requirements apply. FDA's 2003 Part 11 Scope and Application guidance interprets Part 11 narrowly, applying it to records required under a predicate rule, and encourages proportionate, risk-based controls. FDA's question-and-answer guidance on electronic systems, electronic records, and electronic signatures in clinical investigations (issued in draft in 2023 and finalized in 2024) reinforces a risk-based approach to validation and vendor oversight. Astraea is built to that interpretation: controls are strongest on the records and functions that most affect data integrity and patient safety, while the underlying predicate-rule requirements (GCP, submission integrity) are always met.
CDISC Standards — SDTM, ADaM & Define-XML
How does Astraea generate SDTM-conformant domains?
Astraea's standards-mapping agents map collected study data to SDTM domains using the appropriate SDTMIG version and CDISC Controlled Terminology, respecting domain structure, variable roles, and required/expected/permissible variables. Mappings are proposed with their rationale and confirmed by your programmers, and the platform is designed so datasets pass conformance checks (the same class of rules Pinnacle 21 / CDISC CORE apply) before they move downstream.
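To illustrate the kind of rule a conformance check applies, the sketch below tests DM rows for a handful of required variables and for SEX values outside a controlled codelist. The variable list and codelist here are deliberately partial, illustrative subsets, and `check_dm` is a hypothetical name. Real checks come from the SDTMIG and Controlled Terminology versions your study targets, and from validators such as Pinnacle 21 or CDISC CORE rather than hand-written code like this.

```python
# Partial, illustrative lists only: a subset of the DM variables SDTMIG
# designates as Required, and a subset of the SEX codelist.
DM_REQUIRED_SUBSET = ("STUDYID", "DOMAIN", "USUBJID", "SUBJID", "SITEID", "SEX", "COUNTRY")
SEX_TERMS_SUBSET = {"M", "F", "U"}


def check_dm(rows: list[dict]) -> list[str]:
    """Return human-readable findings; an empty list means no findings."""
    findings = []
    for i, row in enumerate(rows, start=1):
        # Required variables must be present and non-null.
        for var in DM_REQUIRED_SUBSET:
            if row.get(var) in (None, ""):
                findings.append(f"row {i}: required variable {var} is missing or null")
        # DOMAIN must carry the two-letter domain code.
        domain = row.get("DOMAIN")
        if domain not in (None, "", "DM"):
            findings.append(f"row {i}: DOMAIN is {domain!r}, expected 'DM'")
        # Populated SEX values must come from the controlled codelist.
        sex = row.get("SEX")
        if sex not in (None, "") and sex not in SEX_TERMS_SUBSET:
            findings.append(f"row {i}: SEX value {sex!r} is not in the codelist")
    return findings
```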
How are ADaM datasets derived, and is traceability preserved?
ADaM datasets are derived from SDTM inputs and the statistical analysis plan following ADaM principles — analysis-ready structure (ADSL, BDS, OCCDS), clearly documented derivations, and metadata-driven traceability back to SDTM. Astraea preserves the SDTM-to-ADaM lineage variable by variable so a reviewer (or a regulator) can trace any analysis value to its source, which is the core of ADaM's traceability requirement.
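Here is a minimal sketch of a BDS-style derivation that keeps variable-level traceability. Each analysis record carries SRCDOM, SRCVAR, and SRCSEQ, the ADaM traceability variables that point back to the SDTM record it came from. The baseline rule (the record whose VISIT equals "BASELINE"), the input layout, and the function name are assumptions for illustration. A real SAP usually defines baseline more carefully, for example as the last non-missing pre-treatment value.

```python
def derive_advs(vs_rows: list[dict], baseline_visit: str = "BASELINE") -> list[dict]:
    """Derive BDS-style rows (AVAL, BASE, CHG) from SDTM VS-like records."""
    # Pass 1: baseline value per subject and parameter (hypothetical rule).
    base = {}
    for r in vs_rows:
        if r["VISIT"] == baseline_visit:
            base[(r["USUBJID"], r["VSTESTCD"])] = r["VSSTRESN"]

    # Pass 2: one analysis record per source record, with traceability.
    advs = []
    for r in vs_rows:
        is_base = r["VISIT"] == baseline_visit
        b = base.get((r["USUBJID"], r["VSTESTCD"]))
        aval = r["VSSTRESN"]
        advs.append({
            "USUBJID": r["USUBJID"],
            "PARAMCD": r["VSTESTCD"],
            "AVISIT": r["VISIT"],
            "AVAL": aval,
            "ABLFL": "Y" if is_base else "",
            "BASE": b,
            # Change from baseline only for post-baseline records with both values.
            "CHG": None if is_base or b is None or aval is None else aval - b,
            # Traceability back to the exact SDTM record and variable.
            "SRCDOM": "VS",
            "SRCVAR": "VSSTRESN",
            "SRCSEQ": r["VSSEQ"],
        })
    return advs
```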
Does Astraea produce Define-XML and the reviewer's guides?
Yes. Astraea generates the submission metadata that travels with your datasets: Define-XML v2.x describing datasets, variables, controlled terms, value-level metadata, and derivations, alongside dataset-level documentation and content for the reviewer's guides (the Clinical Study Data Reviewer's Guide and the Analysis Data Reviewer's Guide). Define-XML is required by FDA and PMDA for every study in a submission, so it is produced as a first-class output rather than a manual afterthought.
Which CDISC and controlled-terminology versions do you support?
Astraea is version-aware: SDTMIG/ADaMIG versions, CDISC Controlled Terminology releases, and Define-XML versions are selected per study to match the standards your submission targets under FDA's Data Standards Catalog. Because Controlled Terminology is published quarterly and the implementation guides are revised periodically, the mapping and validation logic is configurable to the versions your regulatory strategy specifies.
Can Astraea reconcile legacy or non-standard source data?
Yes. Real-world source data is rarely pristine. Astraea's annotation and standards agents reconcile heterogeneous, legacy, and non-standard formats into CDISC-conformant structures, surfacing ambiguous mappings for human adjudication. Interpretation-heavy decisions are routed to your team rather than silently resolved, and every reconciliation is captured in the audit trail.
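The adjudication pattern can be sketched as a lookup that maps only the values it can resolve unambiguously and queues everything else for a person. The synonym table and the `reconcile_sex` function below are hypothetical. A legacy code such as "1" is deliberately left unmapped, because whether it means male or female depends on the legacy codebook, which is exactly the kind of interpretation-heavy decision described above.

```python
# Hypothetical synonym table for legacy SEX values; keys are normalized text.
SEX_SYNONYMS = {
    "m": "M", "male": "M",
    "f": "F", "female": "F",
    "u": "U", "unknown": "U",
}


def reconcile_sex(raw_values):
    """Split raw values into confident mappings and a human-review queue."""
    mapped, review_queue = {}, []
    for raw in dict.fromkeys(raw_values):  # unique values, original order
        key = str(raw).strip().lower()
        if key in SEX_SYNONYMS:
            mapped[raw] = SEX_SYNONYMS[key]
        else:
            review_queue.append(raw)  # ambiguous: route to a person, never guess
    return mapped, review_queue
```

In a full pipeline, each entry in `mapped` and each adjudicated queue item would also be written to the audit trail.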
Data Provenance & Traceability
How does Astraea maintain end-to-end data provenance?
Every transformation, from raw source through SDTM and ADaM into TFLs, is recorded as a linked, versioned lineage. Given any number in a table or any value in an analysis dataset, you can trace backward through the derivation chain to the SDTM record and the original source, and forward to every artifact that consumed it. Provenance is a structural property of the pipeline, not documentation assembled at the end.
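Linked lineage is naturally a directed graph from inputs to the artifacts derived from them, so tracing backward and forward are both graph traversals. The sketch below assumes that model. The `Lineage` class and the node labels are hypothetical and say nothing about how Astraea stores lineage internally.

```python
from collections import defaultdict, deque


class Lineage:
    """Directed lineage graph: edges point from a source to what derives from it."""

    def __init__(self) -> None:
        self._parents = defaultdict(set)   # node -> its direct inputs
        self._children = defaultdict(set)  # node -> artifacts that consume it

    def add_edge(self, source: str, derived: str) -> None:
        self._parents[derived].add(source)
        self._children[source].add(derived)

    @staticmethod
    def _walk(start: str, graph: dict) -> set:
        # Breadth-first traversal; .get() avoids creating entries in the defaultdict.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def upstream(self, node: str) -> set:
        return self._walk(node, self._parents)

    def downstream(self, node: str) -> set:
        return self._walk(node, self._children)
```

The same `downstream` traversal is one way to compute which datasets, tables, and documents to flag when an upstream value changes.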
If a source value changes, how do downstream artifacts stay consistent?
Because dependencies are tracked, a change to an upstream value flags every downstream dataset, table, and document that derives from it. Re-derivation is reproducible and the change — including who made it and why — is versioned in the audit trail, so you avoid the silent inconsistency that manual, spreadsheet-driven pipelines are prone to during database updates or late data cuts.
Is the pipeline reproducible for an inspector or an independent reviewer?
Yes. Derivations are deterministic and versioned against specific inputs, SAP logic, and standards versions, so a given dataset or output can be regenerated and independently reconciled. Reproducibility plus preserved lineage is what lets a reviewer confirm that what was submitted is exactly what the source data and the analysis plan support.
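Reproducibility claims become checkable when each run records a fingerprint of everything it depended on. The sketch below assumes a run is fully determined by its input files, a SAP version label, and the standards versions in use. It hashes those into one value so a regeneration can be matched to the original run and the output digests compared. The function names, record layout, and version strings are hypothetical.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def run_fingerprint(inputs: dict[str, bytes], sap_version: str, standards: dict[str, str]) -> str:
    """Hash everything a derivation depends on into one reproducibility key."""
    manifest = {
        "inputs": {name: sha256_hex(data) for name, data in inputs.items()},
        "sap_version": sap_version,
        "standards": standards,
    }
    # sort_keys makes the JSON, and therefore the hash, independent of dict order.
    return sha256_hex(json.dumps(manifest, sort_keys=True).encode("utf-8"))


def reproduced(original: dict, rerun: dict) -> bool:
    """True when a rerun used identical inputs and produced identical output bytes."""
    return (original["fingerprint"] == rerun["fingerprint"]
            and original["output_sha256"] == rerun["output_sha256"])
```

Byte-for-byte comparison assumes outputs contain no run-specific content such as generation timestamps. Such content would need to be excluded before hashing.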
Human Oversight of AI-Generated TFLs
Who is accountable for AI-generated tables, figures, and listings?
Your qualified team is. Astraea is a clinical co-pilot, not a replacement for regulated roles. The platform proposes shells, programs, and outputs and makes them fully auditable, but statistical sign-off and regulatory accountability remain with your biostatisticians and programmers — exactly where your SOPs and regulatory obligations require them.
What does human-in-the-loop mean specifically for TFL production?
AI executes the heavy, repetitive work — building TFL shells from the SAP, generating analysis programs, and producing draft outputs — while experts retain authority over every decision that affects a result. Reviewers validate shell-to-SAP alignment, statistical methods, and output accuracy; the platform records each acceptance, correction, or rejection so the human decision is part of the permanent record.
How do you prevent unreviewed AI output from reaching a submission?
Critical outputs cannot advance on machine confidence alone. Automated edit checks and conformance rules run first, then a required human validation gate must be cleared before an output is accepted. Uncertain or ambiguous cases are escalated for review rather than pushed through silently, and the review state is enforced by the workflow, not left to convention.
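Enforcing review in the workflow, rather than by convention, amounts to a state machine whose only path to acceptance passes through automated checks and then an identified human decision. The sketch below is a hypothetical model of that idea. The state names, the `TflOutput` class, and the `user:` prefix used to tell human reviewers from agents are all assumptions for illustration.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    NEEDS_REWORK = "needs_rework"    # automated checks failed
    ESCALATED = "escalated"          # checks passed, but the case is ambiguous
    CHECKS_PASSED = "checks_passed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


class GateError(Exception):
    """Raised when a transition would bypass the validation gate."""


class TflOutput:
    def __init__(self, output_id: str) -> None:
        self.output_id = output_id
        self.state = State.DRAFT
        self.history = []  # (from_state, to_state, actor, note) kept for the record

    def _move(self, new_state: State, actor: str, note: str = "") -> None:
        self.history.append((self.state, new_state, actor, note))
        self.state = new_state

    def run_checks(self, findings: list[str], ambiguous: bool = False) -> None:
        if self.state is not State.DRAFT:
            raise GateError("automated checks run on drafts only")
        if findings:
            self._move(State.NEEDS_REWORK, "system", "; ".join(findings))
        elif ambiguous:
            self._move(State.ESCALATED, "system", "flagged for human review")
        else:
            self._move(State.CHECKS_PASSED, "system")

    def human_decision(self, reviewer: str, accept: bool, note: str = "") -> None:
        if self.state not in (State.CHECKS_PASSED, State.ESCALATED):
            raise GateError(f"cannot decide from state {self.state.value}")
        if not reviewer.startswith("user:"):  # hypothetical human-ID convention
            raise GateError("only an identified human reviewer can clear the gate")
        self._move(State.ACCEPTED if accept else State.REJECTED, reviewer, note)
```

An output in `NEEDS_REWORK` has no path to acceptance in this model. The fix would produce a new draft that goes through the checks again.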
What is behind the 99%+ precision figure, and how should we read it?
It refers to validated outputs — results that have passed both automated checks and human quality control — not raw, unreviewed model output. The figure reflects a system designed with biostatisticians and clinical programmers, grounded in real biometrics work, where AI accelerates execution and experts remain the final quality gate. We are deliberate about not overstating autonomy: the number describes reviewed deliverables.
How does oversight get more reliable over time rather than more opaque?
Every action is logged and versioned, so reviewers can always see what the system did and why, correct it, and have that correction captured. Because those corrections are visible and auditable rather than hidden inside a black box, the record of human review grows with use, and the system becomes more transparent over time rather than more opaque.
Validation, Deployment & Security
How is Astraea validated as a computerized system?
Astraea follows a risk-based approach to computerized system validation aligned with GAMP 5, FDA's Part 11 Scope and Application guidance, and the direction of FDA's Computer Software Assurance (CSA) thinking. Validation effort concentrates on functions with the greatest impact on data integrity and patient safety, using critical thinking and appropriate scripted and unscripted testing. Intended use, controls, and testing evidence are documented so validation can be demonstrated during an inspection.
How is validation handled when Astraea is deployed inside our environment?
Astraea is installed and operated within your infrastructure by forward-deployed engineers working alongside your team, so validation is performed against your qualified environment and your intended use. That keeps the sponsor firmly in control of the validated state, change control, and the documentation your quality unit maintains — consistent with FDA's expectation that the regulated entity owns oversight of the systems holding trial records.
Does Astraea ever see or hold our patient data?
No. Astraea is software your team runs inside your own environment — it is not a web-hosted service that ingests your data, and it is not a CRO that runs the work for you. Your proprietary study data and PHI stay within your security boundary, under your access controls and data-residency requirements. This in-environment model is a core security differentiator versus cloud-hosted platforms.
Is our data ever used to train shared models?
No. Your study data is used to operate Astraea for your trials only — never to train shared or third-party models. It remains isolated to your environment and governed by your access controls and agreements.
How does Astraea align with HIPAA, GDPR, and vendor oversight?
Astraea is built around HIPAA and GDPR alongside FDA guidance: safeguards for PHI, encryption in transit and at rest, role-based access, and support for data-subject rights. Where infrastructure or IT service providers are involved, those relationships carry Part 11 expectations forward — accurate and complete records, access controls, audit trails, and confidentiality — because the sponsor's regulatory responsibility extends to the systems that hold trial records.
Want to go deeper with our team?
Talk to the biostatisticians and clinical programmers behind Astraea. We'll walk through validation, standards conformance, and audit-readiness against your specific SOPs and pipeline.