Trust and security

How Data Shepherd handles your data

The AI builds your transformation once. After that, a sealed script does the work, with nothing sent to an AI model on a normal run. Here is exactly what that means, what we send to a model and when, and who we rely on behind the scenes.

The guarantee

The AI builds it. A sealed script runs it.

Most AI-for-data tools push your records through a language model on every run. We keep the model in the build phase, so a successful production run never sends your data to a model.

Build, one time

When you create a transformation, the model sees your instructions and a sample you provide, then writes a transparent Python script you can review line by line. You control what goes in that sample.

Run, every time after

A saved transformation runs as a sealed script in an isolated sandbox with no network and no AI. Your full dataset is processed in our infrastructure and never reaches a model.

The zero-data option

Cannot share even a sample? Build from a data dictionary or layout spec instead. The model sees field names, types, and formats, never a record. At run time the result is the same sealed script.

What we send to AI

What reaches the model, and when

A successful run sends nothing to any AI model. The model is involved while you build and, if a run fails, in the optional recovery features listed below.

When	Model involved?	What reaches the AI model
Building or editing a transformation	Sends to model	Your instructions, a sample you provide, and any spec or schema you choose to upload.
Running a saved transformation	No model involved	Nothing. The sealed script runs in an isolated sandbox with no AI and no network. Your full dataset is processed in our infrastructure and never sent to a model.
A run fails, plain-English diagnosis	Sends to model	The error and the transformation context (not your full dataset), so the model can explain what went wrong.
A run fails, auto-heal (Pro and above)	Sends to model	The failing script and a small sample of the failing rows, so the model can propose a fix. Off by default: you enable it per transformation, and it is rate-limited and capped per job. Leave it off if you never want failing data seen by a model.

Sub-processors

Who we rely on

The third parties that help run Data Shepherd, and what each one handles.

Provider	What they handle
Anthropic (Claude)	Writes the transformation scripts during the build phase, and, only on a failed run, powers optional diagnosis and auto-heal. API inputs and outputs are not used for model training.
Microsoft Azure	Hosting, database, file storage, and secret management. Production runs in a United States region.
Stripe	Subscription billing. Card details are handled by Stripe; we do not store payment card numbers.
Resend	Transactional email, such as sign-in links and job notifications.

Safeguards

How we protect your data

Scripts run server-side

You can review your transformation script read-only, but your browser can never hand a script back to run. Execution always uses the server-stored, integrity-checked, security-screened code.

Sandboxed execution

Every script is screened by a security validator that blocks network, filesystem, and process access, then runs in an isolated executor with no outbound network.
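For readers who want a feel for what static screening looks like, here is a minimal sketch using Python's standard `ast` module. It is an illustration, not Data Shepherd's actual validator: the function name `screen_script` and both deny-lists are assumptions made up for the example, and a real screen has to cover many more cases, which is why screening is paired with a sandbox that has no outbound network.

```python
import ast

# Hypothetical deny-list: modules that give network, filesystem, or process access.
BLOCKED_MODULES = {"socket", "urllib", "http", "requests", "os", "shutil", "pathlib", "subprocess"}
# Hypothetical deny-list of built-in calls.
BLOCKED_CALLS = {"open", "exec", "eval", "__import__"}


def screen_script(source: str) -> list[str]:
    """Return the violations found in the script source (an empty list means it passed)."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Collect the module names referenced by import statements.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            names = []
        for name in names:
            root = name.split(".")[0]
            if root in BLOCKED_MODULES:
                violations.append(f"line {node.lineno}: import of '{root}' is blocked")
        # Flag direct calls to blocked built-ins such as open() or eval().
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                violations.append(f"line {node.lineno}: call to '{node.func.id}' is blocked")
    return violations
```

In this sketch a script that only imports `csv` passes with no violations, while one that imports `socket` or calls `open()` is rejected before it ever reaches the executor.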

Encryption and secrets

Connector credentials are encrypted before they are stored, production secrets live in Azure Key Vault, and all traffic runs over HTTPS.

Least-privilege access

Sign-in uses short-lived tokens in HttpOnly cookies. API keys can only run transformations and read results, never create, delete, or touch billing.
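The API key restriction is an allow-list: anything not explicitly permitted is denied. A minimal sketch of that idea follows. The action names and the function `api_key_may` are hypothetical and chosen only for illustration; the only facts taken from this page are what an API key may and may not do.

```python
# Hypothetical action names. Per the text, API keys can run transformations
# and read results, and nothing else.
API_KEY_ALLOWED = frozenset({"run_transformation", "read_results"})


def api_key_may(action: str) -> bool:
    """Allow-list check: any action not explicitly permitted is denied."""
    return action in API_KEY_ALLOWED
```

With an allow-list, a newly added action such as a billing change stays denied for API keys until someone deliberately adds it.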

Soft deletes

Records are deactivated rather than hard-deleted, which protects against accidental loss while keeping deleted items out of normal use.

Audit trail

Sensitive actions are written to an audit log, retained for one year, so account activity can be reviewed.

Hard limits

What we cannot do

These hold because of how the system is built, not because a policy says so.

We cannot run code your browser sends

Execution always uses the server-stored script, checked against its SHA-256 hash and security-screened before it runs. There is no path that executes client-supplied code.
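To make the hash check concrete, here is a minimal sketch using Python's standard `hashlib` and `hmac` modules. The function names and the idea of recording the digest at save time are assumptions for the example, not a description of Data Shepherd's internal code.

```python
import hashlib
import hmac


def script_digest(script_text: str) -> str:
    """Hex SHA-256 digest of the script's UTF-8 bytes."""
    return hashlib.sha256(script_text.encode("utf-8")).hexdigest()


def verify_before_run(stored_script: str, recorded_digest: str) -> bool:
    """True only if the stored script still matches the digest recorded when it was saved."""
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(script_digest(stored_script), recorded_digest)
```

Changing even one character of the script produces a different digest, so in this sketch a tampered script fails `verify_before_run` and is never executed.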

A running script cannot reach the internet

Scripts are screened for network, filesystem, and process access, then run in a sandbox with no outbound network. Nothing inside a run can call out, to an AI model or anywhere else.

A job cannot read files outside its own organization

Each run receives access scoped to its own input and output only, and every query is scoped to your organization at the database layer.

We cannot silently change your scripts

Every change creates a new version with full history, and each run records which version executed. What you reviewed is what runs.
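One way to picture this is an append-only version history with a run log. The sketch below is illustrative only: the class and field names are hypothetical, and actual script execution is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScriptVersion:
    number: int
    source: str


@dataclass
class Transformation:
    versions: list[ScriptVersion] = field(default_factory=list)
    run_log: list[int] = field(default_factory=list)  # version number used by each run

    def save(self, source: str) -> ScriptVersion:
        """Every edit appends a new version; earlier versions are never overwritten."""
        version = ScriptVersion(number=len(self.versions) + 1, source=source)
        self.versions.append(version)
        return version

    def run(self) -> int:
        """Record which version executed (execution itself is omitted in this sketch)."""
        current = self.versions[-1]
        self.run_log.append(current.number)
        return current.number
```

Because versions are immutable and only ever appended, the history always shows what changed, and the run log shows which version each run used.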

Data retention

How long we keep things

Cleanup runs automatically every day. Exact windows can change as the product evolves.

Output files

Auto-deleted on a plan-based schedule: 30 days on Free, 90 on Pro, 180 on Business, and 365 on Scale.
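As a worked example of the schedule, the sketch below computes the date an output file becomes eligible for cleanup. The day counts come from the windows listed above and can change as the product evolves; the lowercase plan keys and the function name `delete_on` are assumptions for the example.

```python
from datetime import date, timedelta

# Output-file retention windows in days, as listed above (subject to change).
RETENTION_DAYS = {"free": 30, "pro": 90, "business": 180, "scale": 365}


def delete_on(created: date, plan: str) -> date:
    """Date on which an output file becomes eligible for the daily cleanup."""
    return created + timedelta(days=RETENTION_DAYS[plan])
```

For instance, under these assumptions a file created on 1 January 2025 on the Free plan becomes eligible for cleanup on 31 January 2025.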

Generated script drafts

Expire within 24 hours; expired drafts are removed within 30 days.

Sign-in links

Valid for 24 hours, then deleted.

Audit logs

Retained for one year.

Compliance

Where we stand

We are a young company, and we would rather show you exactly how the system works than wave a logo. The architecture above is live today; formal attestations are in progress.

Audit logging

Live today

HIPAA BAA

Available now, self-serve, Pro plan and up

SOC 2 Type II

In progress

Questions from your security team?

We are happy to walk through the data flow, share a sub-processor list, or answer a security questionnaire. Email support@datashepherd.ai.