Curated datasets and prompts for
real businesses — built with **guardrails**
We collect, refine, and package high-quality datasets, generate prompt libraries, and train models with strict privacy and safety controls. Designed for SMEs and startup teams who need pragmatic AI—not research projects.
What’s included
- **Curated datasets** with provenance notes and versioning
- **Prompt packs** by vertical + evaluation suites
- **PII scrubbing** & safety guardrails in every pipeline
- **LLM training** options (LoRA / instruction-tune) + RAG
Currency: All prices listed in USD.
High-resolution service blocks
Data Collection
Ethical web collection, partner feeds, and customer uploads. Scheduler + retrievers with adaptive rate limits and failover. All sources tracked for provenance and license.
Refining & Labeling
Deduplication, language detection, PII scrubbing, toxicity filters, topic/intent tagging, and human-in-the-loop QA for “gold” samples.
Prompt Generation
Bilingual prompt packs per vertical (marketing, support, real estate, retail). Includes style, tone, brand-safety and response-length controls.
Dataset Sales
Single-purchase industry packs with clear licensing and update cadence. Optional RAG-ready chunking + embeddings export.
LLM Training
Instruction-tuning & LoRA on curated sets. Metric tracking (helpfulness, safety, grounding), eval suites, and rollback to prior versions.
Deployment Support
RAG APIs, retrieval schemas, and guardrail middleware. We help you wire it into apps or agents, then monitor drift and quality.
Monthly plans (USD)
- Up to **25 GB** curated data processing
- Prompt pack: **1** vertical
- 1 eval suite & basic guardrails
- Email support
- Up to **100 GB** curated data processing
- Prompt packs: **3** verticals
- 2 eval suites, advanced guardrails
- Slack support + quarterly review
- Up to **500 GB** curated data processing
- All prompt packs (library access)
- Full eval + red-team runs
- Training advisory (LoRA planning)
- 1+ TB / special licensing
- Private datasets & SLAs
- Dedicated infra / VPC
- On-prem or sovereign options
Fair-use: GB refers to pre-compression input volume processed through our refinement pipelines within the month. Overages billed at plan rate.
Pipeline overview
Privacy & safety by design
PII Handling
Automatic detection/removal of emails, phones, addresses, IDs, faces (if media present), plus optional hashing/tokenization. PII never included in deliverables.
Consent & Licensing
Only public/partner data with documented terms. We record provenance, timestamps, license notes, and block disallowed reuse.
Safety & Brand
Toxicity, bias & jailbreak screens; category allow/deny; Spanish/MX tone controls. Red-team prompts baked into QA runs.
Storage & Access
Encrypted at rest (cloud buckets) and in transit (TLS). Signed links or VPC peering. Access is least-privilege.
Compliance
We align with common principles (GDPR/CCPA-style data minimization & purpose limits). Not legal advice—DPA available on request.
Auditability
Dataset cards list sources, filters, and known caveats. Versioned releases with changelogs for reproducibility.
Integrations for builders
APIs
REST endpoints for dataset manifests, signed downloads, and RAG retrieval. Optional embeddings export (FAISS/PGV/Cloud). Webhooks for update events.
Models
We support instruction-tuning & LoRA on top models; RAG stacks with bilingual retrievers; and guardrail middleware for prompts & responses.
Transparent USD pricing
Dataset Packs (one-time)
Industry micro-pack **$49 USD**, standard pack **$199 USD**, pro pack **$499 USD**. Includes license & provenance notes; optional embeddings +$49.
Training (add-on)
LoRA planning + supervised runs from **$1,500 USD** per job (excl. cloud compute). Metrics, evals, rollback, and model card included.
Implementation
RAG wiring & guardrail setup from **$120 USD/hr**. Fixed-bid available after a short scoping call.
All currency is USD. Taxes, cloud costs, and marketplace fees billed at actuals where applicable.