Frequently Asked Questions

An air-gapped system is physically and logically isolated from the public internet. In our deployments, the on-site high-performance compute nodes (DGX Spark, Mac Studio, or RTX arrays) process all data locally. The system has no Ethernet connection to external networks and no active Wi-Fi radios, so no data leaves the physical premises. Data is ingested through secure, audited internal local area networks (LANs) or designated transfer media.

We deploy models on either Apple Silicon Mac Studio clusters (M-Series Ultra) or dedicated local Nvidia DGX / RTX GPU servers. We recommend Mac Studios for mid-market clients because of their large unified memory capacity (up to 192GB per node, shared between CPU and GPU), which allows large 70B-parameter models to run with high power and cost efficiency. Nvidia nodes are used for larger multi-user enterprise workloads that require high FP16 throughput.
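To make the memory claim concrete, here is a rough back-of-the-envelope sketch of how much memory the weights of a 70B-parameter model occupy at different precisions. The 25% headroom reserved for the KV cache, activations, and the operating system is an illustrative assumption, not a measured figure, and the function names are hypothetical.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billion: float, bits_per_weight: float, memory_gb: float,
         headroom_fraction: float = 0.25) -> bool:
    """True if the weights fit while leaving headroom_fraction of memory free.

    The headroom (assumed 25% by default) stands in for the KV cache,
    activations, and the operating system; real requirements vary by workload.
    """
    usable_gb = memory_gb * (1 - headroom_fraction)
    return weight_footprint_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_footprint_gb(70, bits)
        print(f"70B at {bits}-bit: {size:.0f} GB of weights, "
              f"fits in 192 GB with headroom: {fits(70, bits, 192)}")
```

Under these assumptions, a 70B model at 16-bit precision needs about 140 GB for weights alone, which is why node memory capacity is the first sizing constraint.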

As a licensed Value-Added Reseller (VAR), SAS procures compute hardware directly from manufacturers and certified secure distribution channels to mitigate supply chain risk and minimize exposure to third-party intermediaries. We also custom-design, build, and harden local compute enclaves tailored to specific regulatory and threat-model requirements. Clients retain direct physical ownership of their silicon, while we manage secure acquisition, validation, assembly, and on-premises configuration during Phase 2.

As of 2026, state-of-the-art open-weights models (such as Llama-3 70B and Mistral Large) can match or exceed commercial closed-source APIs on many specialized business tasks. By combining custom local fine-tuning with optimized semantic retrieval (RAG) pipelines, we can tailor a local model to your organization’s documents, terminology, and workflows, so that it outperforms a generic public model on your tasks while all data stays on your premises.
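As a rough illustration of the retrieval step in a RAG pipeline, the sketch below ranks documents against a query and builds a prompt from the best matches. It uses simple word counts as a toy stand-in for a real embedding model, and every function name here is hypothetical; a production pipeline would use a locally hosted embedding model and a vector index instead.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the model in the retrieved passages."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")
```

The resulting prompt would then be passed to the locally hosted model, so both the documents and the query stay inside the facility.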

Cloud AI services can violate data residency mandates because prompts, files, and sensitive data (PII/CUI) are transmitted over the internet to third-party servers. By running inference directly on your own hardware inside your secure on-premises network boundary, no data ever leaves your control. This allows you to deploy advanced AI automation while staying compliant, because all AI operations fall within your existing, audited physical and cybersecurity controls.

Under our Phase 3: Secure Maintenance Retainer, SAS engineers perform scheduled update procedures. We download new open-weights model releases and software patches onto secure, encrypted transfer media, bring them on-site, scan them for malware in a sandbox, and then install them directly onto your local offline hardware. This keeps your systems running current, capable, and patched models.

For air-gapped installations, model updates are delivered physically. SAS security engineers transfer verified open-weights updates and software patches to encrypted physical media, transport them to your secure facility, run a malware scan in a sandboxed environment, and then mount the media directly on your local offline servers. This preserves the physical air gap while keeping models up to date.
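The text above does not spell out how updates are verified. One common complement to a malware scan is checking each file on the transfer media against a list of expected SHA-256 hashes recorded before transport. The sketch below assumes a hypothetical manifest that maps file names to hex digests; it illustrates the idea and does not describe the actual SAS procedure.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weight files need not fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(media_root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose hash does not match.

    `manifest` is a hypothetical mapping of relative file name -> expected
    SHA-256 hex digest, recorded before the media left the trusted source.
    """
    failures = []
    for name, expected in manifest.items():
        candidate = media_root / name
        if not candidate.is_file():
            failures.append(name)
            continue
        if not hmac.compare_digest(sha256_of(candidate), expected.lower()):
            failures.append(name)
    return failures
```

An empty list from `verify_manifest` means every listed file arrived intact. Any entry in the list is a reason to stop before installation.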

Apple Silicon (M-Series Ultra) offers high memory bandwidth (~800 GB/s) and can run a 70B-parameter model at 15–20 tokens/second, which suits small to mid-sized teams. Nvidia DGX nodes with H100 or RTX Ada GPUs use dedicated HBM or GDDR memory. HBM on H100-class GPUs delivers 2,000 GB/s or more of memory bandwidth, enabling 50–80 tokens/second per user and making these nodes suitable for large-scale, concurrent enterprise workloads.
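The link between memory bandwidth and token speed can be sketched with a simple estimate. During single-stream generation, each new token requires reading roughly all model weights from memory once, so bandwidth divided by weight size gives an approximate upper bound on tokens per second. The 4-bit quantization and the 0.75 efficiency factor below are illustrative assumptions, not figures stated above, and the function name is hypothetical.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, params_billion: float,
                             bits_per_weight: float,
                             efficiency: float = 1.0) -> float:
    """Estimate bandwidth-bound decode speed for a single request.

    Assumes every generated token reads all weights once. `efficiency`
    (assumed, 0..1) scales the ideal figure down to account for overheads.
    """
    weights_gb = params_billion * bits_per_weight / 8
    return efficiency * bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Assumed 4-bit quantized 70B model (about 35 GB of weights).
    for label, bandwidth in (("~800 GB/s", 800), ("2,000 GB/s", 2000)):
        ideal = decode_tokens_per_second(bandwidth, 70, 4)
        derated = decode_tokens_per_second(bandwidth, 70, 4, efficiency=0.75)
        print(f"{label}: ideal {ideal:.1f} tok/s, at 75% efficiency {derated:.1f} tok/s")
```

Under these assumptions the estimate for 800 GB/s lands in the same range as the 15–20 tokens/second quoted above. Batching, longer contexts, and different quantization levels all shift the real numbers.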

Yes. During Phase 2, we can configure local fine-tuning pipelines (LoRA/QLoRA) directly on your hardware cluster. This allows you to train open-weights models on your internal documents, schemas, and proprietary knowledge bases locally. All training computation, weight updates, and training data remain entirely within your secure facility boundary.
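To show why LoRA makes on-site fine-tuning practical, the sketch below counts the parameters that are actually trained. LoRA freezes each original weight matrix of shape d_out x d_in and trains two small matrices of rank r, adding r * (d_in + d_out) parameters per adapted matrix. The layer count, hidden size, rank, and number of adapted matrices are illustrative assumptions, not a confirmed configuration of any specific model or deployment.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank); the base matrix is frozen.
    """
    return rank * (d_in + d_out)


def total_lora_params(hidden: int, layers: int, rank: int,
                      matrices_per_layer: int) -> int:
    """Total trainable parameters when adapting square hidden x hidden matrices."""
    return layers * matrices_per_layer * lora_params(hidden, hidden, rank)


if __name__ == "__main__":
    # Illustrative values only: 80 layers, hidden size 8192, rank 16,
    # two adapted projection matrices per layer.
    trainable = total_lora_params(hidden=8192, layers=80, rank=16,
                                  matrices_per_layer=2)
    share = trainable / 70e9
    print(f"Trainable LoRA parameters: {trainable:,} ({share:.4%} of 70B)")
```

With these assumed values, only a small fraction of a percent of the model's parameters is updated, which keeps memory and compute needs within reach of a local cluster.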

Have Specific Architecture Questions?