CASE STUDY
A Large Global Engineering and Technology Company Saves Over Half a Million Dollars and Optimizes GPU Usage with Stacklet
$500K
In Cloud Savings in the First Few Months
GPU Visibility
Beyond VMs – Deeper Insights into GPU Optimization
Client Name
Anonymous
Industry
Engineering & Technology
Headquarters
N/A
The Challenge
The autonomous driving division of a large global engineering and technology organization rapidly expanded its adoption of public cloud infrastructure, particularly Microsoft Azure, to leverage cloud-based GPUs, storage, and other services for its AI-powered applications. The division had initially developed an in-house tool to manage and optimize cloud resources, but the increasing complexity and scale of its operations prompted it to seek an industry-standard solution that could provide better visibility, automation, and governance over its cloud usage. Its key requirements included:
- Deeper Usage Insights with Advanced Metrics: Managing AI-driven infrastructure comes with high costs, particularly for GPU and storage resources. They needed granular visibility into resource utilization to identify underutilization, over-provisioning, and inefficiencies. The goal was to educate internal teams and use data-driven insights to optimize spending and plan more effectively in collaboration with their cloud provider.
- Automated Cost and Usage Optimization: They required an automated process to visualize cloud resource usage, detect inefficiencies, and integrate remediation workflows. This included automated notifications (via Jira and email) to application owners and engineering teams, so that they could proactively address optimization opportunities without manual intervention.
- Cloud Compliance and Governance: The company wanted to align its cloud operations with industry best practices and regulatory benchmarks, including NIST and cloud provider-recommended compliance frameworks. Ensuring adherence to security and operational standards was a key priority.
- Preventative Guardrails for Cost Control: Beyond visibility and reactive measures, they sought proactive guardrails that could enforce best practices across build and runtime environments. Their goal was to eliminate recurring inefficiencies, improve operational resilience, and prevent unnecessary cloud costs before they occurred.
“We realized that our existing homegrown tools, while providing some optimization recommendations, would always lag in terms of the functionality we needed to build. We wanted a comprehensive solution that could offer deeper insights into actual GPU utilization and cover broader cloud optimization needs,” said a Cloud Architect at the organization.
Solution
After a rigorous Proof of Value (POV) process, the organization selected Stacklet as the optimal solution. During the POV, Stacklet conducted a tagging analysis, uncovered thousands of dollars in optimization opportunities, and mapped the organization’s security posture against compliance benchmarks such as CIS and NIST. Stacklet met all of the key requirements, delivering deep visibility, automation, and governance to streamline and optimize the organization’s cloud operations.
“Stacklet stood out as the most flexible solution on the market, backed by a strong open source community, and met all our requirements. It provides out-of-the-box policies and insights while allowing us to customize as needed. On top of that, Stacklet’s resource-centric pricing model was also appealing to us.”
Outcome
- More Than Half a Million Dollars in Cloud Savings Within the First Few Months: With Stacklet, the organization quickly identified and acted on thousands of dollars in cloud optimization opportunities, leading to over half a million dollars in savings within the first few months. Stacklet AssetDB provided real-time visibility into cloud resources, while policy-based recommendations enabled targeted cost optimizations.
- Deeper Visibility into GPUs and Comprehensive Cloud Coverage: With Stacklet, the organization now has a single tool that provides visibility into GPU metrics, virtual machines, and storage, enabling a holistic view of their cloud infrastructure.
“Previously, we had visibility into virtual machines but not GPUs, which comprise a significant portion of our monthly cloud spend. This meant we weren’t optimizing a large part of our spending. Stacklet now gives us this visibility and will allow us to optimize GPU usage by taking targeted action.”
This new level of insight lays the foundation for future optimizations, ensuring the organization can maximize efficiency and cost savings as it fully implements these capabilities.
- Improved Tagging Compliance and Visibility: With Stacklet, the organization gained immediate visibility into its cloud resource tagging compliance, allowing it to assess tagging quality, coverage, and inconsistencies across its environment. Stacklet’s tagging insights surfaced issues such as inconsistent tag variations and incomplete metadata, providing a clear picture of tag values and overall compliance. This newfound visibility has helped the organization identify gaps and standardize tagging practices, setting the stage for stronger governance and cost tracking.
- Enhanced Cloud Compliance and Policy Enforcement: With Stacklet, the organization now has real-time visibility into cloud compliance gaps, enabling it to assess misconfigurations, policy violations, and security risks across its environment. Stacklet’s compliance insights help it align with industry standards such as CIS, NIST, and cloud provider best practices, supporting a stronger security posture. As a next step, the organization plans to enforce compliance policies using Stacklet, shifting from visibility to proactive governance, reducing risk, and ensuring continuous compliance at scale.