The Ops Community ⚙️

Eyal Estrin

Originally published at Medium

FinOps for AI

Today, we hear about so many organizations (from small start-ups to large enterprises) experimenting with GenAI applications, adding GenAI components to their existing workloads, and perhaps even moving from evaluation to production.

As GenAI usage grows, organizations must pay attention to its cost before high and unpredictable spending leads to more failed projects.

In this blog post, I will share some common recommendations for implementing FinOps practices as part of GenAI workloads.

Real-Time Cost Visibility, Allocation, Tagging, and Accountability

Lack of real-time visibility into cloud costs makes it difficult for organizations to track spending, identify waste, and assign accountability. Without clear, up-to-date cost allocation tied to projects or teams, overspending and inefficiencies often go unnoticed. Building transparent cost tracking and tagging practices empowers teams to monitor expenses continuously, optimize usage, and align spending with business goals.
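To make tag-based allocation concrete, here is a minimal Python sketch that rolls up billing line items by an owner tag and reports how much spend is untagged. It assumes a hypothetical CostRecord shape and a "team" tag key; in practice the records would come from your cloud provider's billing export, whose field names differ.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CostRecord:
    """One line of a (hypothetical) billing export: a resource, its cost, and its tags."""
    resource_id: str
    cost_usd: float
    tags: dict = field(default_factory=dict)


def allocate_by_tag(records, tag_key="team"):
    """Sum cost per value of tag_key; spend without the tag lands in 'UNTAGGED'."""
    totals = defaultdict(float)
    for rec in records:
        owner = rec.tags.get(tag_key, "UNTAGGED")
        totals[owner] += rec.cost_usd
    return dict(totals)


def untagged_ratio(records, tag_key="team"):
    """Share of total spend that cannot be attributed to any owner."""
    total = sum(rec.cost_usd for rec in records)
    if total == 0:
        return 0.0
    untagged = sum(rec.cost_usd for rec in records if tag_key not in rec.tags)
    return untagged / total


records = [
    CostRecord("llm-endpoint-1", 120.0, {"team": "search"}),
    CostRecord("gpu-node-7", 300.0, {"team": "chatbot"}),
    CostRecord("vector-db", 80.0),  # missing the team tag
]
print(allocate_by_tag(records))  # {'search': 120.0, 'chatbot': 300.0, 'UNTAGGED': 80.0}
print(untagged_ratio(records))   # 0.16
```

Tracking the untagged ratio over time is a simple accountability metric: if it grows, new resources are being deployed without ownership tags.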

Recommendations / Best practices

Rightsizing and Resource Optimization

Rightsizing and resource optimization ensure that cloud resources are appropriately sized and efficiently used. By continuously analyzing usage patterns and adjusting capacity to match actual demand, teams eliminate waste and reduce costs without compromising performance.
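One way to put this into practice is to size from a high percentile of observed utilization rather than from the absolute peak. The sketch below assumes CPU utilization samples expressed as fractions, a 70% target utilization, and a hypothetical list of vCPU sizes; the same idea applies to GPU memory or accelerator utilization.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_vcpus(cpu_samples, current_vcpus, sizes, target_util=0.7, pct=95):
    """Smallest size that keeps p95 demand at or below target_util of its capacity.

    cpu_samples: utilization fractions (0-1) measured on the current instance.
    sizes: vCPU counts available in a (hypothetical) instance family.
    """
    demand_vcpus = percentile(cpu_samples, pct) * current_vcpus
    for size in sorted(sizes):
        if demand_vcpus <= size * target_util:
            return size
    return max(sizes)  # demand exceeds every option: stay on the largest size


# A 16-vCPU node that rarely goes above 20% utilization
samples = [0.1] * 10 + [0.2] * 9 + [0.25]
print(recommend_vcpus(samples, current_vcpus=16, sizes=[2, 4, 8, 16]))  # 8
```

The p95 cut-off excludes rare spikes such as the single 0.25 sample; if such spikes matter, pairing the smaller size with autoscaling is usually cheaper than keeping a permanently oversized node.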

Recommendations / Best practices

  • Choose Optimal Model and Inference Types: Select foundation models and inference methods that precisely match your business needs to avoid paying for unnecessary capacity. Continuously evaluate workload requirements and prefer smaller, purpose-fit models over default larger ones to save costs. Reference: Generative AI Cost Optimization Strategies
  • Batching and Concurrency: Efficiently batch inference requests and manage concurrency to maximize instance utilization and reduce cost per token or operation. Reference: GenAI Cost Optimization: The Essential Guide
  • Right-Sizing and Model Selection: Regularly right-size infrastructure (compute, memory, GPU) to match workload demand, using autoscaling together with spot and reserved instances to balance cost and performance. Avoid defaulting to high-end hardware for all workloads. References: Identify your savings potential in Azure, Optimizing GenAI Usage.
  • Leverage Cloud-Specific Cost Management Tools: Use cloud vendor cost management and advisory tools to identify and implement cost-saving recommendations. Common services: AWS Compute Optimizer, Azure Advisor, Google Cloud Recommender.
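The batching recommendation can be evaluated with simple arithmetic: on a dedicated instance, cost per token is the hourly price divided by throughput, so larger batches lower cost until latency becomes unacceptable. This sketch assumes you have load-test measurements of tokens per second and p95 latency for each batch size; the numbers in the example are invented for illustration, not real benchmarks.

```python
def cost_per_million_tokens(hourly_price_usd, tokens_per_second):
    """Cost of producing 1M tokens on one instance at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def best_batch_size(measurements, hourly_price_usd, max_p95_latency_ms):
    """Cheapest batch size per token among those that still meet the latency target.

    measurements: {batch_size: (tokens_per_second, p95_latency_ms)} from your own load tests.
    """
    eligible = {
        batch: cost_per_million_tokens(hourly_price_usd, tps)
        for batch, (tps, p95_ms) in measurements.items()
        if p95_ms <= max_p95_latency_ms
    }
    if not eligible:
        raise ValueError("no batch size meets the latency target")
    return min(eligible, key=eligible.get)


# Illustrative load-test numbers for a hypothetical $4/hour GPU instance
measured = {1: (50, 200), 8: (300, 600), 32: (700, 2500)}
print(best_batch_size(measured, hourly_price_usd=4.0, max_p95_latency_ms=1000))  # 8
```

Batch size 32 would be cheaper per token, but it is excluded because it breaks the 1,000 ms latency target; relaxing the target (for example, for offline batch jobs) changes the answer.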

Intelligent Pricing Strategies: Reserved, Spot, and Preemptible Instances

Reserved instances offer significant discounts for long-term, steady workloads in exchange for committing to a specific level of resource usage over one to three years, reducing costs compared to pay-as-you-go pricing. Spot and preemptible instances provide access to spare cloud capacity at substantially lower prices, but they can be interrupted, which makes them best suited to flexible or fault-tolerant tasks. Balancing these options against real-time workload needs enables cost-efficient cloud resource management while maintaining scalability and performance.
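The trade-off between these pricing models can be reduced to a break-even calculation: a reservation is billed for every hour of its term, so it only pays off when the resource runs for at least the ratio of the reserved hourly price to the on-demand hourly price. The sketch below assumes hypothetical prices and a simple rework_fraction to model work lost to spot interruptions; real discounts vary by provider, region, and commitment, and the decision rule is deliberately simplified.

```python
def breakeven_utilization(on_demand_hourly, reserved_effective_hourly):
    """Fraction of hours a resource must run for a reservation to beat on-demand.

    A reservation is paid for every hour of the term; on-demand only for hours used.
    """
    return reserved_effective_hourly / on_demand_hourly


def spot_effective_hourly(on_demand_hourly, spot_discount, rework_fraction):
    """Expected spot cost per useful hour, inflated by work redone after interruptions."""
    return on_demand_hourly * (1 - spot_discount) * (1 + rework_fraction)


def choose_pricing(expected_utilization, fault_tolerant, on_demand_hourly, reserved_effective_hourly):
    """Simplified rule: spot for interruptible work, reserved for steady work above break-even."""
    if fault_tolerant:
        return "spot"
    if expected_utilization >= breakeven_utilization(on_demand_hourly, reserved_effective_hourly):
        return "reserved"
    return "on-demand"


# Hypothetical prices: $10/h on-demand, $6/h effective with a reservation
print(breakeven_utilization(10.0, 6.0))                   # 0.6
print(choose_pricing(0.9, False, 10.0, 6.0))              # reserved
print(choose_pricing(0.3, False, 10.0, 6.0))              # on-demand
print(round(spot_effective_hourly(10.0, 0.7, 0.15), 2))   # 3.45
```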

Recommendations / Best practices

Automation and Dynamic Scaling

Automation and dynamic scaling enable cloud resources to automatically adjust in real time to changing workload demands, ensuring efficient performance during peak times while minimizing costs by scaling down when demand is low. This approach reduces manual intervention, optimizes resource use, improves reliability, and supports business agility by maintaining responsiveness under fluctuating traffic conditions.
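A common way to implement dynamic scaling is target tracking. The sketch below follows the formula documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current * currentMetric / targetMetric), including its tolerance band that avoids flapping; the metric (in-flight inference requests per replica), the target value, and the replica bounds are assumptions for illustration.

```python
import math


def desired_replicas(current, metric_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Target-tracking scale decision, modeled on the Kubernetes HPA formula.

    desired = ceil(current * metric_value / target_value), clamped to [min, max].
    Ratios within +/- tolerance of 1.0 leave the replica count unchanged.
    """
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Metric: in-flight inference requests per replica, target 60 (hypothetical)
print(desired_replicas(4, 90, 60))  # 6  (peak: scale out)
print(desired_replicas(4, 15, 60))  # 1  (quiet period: scale in)
print(desired_replicas(4, 63, 60))  # 4  (within tolerance: no change)
```

The min_replicas and max_replicas bounds act as a cost guardrail as much as a reliability one: the upper bound caps spend during unexpected traffic spikes.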

Recommendations / Best practices

Cost-Aware Model and Workflow Design

Adopting a cost-aware approach to model and workflow design ensures financial insights are embedded in every step of the development lifecycle. By prioritizing real-time cost visibility, proactive forecasting, and iterative policy refinement, teams can anticipate spend early, align resource usage with business intent, and implement rapid adjustments as requirements evolve. This mindset promotes conscious decision-making, enabling organizations to balance performance and efficiency from the ground up.
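Anticipating spend early can start with a back-of-the-envelope token model. This sketch assumes per-1,000-token pricing with separate input and output rates; the prices, request volume, and token counts are placeholders, not any vendor's list prices. It shows how trimming the prompt changes the monthly estimate.

```python
from dataclasses import dataclass


@dataclass
class TokenPrice:
    """Per-1,000-token prices; the figures used below are placeholders."""
    input_per_1k: float
    output_per_1k: float


def request_cost(price, input_tokens, output_tokens):
    """Cost of a single model call, billed separately for input and output tokens."""
    return (input_tokens / 1000) * price.input_per_1k + (output_tokens / 1000) * price.output_per_1k


def monthly_estimate(price, requests_per_day, avg_input_tokens, avg_output_tokens, days=30):
    """Projected monthly spend for a feature, before a single request hits production."""
    return request_cost(price, avg_input_tokens, avg_output_tokens) * requests_per_day * days


price = TokenPrice(input_per_1k=0.003, output_per_1k=0.015)
baseline = monthly_estimate(price, requests_per_day=10_000, avg_input_tokens=1500, avg_output_tokens=500)
trimmed = monthly_estimate(price, requests_per_day=10_000, avg_input_tokens=1000, avg_output_tokens=500)
print(f"baseline ${baseline:,.0f}/month, trimmed prompt ${trimmed:,.0f}/month")
# baseline $3,600/month, trimmed prompt $3,150/month
```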

Recommendations / Best practices

  • Optimize prompt design and token usage: Design applications with cost-aware prompting by minimizing prompt size and engineering efficient prompts. This reduces model invocations and token consumption, directly controlling costs. References: Generative AI Lens - Cost Optimization, Effect of Optimization on AI Forecasting.
  • Use prompt routing, caching, and inference optimization: Route requests to the most cost-effective models and cache frequent prompts to reduce expensive token processing. According to FinOps guidance, this approach can cut inference costs by 40-70%. Focus optimization efforts on inference workloads, since they account for 80-90% of GenAI spending. Reference: Optimizing GenAI Usage
  • Monitor and apply governance per FinOps best practices: Incorporate real-time cost monitoring, forecasting, and governance aligned with FinOps principles to drive iterative cost improvements during the AI model lifecycle. Reference: Effect of Optimization on AI Forecasting
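Routing and caching can be combined in a thin layer in front of the model clients. The minimal sketch below assumes two interchangeable models exposed as plain functions (stubs here), uses prompt length in words as a hypothetical stand-in for a real complexity classifier or a provider's built-in prompt routing, and implements only an exact-match cache on whitespace-normalized prompts. A production cache would also need eviction, expiry, and possibly semantic matching.

```python
import hashlib
from typing import Callable, Dict


class CachedRouter:
    """Send short prompts to a cheap model, long ones to a stronger model, and cache answers."""

    def __init__(self, models: Dict[str, Callable[[str], str]], cheap: str, strong: str,
                 max_cheap_words: int = 200):
        self.models = models
        self.cheap = cheap
        self.strong = strong
        self.max_cheap_words = max_cheap_words
        self.cache: Dict[str, str] = {}
        self.calls = {name: 0 for name in models}  # billed model invocations per model

    @staticmethod
    def _key(prompt: str) -> str:
        # Exact-match cache key on whitespace-normalized text
        normalized = " ".join(prompt.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def route(self, prompt: str) -> str:
        # Hypothetical heuristic: word count decides which model tier is "good enough"
        return self.cheap if len(prompt.split()) <= self.max_cheap_words else self.strong

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.cache:
            return self.cache[key]  # cache hit: no model call, no tokens billed
        name = self.route(prompt)
        self.calls[name] += 1
        answer = self.models[name](prompt)
        self.cache[key] = answer
        return answer


# Stub models stand in for real API clients
router = CachedRouter(
    models={"small": lambda p: "small answer", "large": lambda p: "large answer"},
    cheap="small", strong="large", max_cheap_words=5,
)
router.complete("What is FinOps?")
router.complete("What  is   FinOps?")  # same prompt after normalization: served from cache
router.complete("Summarize the attached forty page cost report for the finance team")
print(router.calls)  # {'small': 1, 'large': 1}
```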

Quotas, Monitoring, and Anomaly Detection

Monitoring quotas and detecting anomalies with alerts ensures cloud resources are managed proactively. Setting alerts before limits are reached helps prevent service disruptions and enables timely capacity planning. This practice keeps cloud workloads reliable and cost-effective across environments.
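Both checks can be expressed in a few lines. This sketch assumes you already collect current usage against a quota and a trailing series of daily spend; the 80% and 90% thresholds and the three-standard-deviation (z-score) rule are illustrative defaults, not provider settings, and managed anomaly-detection services typically use more sophisticated models.

```python
import statistics


def quota_alerts(usage, quota, thresholds=(0.8, 0.9)):
    """Thresholds (fractions of the quota) that current usage has already crossed."""
    return [t for t in thresholds if usage >= t * quota]


def is_spend_anomaly(history, today, z_threshold=3.0):
    """Flag today's spend if it is more than z_threshold standard deviations above the trailing mean."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)  # needs at least two data points
    if stdev == 0:
        return today > mean
    return (today - mean) / stdev > z_threshold


# Requests per minute against a hypothetical quota of 1,000
print(quota_alerts(850, 1000))  # [0.8]

# Daily GenAI spend in USD over the last week (illustrative)
last_week = [100, 102, 98, 101, 99, 100, 100]
print(is_spend_anomaly(last_week, 140))  # True
print(is_spend_anomaly(last_week, 101))  # False
```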

Recommendations / Best practices

Storage and Data Lifecycle Management

Efficient storage and data lifecycle management are key to controlling cloud costs. Automated lifecycle policies transition data across storage tiers based on access patterns and retention needs, while regular audits for orphaned or stale data prevent unnecessary spending. Embedding these practices early in the provisioning process ensures cost optimization throughout the data lifecycle.
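A lifecycle policy is essentially a mapping from an object's age and last access to a storage tier or a deletion decision. The sketch below assumes hypothetical tier names (hot, cool, archive), idle-day thresholds, and a 365-day retention period; where possible, you would rely on your provider's native lifecycle or auto-tiering features rather than moving objects yourself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class StoredObject:
    key: str
    created: date
    last_accessed: date


# Hypothetical policy: tier names and day thresholds are illustrative, not a provider's defaults
TIER_RULES = [(0, "hot"), (30, "cool"), (90, "archive")]  # (minimum idle days, tier), ascending
RETENTION_DAYS = 365


def plan_lifecycle(objects, today):
    """Map each object key to a target tier, or to 'delete' once it is past retention."""
    plan = {}
    for obj in objects:
        if (today - obj.created).days > RETENTION_DAYS:
            plan[obj.key] = "delete"
            continue
        idle_days = (today - obj.last_accessed).days
        tier = TIER_RULES[0][1]
        for min_idle, name in TIER_RULES:
            if idle_days >= min_idle:
                tier = name  # keep the coldest tier whose threshold is met
        plan[obj.key] = tier
    return plan


today = date(2025, 6, 30)
objects = [
    StoredObject("prompts/cache.json", date(2025, 6, 1), date(2025, 6, 29)),
    StoredObject("eval/run-42.parquet", date(2025, 3, 1), date(2025, 4, 15)),
    StoredObject("logs/inference-jan.jsonl", date(2025, 1, 1), date(2025, 2, 1)),
    StoredObject("training/old-dump.tar", date(2023, 1, 10), date(2023, 2, 1)),
]
print(plan_lifecycle(objects, today))
```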

Recommendations / Best practices

Team Enablement, Training, and Cost Ownership

Empowering teams with clear cost ownership and targeted training fosters accountability and cost-conscious decision-making. Embedding cost awareness into daily workflows and providing role-specific education helps teams balance innovation and budget, driving a culture of shared responsibility for cloud spending.

Recommendations / Best practices

Forecasting, Budgeting, and Predictive Insights

Accurate forecasting, budgeting, and predictive insights enable organizations to anticipate cloud costs, align spending with business goals, and prevent budget overruns. Leveraging historical data, driver-based forecasting, and machine learning models helps create dynamic, actionable forecasts that drive financial accountability and proactive cost management.
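As a simple starting point, a linear trend fitted to historical monthly spend can already flag upcoming budget overruns. This sketch uses ordinary least squares rather than the machine learning models mentioned above, and the spend history and budget are invented for illustration.

```python
def linear_forecast(monthly_spend, months_ahead):
    """Fit spend = intercept + slope * month by ordinary least squares and extrapolate."""
    n = len(monthly_spend)
    if n < 2:
        raise ValueError("need at least two months of history")
    t_mean = (n - 1) / 2
    y_mean = sum(monthly_spend) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(monthly_spend))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n, n + months_ahead)]


def months_over_budget(forecast, monthly_budget):
    """1-based offsets of forecast months expected to exceed the budget."""
    return [i + 1 for i, value in enumerate(forecast) if value > monthly_budget]


history = [1000, 1200, 1400, 1600]  # illustrative monthly GenAI spend, USD
forecast = linear_forecast(history, months_ahead=2)
print(forecast)                            # [1800.0, 2000.0]
print(months_over_budget(forecast, 1900))  # [2]
```

A linear trend will underestimate spend when usage grows exponentially (for example, after a successful launch), so it works best as a baseline to compare against driver-based forecasts.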

Recommendations / Best practices

Governance, Policy, and Tooling Automation

Automating governance policies ensures consistent compliance, security, and cost control in the cloud. By embedding policies into infrastructure workflows and deployment pipelines, organizations reduce manual errors and enforce rules proactively. This approach enables scalable, reliable oversight and quick remediation across diverse cloud environments.
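Policy-as-code can be as simple as a function that rejects non-compliant deployment specs in a pipeline. The sketch below assumes a hypothetical spec format, required tag keys, an approved-model list, and per-environment GPU limits; dedicated tools such as Open Policy Agent or cloud-native policy services cover the same ground at scale.

```python
# Hypothetical organizational policy; adapt names and limits to your own standards
REQUIRED_TAGS = {"team", "cost-center", "environment"}
APPROVED_MODELS = {"small-general", "medium-general"}
MAX_GPUS = {"dev": 1, "staging": 2, "prod": 8}


def check_deployment(spec):
    """Return a list of policy violations for a deployment spec (empty list = compliant)."""
    violations = []
    tags = spec.get("tags", {})
    missing = REQUIRED_TAGS - set(tags)
    if missing:
        violations.append(f"missing tags: {', '.join(sorted(missing))}")
    if spec.get("model") not in APPROVED_MODELS:
        violations.append(f"model not approved: {spec.get('model')}")
    env = tags.get("environment", "dev")
    gpu_limit = MAX_GPUS.get(env, 0)  # unknown environments get no GPUs
    if spec.get("gpu_count", 0) > gpu_limit:
        violations.append(f"gpu_count {spec.get('gpu_count')} exceeds {gpu_limit} for '{env}'")
    return violations


ok = {"model": "small-general", "gpu_count": 1,
      "tags": {"team": "search", "cost-center": "cc-101", "environment": "dev"}}
bad = {"model": "huge-experimental", "gpu_count": 4, "tags": {"environment": "dev"}}
print(check_deployment(ok))  # []
for problem in check_deployment(bad):
    print(problem)
```

In a CI/CD pipeline, a non-empty violation list would fail the deployment step before any cost is incurred.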

Recommendations / Best practices

Summary

In this long blog post, I have shared recommendations covering several aspects of embedding FinOps practices into the design, deployment, and maintenance of modern applications that include GenAI services.

Every organization must design for cost and have visibility into the cost of any application that uses GenAI components, to avoid high costs or, at the very least, to track expected costs as early as possible.

I encourage readers to review the hyperscale cloud providers' documentation to understand service costs and learn the best practices for cost optimization.

I also encourage readers to learn from the FinOps Foundation's official documentation and best practices as they deploy GenAI services.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.


About the author

Eyal Estrin is a seasoned cloud and information security architect, AWS Community Builder, and author of Cloud Security Handbook and Security for Cloud Native Applications. With over 25 years of experience in the IT industry, he brings deep expertise to his work.

Connect with Eyal on social media: https://linktr.ee/eyalestrin.

The opinions expressed here are his own and do not reflect those of his employer.
