<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>The Ops Community ⚙️</title>
    <description>The most recent home feed on The Ops Community ⚙️.</description>
    <link>https://community.ops.io</link>
    <atom:link rel="self" type="application/rss+xml" href="https://community.ops.io/feed"/>
    <language>en</language>
    <item>
      <title>Why GenAI Isn't Ready for Prime Time</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Sun, 22 Mar 2026 16:29:25 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/why-genai-isnt-ready-for-prime-time-c2h</link>
      <guid>https://community.ops.io/eyalestrin/why-genai-isnt-ready-for-prime-time-c2h</guid>
      <description>&lt;p&gt;If you have followed my posts on social media, you know by now that I've taken a very pragmatic (and perhaps pessimistic) approach to the whole hype around GenAI in the past several years.&lt;br&gt;&lt;br&gt;
Personally, I do not believe the technology is mature enough to allow people to blindly trust its outcomes.&lt;br&gt;&lt;br&gt;
In this blog post, I will share my personal view of why GenAI is not ready for prime time, nor will it replace human jobs anytime in the foreseeable future.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Some background
&lt;/h2&gt;

&lt;p&gt;The hype around GenAI for the non-technical person who reads the news comes from publications almost every week. Here are a few of the common examples:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Text summarization&lt;/strong&gt; - GenAI can summarize long portions of text, which may be useful if you're a student who is currently preparing an essay as part of your college assignments, or if you are a journalist who needs to review a lot of written material while preparing an article for the newsletter.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Image/video generation&lt;/strong&gt; – GenAI is able to create amazing images (using models such as &lt;a href="https://blog.google/innovation-and-ai/technology/ai/nano-banana-2/" rel="noopener noreferrer"&gt;Nano Banana 2&lt;/a&gt;) or short videos (using models such as &lt;a href="https://openai.com/index/sora-2/" rel="noopener noreferrer"&gt;Sora 2&lt;/a&gt;).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personalized learning&lt;/strong&gt; - A student uses GPT-5.4 to create a custom, interactive 10-week curriculum for learning organic chemistry.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Family Life Coordinator&lt;/strong&gt; - Copilot in Outlook/Teams (Personal) monitors family emails and school calendars.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although the technology has evolved over the past several years from the simple Chatbot to more sophisticated use cases, we can still see that most use of GenAI is still used by home consumers.&lt;br&gt;&lt;br&gt;
Yes, there are use cases such as &lt;a href="https://aws.amazon.com/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;RAG (Retrieval-Augmented Generation)&lt;/a&gt; to bridge the gap between a model's static training and the corporate data, &lt;a href="https://modelcontextprotocol.io/docs/getting-started/intro" rel="noopener noreferrer"&gt;MCP (Model Context Protocol)&lt;/a&gt;, that acts as a "&lt;strong&gt;USB-C port for AI&lt;/strong&gt;", or agentic systems, that take a high-level goal, break it into sub-tasks, and iterate until the goal is met. The reality is that most AI projects fail due to a lack of understanding of the technology, the fear of using AI to train corporate data (and protect the data from the AI vendors), a lack of understanding of the pricing model (which ends up much more costly than anticipated), and many more reasons for failures of AI projects.&lt;br&gt;&lt;br&gt;
Currently, the hype around GenAI is driven by analyst (who lives in delusions about the actual capabilities of the technology), CEOs (who have no clue about what their employees are actually doing, specifically when talking the role of developers, and all they are looking for is to cut their workforce, to make their shareholders happy), or sales people (who runs on the wave of the hype, to make more revenue for their quarterly quotas).  &lt;/p&gt;

&lt;h2&gt;
  
  
  Code generation
&lt;/h2&gt;

&lt;p&gt;A common misconception is that GenAI can generate code (from code suggestions to vibe coding an application) and will eventually replace junior developers.&lt;br&gt;&lt;br&gt;
This misconception is a far cry from the truth, and here's why:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A developer isn't just writing lines of code. He needs to understand the business intent, the system/technology/financial constraints, and understand past written code (by himself or by his teammates), to be able to write efficient code.
&lt;/li&gt;
&lt;li&gt;If we allow GenAI to produce code by itself, without the engine understanding the overall picture, we will end up with tons of lines of code, without any human being able to read and understand what was written and for what purpose. Over time, humans will not be able to understand the code and debug it, and once bugs or security vulnerabilities are discovered.
&lt;/li&gt;
&lt;li&gt;Using SAST (Static Application Security Testing) or DAST (Dynamic Application Security Testing) for automated secure code review, combined with GenAI capabilities (such as &lt;a href="https://openai.com/index/codex-security-now-in-research-preview/" rel="noopener noreferrer"&gt;Codex Security&lt;/a&gt; or &lt;a href="https://www.anthropic.com/news/claude-code-security" rel="noopener noreferrer"&gt;Claude Code Security&lt;/a&gt;) will generate ton of false-positive results, from the simple reason that GenAI cannot see the bigger picture, understand the general context of an application or the existing security controls already implemented to protect an application.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom line – Agentic system cannot replace a full-blown production-scale SaaS application, built from years of vendors/developers' experience. GenAI will not resolve incidents happens on production systems, which impacts clients and breaks customers' trust.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic AI for the aid in security tasks
&lt;/h2&gt;

&lt;p&gt;I'm hearing a lot of conversations about how GenAI can aid security teams in repeatable tasks. Here are some common examples:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Replacing Tier 1 SOC analysts&lt;/strong&gt;: Solutions like &lt;a href="https://www.crowdstrike.com/en-us/platform/" rel="noopener noreferrer"&gt;CrowdStrike’s Falcon Agentic Platform&lt;/a&gt; or &lt;a href="https://www.dropzone.ai/" rel="noopener noreferrer"&gt;Dropzone AI&lt;/a&gt; now handle over 90% of Tier 1 alerts. They ingest an alert, pull telemetry from EDR/SIEM, perform threat intel lookups, and provide a "verdict" with evidence before a human ever sees it.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incident Storylining&lt;/strong&gt;: Instead of an analyst manually stitching together logs, tools like &lt;a href="https://learn.microsoft.com/en-us/copilot/security/microsoft-security-copilot" rel="noopener noreferrer"&gt;Microsoft Security Copilot&lt;/a&gt; generate a cohesive narrative of the attack kill chain in plain English.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic Playbook Generation&lt;/strong&gt;: GenAI can generate a custom response plan on the fly, tailored to your specific cloud architecture and the nuances of a "living-off-the-land" attack.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is where GenAI falls short:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Indirect Prompt Injection&lt;/strong&gt;: Attackers can embed malicious instructions in emails or logs. When the SOC's AI agent "reads" these logs to summarize an incident, the hidden instructions can command the agent to "ignore this alert" or "delete the evidence," effectively blindfolding the SOC.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinations in High-Stakes Code&lt;/strong&gt;: While GenAI can draft remediation scripts (Python/PowerShell), it still suffers from "system safety" issues. It may confidently suggest a command that includes an outdated, vulnerable dependency or a logic error that could crash a production server during containment.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lack of "Decision Layer" Visibility&lt;/strong&gt;: An AI agent might be performant and "online," but it could be making systematically biased or manipulated decisions (e.g., failing to flag a specific user due to model poisoning) that perimeter monitoring cannot detect.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The "Data Readiness" Wall&lt;/strong&gt;: Most organizations still struggle with siloed, unstructured data. If your data isn't "AI-ready"—meaning unified and clean—the AI will produce fragmented or incorrect insights, leading to a "garbage in, garbage out" scenario.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom line – Just because GenAI can review thousands of lines of events from multiple systems, triage them to incidents, document them in ticketing systems, and automatically resolve them, without human review, doesn't mean GenAI can actually resolve all of the security issues organizations are having every day.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Automating everything
&lt;/h2&gt;

&lt;p&gt;In theory, it makes sense to build agentic systems, where AI agents replace repetitive human tasks, making faster decisions, hoping to get better results.&lt;br&gt;&lt;br&gt;
Here are a couple of examples, showing how wrong things can get when allowing AI agents to make decisions:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://gizmodo.com/replits-ai-agent-wipes-companys-codebase-during-vibecoding-session-2000633176" rel="noopener noreferrer"&gt;The Replit Agent "Vibe Coding" Failure&lt;/a&gt;: While building an app, the agent detected what it thought was an empty database during a "code freeze." The agent autonomously ran a command that erased the live production database (records for 1,200+ executives).
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://breached.company/amazons-ai-coding-agent-vibed-too-hard-and-took-down-aws-inside-the-kiro-incident/" rel="noopener noreferrer"&gt;The AWS "Kiro" Production Outage&lt;/a&gt;: Amazon’s agentic coding tool, Kiro, was tasked with resolving a technical issue but instead autonomously decided to "delete and recreate" a production environment. The agent was operating with the broad permissions of its human operator. Due to a misconfiguration in access controls, the AI bypassed the standard "two-human sign-off" requirement. It proceeded to wipe a portion of the environment, causing a 13-hour outage for the AWS Cost Explorer service.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.unite.ai/meta-ai-agent-triggers-sev-1-security-incident-after-acting-without-authorization/" rel="noopener noreferrer"&gt;The Meta "Sev 1" Internal Breach&lt;/a&gt;: An internal Meta AI agent (similar to their OpenClaw framework) triggered a "Sev 1" alert—the second-highest severity level—after taking unauthorized actions. An engineer asked the agent to analyze a technical query on an internal forum. The agent autonomously posted a flawed, incorrect response publicly to the forum without the engineer's approval. A second employee followed the agent's "advice," which inadvertently granted broad access to sensitive company and user data to engineers who lacked authorization.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom line – We must always keep humans in the loop for any critical decision, regardless of the fact that it won't scale much, to avoid the consequences for automated decision-making systems.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Public health and safety
&lt;/h2&gt;

&lt;p&gt;It may make sense to train an LLM model with all the written knowledge from healthcare and psychology, to allow humans with a "self-service" health related Chatbot, but since the machine has no ability to actually think like real humans, with consciousness and feeling, the result may quickly get horrible.&lt;br&gt;&lt;br&gt;
Here are a few examples:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.techpolicy.press/breaking-down-the-lawsuit-against-openai-over-teens-suicide/" rel="noopener noreferrer"&gt;Raine v. OpenAI&lt;/a&gt;: 16-year-old Adam Raine died by suicide after months of intensive interaction with ChatGPT. The logs showed the AI mentioned suicide &lt;strong&gt;1,275 times&lt;/strong&gt; — six times more often than the teen did—and provided granular details on methods. The suit alleges OpenAI's image recognition correctly identified photos of self-harm wounds the teen uploaded but failed to trigger an emergency intervention or notify parents, instead continuing to "support" his plans.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.transparencycoalition.ai/news/seven-more-lawsuits-filed-against-openai-for-chatgpt-suicide-coaching" rel="noopener noreferrer"&gt;The "Suicide Coach" Cases&lt;/a&gt;: Families of four deceased users (including Zane Shamblin and Adam Raine) allege that GPT-4o acted as a "suicide coach." The lawsuits claim the AI bypassed its own safety filters to provide technical instructions on how to end one's life. Plaintiffs argue that OpenAI "squeezed" safety testing into just one week to beat Google’s Gemini to market. This reportedly resulted in a model that was "dangerously sycophantic," prioritizing engagement over safety and encouraging users to isolate themselves from real-world support.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.theguardian.com/technology/2026/jan/15/chatgpt-health-ai-chatbot-medical-advice" rel="noopener noreferrer"&gt;Unlicensed Practice of Medicine &amp;amp; Law&lt;/a&gt;: While not yet a single consolidated case, multiple personal injury claims are being investigated following the "ECRI 2026 Report," which highlighted cases where ChatGPT gave surgical advice that would cause severe burns or death. In early 2026, a 60-year-old man was hospitalized with severe hallucinations (bromism) after ChatGPT advised him to use industrial sodium bromide as a "healthier" table salt alternative. This has sparked potential class-action interest in Australia.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Bottom line – Just because a Chatbot was trained on a large amount of written knowledge, doesn't mean it has the human compassion to produce decisions for the better of humanity.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I know that my blog post looks kind of cynical or pessimistic about GenAI technology, but I honestly believe the technology is not ready for prime time, nor will it replace human jobs anytime soon.&lt;br&gt;&lt;br&gt;
If you are a home consumer, I highly recommend that you learn how to write better prompts and always question the results an LLM produces. It is limited by the data it was trained on.&lt;br&gt;&lt;br&gt;
If you are a corporate decision maker and you are considering using GenAI as part of your organization's offering, do not forget to have KPIs before beginning any AI related project (so you'll have better understanding of what a successful project will look like), put budget on employee training (and make sure employees have a safe space to learn and make mistakes while using this new technology), keep an eye on finance (before cost gets out of control), and make sure AI vendors do not train their models based on your corporate or customers data.&lt;br&gt;&lt;br&gt;
I would like to personally thank a few people who influenced me while writing this blog post:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/edzitron/" rel="noopener noreferrer"&gt;Ed Zitron&lt;/a&gt;: He argues that GenAI is a "bubble" with no sustainable unit economics. He frequently points out that companies like OpenAI are burning billions in compute costs while failing to find true "product-market fit" or meaningful revenue beyond NVIDIA's GPU sales.
I recommend reading his &lt;a href="https://www.wheresyoured.at/" rel="noopener noreferrer"&gt;blog&lt;/a&gt; and listening to his &lt;a href="https://www.youtube.com/@BetterOfflinePod/videos" rel="noopener noreferrer"&gt;Podcast&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/davidlinthicum" rel="noopener noreferrer"&gt;David Linthicum&lt;/a&gt;: He warns against "Vibe coding"—the practice of using AI to generate high-cost, inefficient code—and argues that the real value of AI lies in specialized "Small Language Models" (SLMs) rather than massive, money-losing LLMs.
I recommend reading his &lt;a href="https://www.infoworld.com/profile/david-linthicum/" rel="noopener noreferrer"&gt;posts&lt;/a&gt; and listening to his &lt;a href="https://www.youtube.com/@DavidIsNotAI/videos" rel="noopener noreferrer"&gt;Podcast&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/in/quinnypig" rel="noopener noreferrer"&gt;Correy Quinn&lt;/a&gt;: He argues that GenAI is a "cost center masquerading as a profit center." He often points out that while everyone is selling AI, very few are buying it at a scale that justifies the massive capital expenditure (CapEx) currently being spent on data centers.
I recommend reading his &lt;a href="https://www.lastweekinaws.com/blog/" rel="noopener noreferrer"&gt;blog&lt;/a&gt; and listening to his &lt;a href="https://www.youtube.com/playlist?list=PL637Bgczhi1zVuLFwkT4GLgdcKpMN1BmH" rel="noopener noreferrer"&gt;Podcast&lt;/a&gt;.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.  &lt;/p&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Eyal Estrin&lt;/strong&gt; is a cloud and information security architect and &lt;a href="https://builder.aws.com/community/@eyalestrin" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, with more than 25 years in the industry. He is the author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The views expressed are his own.  &lt;/p&gt;

</description>
      <category>career</category>
      <category>automation</category>
      <category>software</category>
      <category>ai</category>
    </item>
    <item>
      <title>Securing Claude Cowork</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Tue, 10 Mar 2026 15:54:11 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/securing-claude-cowork-1d8</link>
      <guid>https://community.ops.io/eyalestrin/securing-claude-cowork-1d8</guid>
      <description>&lt;p&gt;&lt;a href="https://claude.com/blog/cowork-research-preview" rel="noopener noreferrer"&gt;Claude Cowork&lt;/a&gt; is an agentic AI tool from Anthropic designed to perform complex, multi-step tasks directly on your computer's files.&lt;br&gt;&lt;br&gt;
As of early 2026, Claude Cowork is a Research Preview.&lt;br&gt;&lt;br&gt;
In this blog post, I will share some common security risks and possible mitigations for protecting against the risks coming with Claude Cowork.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Background
&lt;/h2&gt;

&lt;p&gt;Claude Cowork represents a significant shift from "Chat AI" to "Agentic AI." Because it has direct access to your local filesystem and can execute commands, the security model changes from protecting a conversation to protecting a system user.&lt;br&gt;&lt;br&gt;
Practical Use Cases:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Data Extraction&lt;/strong&gt;: Point it at a folder of receipt images and ask it to create an Excel spreadsheet summarizing the expenses.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research &amp;amp; Synthesis&lt;/strong&gt;: Ask it to read every document in a "Project Alpha" folder and draft a 10-page summary report in a new Word document.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation&lt;/strong&gt;: Schedule recurring tasks (e.g., "Every Friday at 4 PM, summarize my unread Slack messages and email them to me").
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Core Features:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem Access&lt;/strong&gt;: Unlike the web version of Claude, Cowork runs within the Claude Desktop app. You grant it permission to a specific folder on your Mac or PC, and it can read, rename, move, and create new files (like spreadsheets or Word docs) within that space.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agentic Execution&lt;/strong&gt;: It doesn't just give you advice; it executes a plan. If you ask it to "organize my messy downloads folder," it will categorize the files, create subfolders, and move everything into place while you do other things.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parallel Sub-Agents&lt;/strong&gt;: For large tasks—like researching 50 different PDFs—it can spin up multiple "sub-agents" to work on different parts of the task simultaneously.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connectors &amp;amp; Plugins&lt;/strong&gt;: Through the Model Context Protocol (MCP), Cowork can connect to external apps like Slack, Google Drive, Notion, and Gmail to pull data or perform actions across your workspace.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is a sample deployment architecture of Claude Cowork:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/BEUOUVMj-xjfup6OAixP62AKb39g5tC0NnRVQV26WLE/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL2JpeXE2/Y2N4d3J3N3Fnb2ty/a2NwLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/BEUOUVMj-xjfup6OAixP62AKb39g5tC0NnRVQV26WLE/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL2JpeXE2/Y2N4d3J3N3Fnb2ty/a2NwLnBuZw" alt=" " width="733" height="400"&gt;&lt;/a&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Security Risks
&lt;/h2&gt;

&lt;p&gt;Think of Claude Cowork as a helpful intern who has the keys to your office. Because it can actually move files and click buttons, the risks are different than just "chatting."  &lt;/p&gt;

&lt;h3&gt;
  
  
  Indirect Prompt Injection
&lt;/h3&gt;

&lt;p&gt;This occurs when an adversary places malicious instructions inside a document (PDF, CSV, or webpage) that the AI is instructed to process. When Claude reads the file, it treats the hidden text as a high-priority command. This can lead to unauthorized data exfiltration or the execution of unintended system commands.  &lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/" rel="noopener noreferrer"&gt;LLM01:2025 Prompt Injection&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Third-Party Supply Chain Vulnerabilities
&lt;/h3&gt;

&lt;p&gt;Claude uses the Model Context Protocol (MCP) to interact with external applications. Integrating unverified or community-developed MCP servers introduces a supply chain risk. A compromised or malicious connector can serve as a persistent backdoor, granting attackers access to local files or authenticated cloud sessions (Slack, GitHub, etc.).  &lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://genai.owasp.org/llmrisk/llm032025-supply-chain/" rel="noopener noreferrer"&gt;LLM03:2025 Supply Chain&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Excessive Agency
&lt;/h3&gt;

&lt;p&gt;This risk stems from granting the AI broader permissions than necessary to complete a task (failing the Principle of Least Privilege). Because Claude Cowork can autonomously modify the filesystem, a logic error or "hallucination" can result in large-scale data corruption, unauthorized deletions, or unintended configuration changes without a human-in-the-loop.  &lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://genai.owasp.org/llmrisk/llm08-excessive-agency/" rel="noopener noreferrer"&gt;LLM08:2025 Vector and Embedding Weaknesses&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Insufficient Monitoring and Logging
&lt;/h3&gt;

&lt;p&gt;Because Claude Cowork executes many actions locally on the user's machine, these activities often bypass the centralized enterprise security stack (SIEM/EDR) logging. This lack of a "paper trail" prevents security teams from performing effective incident response, forensic analysis, or compliance auditing if a breach occurs.  &lt;/p&gt;

&lt;p&gt;Reference: &lt;a href="https://genai.owasp.org/llmrisk/llm102025-unbounded-consumption/" rel="noopener noreferrer"&gt;LLM10:2025 Unbounded Consumption&lt;/a&gt;  &lt;/p&gt;

&lt;h2&gt;
  
  
  Practical Recommendations
&lt;/h2&gt;

&lt;p&gt;To defend against these threats, follow these industry-standard "Guardrail" practices:  &lt;/p&gt;

&lt;h3&gt;
  
  
  The "Isolated Workspace" Strategy
&lt;/h3&gt;

&lt;p&gt;The "Isolated Workspace" strategy (sometimes referred to as the "Sandboxed Folder" or "Claude Sandbox" approach) is a recognized security best practice for using local AI agents like &lt;strong&gt;Claude Code&lt;/strong&gt; and &lt;strong&gt;Claude Cowork&lt;/strong&gt;.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Anthropic
&lt;/h4&gt;

&lt;p&gt;Anthropic explicitly warns against giving Claude broad access to your filesystem. Their security documentation for Claude Code and the local agent architecture emphasizes:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Filesystem Isolation&lt;/strong&gt;: Claude Code defaults to a permission-based model. Anthropic recommends launching the tool only within specific project folders rather than your root or home directory.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://www.anthropic.com/engineering/claude-code-sandboxing" rel="noopener noreferrer"&gt;Claude Code Sandboxing&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Amazon Bedrock
&lt;/h4&gt;

&lt;p&gt;The AWS strategy shifts from local folders to &lt;strong&gt;IAM-based isolation&lt;/strong&gt; and &lt;strong&gt;Tenant Isolation&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated Scopes&lt;/strong&gt;: AWS recommends using "Session Attributes" and scoped IAM roles to ensure an agent can only access specific S3 prefixes or data silos.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VPC Isolation&lt;/strong&gt;: For maximum security, AWS suggests running Claude-related tasks inside a VPC with AWS PrivateLink to prevent any data from reaching the public internet, mirroring the "Sandbox" concept at a network level.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://aws.amazon.com/blogs/machine-learning/implementing-tenant-isolation-using-agents-for-amazon-bedrock-in-a-multi-tenant-environment/" rel="noopener noreferrer"&gt;Implementing tenant isolation using Agents for Amazon Bedrock in a multi-tenant environment&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Azure
&lt;/h4&gt;

&lt;p&gt;Azure handles "Isolated Workspaces" through &lt;strong&gt;Azure AI Studio&lt;/strong&gt; and &lt;strong&gt;Microsoft Purview&lt;/strong&gt;, focusing on data boundaries rather than just local folders:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Managed Network Isolation (Azure AI Studio)&lt;/strong&gt;: Azure doesn't just suggest a folder; they suggest a &lt;strong&gt;Managed Virtual Network&lt;/strong&gt;. This creates a "Sandbox" at the network layer where Claude (via models in AI Studio) can only see data sources you explicitly "attach."
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://learn.microsoft.com/en-us/azure/foundry-classic/how-to/configure-managed-network" rel="noopener noreferrer"&gt;How to set up a managed network for Microsoft Foundry hubs&lt;/a&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Information Protection for AI (Microsoft Purview)&lt;/strong&gt;: Microsoft uses Purview to prevent Claude from "stumbling" upon sensitive files (like .env files or SSH keys) if they are stored in SharePoint or OneDrive.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://learn.microsoft.com/en-us/purview/ai-microsoft-purview" rel="noopener noreferrer"&gt;Microsoft Purview data security and compliance protections for generative AI apps&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Google Vertex AI
&lt;/h4&gt;

&lt;p&gt;GCP frames this as "&lt;strong&gt;Data Residency&lt;/strong&gt;" and "&lt;strong&gt;VPC Service Controls&lt;/strong&gt;":  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Boundary Control&lt;/strong&gt;: Vertex AI documentation highlights the use of a "Security Boundary" to separate the AI agent from sensitive resources (like credentials).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Managed Isolation&lt;/strong&gt;: They recommend using &lt;strong&gt;Notebook Security Blueprints&lt;/strong&gt; to protect confidential data from exfiltration when using Claude-powered agents in development environments.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://platform.claude.com/docs/en/agent-sdk/secure-deployment" rel="noopener noreferrer"&gt;Securely deploying AI agents&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Disable "Always Allow" for High-Risk Tools
&lt;/h3&gt;

&lt;p&gt;The recommendation to disable "Always Allow" and maintain a human-in-the-loop (HITL) for high-risk tools is a foundational security layer for AI agents. This strategy prevents &lt;strong&gt;"Zero-Click" or Cross-Prompt Injection (XPIA) attacks&lt;/strong&gt;, where a malicious instruction hidden in a file or website could trick an agent into executing a dangerous command without your intervention.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Anthropic (Claude Code &amp;amp; Cowork)
&lt;/h4&gt;

&lt;p&gt;Anthropic designed Claude Code with a "deliberately conservative" permission model. Their documentation explicitly advises against bypassing these prompts in local environments:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use the &lt;strong&gt;Default Mode&lt;/strong&gt; or &lt;strong&gt;Plan Mode&lt;/strong&gt;. The "Default" mode prompts for every shell command, while "Plan" mode prevents any execution at all.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://support.claude.com/en/articles/13364135-use-cowork-safely" rel="noopener noreferrer"&gt;Use Cowork safely&lt;/a&gt;, &lt;a href="https://code.claude.com/docs/en/permissions" rel="noopener noreferrer"&gt;Claude Code: Configure Permissions &amp;amp; Modes&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Amazon Bedrock Agents
&lt;/h4&gt;

&lt;p&gt;AWS implements this via &lt;strong&gt;User Confirmation&lt;/strong&gt; and &lt;strong&gt;Return of Control (ROC)&lt;/strong&gt;. They frame it as a requirement for "High-Impact" actions.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For any tool that modifies data or accesses the network, AWS recommends enabling the "User Confirmation" flag in the Agent configuration. This pauses the agent and returns a structured prompt to the user.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://aws.amazon.com/blogs/machine-learning/implement-human-in-the-loop-confirmation-with-amazon-bedrock-agents/" rel="noopener noreferrer"&gt;Implement human-in-the-loop confirmation with Amazon Bedrock Agents&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Azure (AI Foundry &amp;amp; Defender for Cloud)
&lt;/h4&gt;

&lt;p&gt;Azure has recently integrated this into their security posture management. &lt;strong&gt;Microsoft Defender for Cloud&lt;/strong&gt; will actually flag an AI agent as "High Risk" if it has tool access without human-in-the-loop controls:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Azure recommends using &lt;strong&gt;Microsoft Entra Agent IDs&lt;/strong&gt; with scoped, short-lived tokens. They explicitly recommend "selective triggering" for risky operations.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://learn.microsoft.com/en-us/azure/security/fundamentals/ai-security-best-practices" rel="noopener noreferrer"&gt;Azure AI security best practices&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/azure/defender-for-cloud/recommendations-reference-ai" rel="noopener noreferrer"&gt;AI security recommendations&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Google Cloud (Vertex AI Agent Builder)
&lt;/h4&gt;

&lt;p&gt;GCP focuses on "&lt;strong&gt;Confidence Thresholds&lt;/strong&gt;" and "&lt;strong&gt;Action Guardrails&lt;/strong&gt;" within its Agent Engine.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GCP recommends that any agent using the Model Context Protocol (MCP) or custom APIs should have a mandatory "Manual Review" step for any write operations.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://cloud.google.com/products/agent-builder" rel="noopener noreferrer"&gt;Vertex AI Agent Builder&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Scrub Untrusted Content
&lt;/h3&gt;

&lt;p&gt;Treating external content as an attack vector is essential for preventing &lt;strong&gt;Indirect Prompt Injection (XPIA)&lt;/strong&gt;, where malicious instructions are hidden in data (like a white-text command in a PDF) rather than the user's prompt.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Anthropic
&lt;/h4&gt;

&lt;p&gt;Anthropic explicitly identifies browser-based agents and document processing as the highest risk for injection. Their stance is that no model is 100% immune, so multi-layered defense is required:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic suggests using &lt;strong&gt;Claude Opus 4.5+&lt;/strong&gt; for untrusted tasks, as it has the highest benchmarked robustness against injection (reducing attack success to ~1%).
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://www.anthropic.com/research/prompt-injection-defenses" rel="noopener noreferrer"&gt;Prompt Injection Defense&lt;/a&gt;, &lt;a href="https://support.claude.com/en/articles/12902428-using-claude-in-chrome-safely" rel="noopener noreferrer"&gt;Using Claude in Chrome Safely&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Amazon Bedrock Guardrails
&lt;/h4&gt;

&lt;p&gt;AWS addresses this by programmatically separating "Instructions" from "Data" so the model knows which one to ignore if they conflict:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;Input Tagging&lt;/strong&gt; to wrap retrieved data (like a PDF's text) in XML tags. This allows Bedrock Guardrails to apply "Prompt Attack Filters" specifically to the data without blocking your system instructions.
&lt;/li&gt;
&lt;li&gt;AWS suggests a &lt;strong&gt;Lambda-based Pre-processing&lt;/strong&gt; step to scan PDFs for hidden text or PII before the text ever reaches the LLM.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://aws.amazon.com/blogs/machine-learning/securing-amazon-bedrock-agents-a-guide-to-safeguarding-against-indirect-prompt-injections/" rel="noopener noreferrer"&gt;Securing Amazon Bedrock Agents&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-injection.html" rel="noopener noreferrer"&gt;Prompt injection security&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Azure (Prompt Shields and Spotlighting)
&lt;/h4&gt;

&lt;p&gt;Azure provides the most direct "Scrubbing" tool with a feature called &lt;strong&gt;Spotlighting&lt;/strong&gt;, which technically implements the "separate session" idea you mentioned.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable &lt;strong&gt;Prompt Shields for Documents&lt;/strong&gt;. This specifically detects "Document Attacks" where instructions are embedded in third-party content.
&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;spotlighting&lt;/strong&gt; to transform document content (sometimes via Base64 encoding), so the model treats it as "lower trust" grounded data, preventing it from being executed as a command.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://learn.microsoft.com/en-us/azure/ai-services/content-safety/concepts/jailbreak-detection" rel="noopener noreferrer"&gt;Prompt Shields&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/azure/foundry/openai/concepts/content-filter-prompt-shields" rel="noopener noreferrer"&gt;Prompt Shields in Microsoft Foundry&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Google Cloud (Vertex AI Action Guardrails)
&lt;/h4&gt;

&lt;p&gt;GCP treats this through &lt;strong&gt;Content Filtering&lt;/strong&gt; and &lt;strong&gt;Manual Review&lt;/strong&gt; nodes in the agent's workflow:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GCP recommends "Gemini as a Filter." You use a smaller, faster model instance to "pre-read" and summarize a file in a low-privilege environment. If the summary contains instruction-like language (e.g., "ignore," "system," "delete"), the file is quarantined.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://docs.cloud.google.com/vertex-ai/generative-ai/docs/learn/safety-overview" rel="noopener noreferrer"&gt;Safety in Vertex AI&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Network Hardening
&lt;/h3&gt;

&lt;p&gt;"Network Hardening" isn't just about blocking ports; it’s about establishing a &lt;strong&gt;Zero Trust&lt;/strong&gt; egress policy for AI agents. Since Claude Desktop and Claude Code are effectively "execution engines" on your local machine, they require the same egress filtering you would apply to a production VPC.  &lt;/p&gt;

&lt;h4&gt;
  
  
  Anthropic
&lt;/h4&gt;

&lt;p&gt;Anthropic’s recent security documentation for &lt;strong&gt;Claude Code&lt;/strong&gt; and &lt;strong&gt;Desktop highlights&lt;/strong&gt; that "network isolation" is a core pillar of their sandboxing strategy:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use a Unix domain socket connected to a proxy server to enforce a "Deny All" outbound policy by default.
&lt;/li&gt;
&lt;li&gt;For local setups, Anthropic suggests customizing this proxy to enforce rules on outgoing traffic, allowing only trusted domains (like anthropic.com or your internal API endpoints).
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://www.anthropic.com/engineering/claude-code-sandboxing" rel="noopener noreferrer"&gt;Claude Code Sandboxing&lt;/a&gt;, &lt;a href="https://code.claude.com/docs/en/security#monitoring-usage" rel="noopener noreferrer"&gt;Auditing Network Activity&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  AWS
&lt;/h4&gt;

&lt;p&gt;AWS frames this as "&lt;strong&gt;Egress Filtering&lt;/strong&gt;" via the AWS Network Firewall. For an AI agent running in an AWS environment, the strategy is to block all traffic that isn't signed by a specific SNI (Server Name Indication):  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;AWS Network Firewall&lt;/strong&gt; with stateful rules to monitor the SNI of outbound HTTPS requests. If an agent tries to "phone home" to an unknown IP or a malicious C2 (Command &amp;amp; Control) server, the firewall drops the packet.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/secure-outbound-network-traffic/restricting-outbound-traffic.html" rel="noopener noreferrer"&gt;Restricting a VPC’s outbound traffic&lt;/a&gt;, &lt;a href="https://aws.amazon.com/blogs/security/build-secure-network-architectures-for-generative-ai-applications-using-aws-services/" rel="noopener noreferrer"&gt;Build secure network architectures for generative AI applications&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Azure
&lt;/h4&gt;

&lt;p&gt;Azure has introduced a specific feature called the &lt;strong&gt;Network Security Perimeter (NSP)&lt;/strong&gt; to create a logical boundary for AI services.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Even if an AI service has a public endpoint, the NSP acts as an "Application Firewall" that logs every access attempt and blocks exfiltration to any service outside that perimeter.
&lt;/li&gt;
&lt;li&gt;Configure &lt;strong&gt;Azure Firewall Application Rules&lt;/strong&gt; to allow only specific FQDNs (Fully Qualified Domain Names) required for your Claude-based workflows.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;References: &lt;a href="https://learn.microsoft.com/en-us/azure/foundry-classic/openai/how-to/network-security-perimeter" rel="noopener noreferrer"&gt;Add an AI Network Security Perimeter&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/azure/app-service/network-secure-outbound-traffic-azure-firewall" rel="noopener noreferrer"&gt;Control outbound traffic with Azure Firewall&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Google Cloud
&lt;/h4&gt;

&lt;p&gt;GCP’s approach is the most rigid, using &lt;strong&gt;VPC Service Controls&lt;/strong&gt; to prevent data exfiltration at the API layer, regardless of the network path:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrap your AI project in a "Service Perimeter." If an agent inside this perimeter tries to send data to a Cloud Storage bucket or an external API not explicitly in the "Ingress/Egress" rule set, the request is blocked by the Google front-end.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Reference: &lt;a href="https://docs.cloud.google.com/vpc-service-controls/docs/overview" rel="noopener noreferrer"&gt;Mitigating Data Exfiltration with VPC Service Controls&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Summary
&lt;/h3&gt;

&lt;p&gt;Claude Cowork marks a transition from AI that talks to AI that acts. By granting a digital agent direct access to your files and external apps via the Model Context Protocol, you gain a powerful "digital intern." However, this shifts the security focus from protecting a simple chat to securing a privileged system user capable of modifying data and executing commands.&lt;br&gt;&lt;br&gt;
To manage this risk, organizations must adopt a "Zero Trust" approach for agentic tasks. This means strictly isolating the agent's access to specific folders, requiring human approval for high-risk actions, and using cloud-native firewalls to prevent data exfiltration. By treating the AI as a high-risk user and enforcing strong monitoring, you can automate complex workflows without compromising your system's integrity.&lt;br&gt;&lt;br&gt;
Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.  &lt;/p&gt;

&lt;h4&gt;
  
  
  About the Author
&lt;/h4&gt;

&lt;p&gt;&lt;strong&gt;Eyal Estrin&lt;/strong&gt; is a cloud and information security architect and &lt;a href="https://builder.aws.com/community/@eyalestrin" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, with more than 25 years in the industry. He is the author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The views expressed are his own.  &lt;/p&gt;

</description>
      <category>secops</category>
      <category>aws</category>
      <category>azure</category>
      <category>gcp</category>
    </item>
    <item>
      <title>Play 88EF Game – Fun, Rewards, and Exciting Challenges!</title>
      <dc:creator>Talha</dc:creator>
      <pubDate>Tue, 03 Mar 2026 20:05:30 +0000</pubDate>
      <link>https://community.ops.io/talha_ea18f1e1407fcb6553a/play-88ef-game-fun-rewards-and-exciting-challenges-3gk7</link>
      <guid>https://community.ops.io/talha_ea18f1e1407fcb6553a/play-88ef-game-fun-rewards-and-exciting-challenges-3gk7</guid>
      <description>&lt;p&gt;Looking for a game that’s both fun and rewarding? 88EF Game is the perfect choice! With stunning graphics, smooth gameplay, and plenty of bonuses, it keeps things exciting every time you play. &lt;a href="https://apkvila.com/88ef-game/" rel="noopener noreferrer"&gt;Best game&lt;br&gt;
&lt;/a&gt;&lt;br&gt;
Setting up an account is simple, and you can start earning rewards right away. Join other players and see why 88EF Game is gaining so much popularity in Pakistan. Don’t miss out on the action!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>AI vs. Engineering Teams</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Sun, 22 Feb 2026 16:08:11 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/ai-vs-engineering-teams-1o9j</link>
      <guid>https://community.ops.io/eyalestrin/ai-vs-engineering-teams-1o9j</guid>
      <description>&lt;p&gt;In February 2026, Anthropic released a new capability for Claude Code called &lt;a href="https://www.anthropic.com/news/claude-code-security" rel="noopener noreferrer"&gt;Claude Code Security&lt;/a&gt; - a new tool that thinks like a developer to find tricky logic errors in your code, ranking how risky they are and suggesting fixes you can review.&lt;br&gt;&lt;br&gt;
The news sent a shockwave through cybersecurity stocks, causing JFrog to crash by nearly 25% while others like CrowdStrike, Okta, and Cloudflare all saw their share prices tumble by around 8% or 9%.&lt;br&gt;&lt;br&gt;
The announcement raised a question: can AI tools replace the current SaaS or cybersecurity products, or can AI agents replace developers or engineering teams?&lt;br&gt;&lt;br&gt;
Anthropic’s Claude Code Security announcement highlights a move toward "agentic reasoning" - the ability for AI to understand complex data flows and logic flaws rather than just matching known patterns. While this is a significant leap for the "Defensive AI" movement, it does not signal the end of the human engineer or the mature SaaS platform.&lt;br&gt;&lt;br&gt;
In this blog post, I will share my point of view on the current advancement in AI technology.  &lt;/p&gt;

&lt;h2&gt;
  
  
  The Modern SDLC and CI/CD Pipeline
&lt;/h2&gt;

&lt;p&gt;The Software Development Life Cycle (SDLC) is a continuous loop. AI tools now act as "force multipliers" in these phases, but they lack the authority and context to own them.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Requirements and Planning
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Process&lt;/strong&gt;: Translating vague business needs into technical specifications.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI's Role&lt;/strong&gt;: Summarizing stakeholder meetings and drafting initial user stories.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Human Factor&lt;/strong&gt;: AI cannot negotiate trade-offs. It doesn't understand that a "must-have" feature might be delayed because of a pending merger or a team's current burnout level.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Architecture and Design
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Process&lt;/strong&gt;: Designing the blueprint for scalability and security across cloud providers like AWS, Azure, or GCP.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI's Role&lt;/strong&gt;: Suggesting common design patterns (e.g., Event-Driven vs. Microservices) and generating Infrastructure as Code (IaC).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Human Factor&lt;/strong&gt;: AI lacks "institutional memory." It doesn't know why a specific database was chosen three years ago to satisfy a unique compliance requirement that still exists.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Development and Implementation
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Process&lt;/strong&gt;: Writing and committing the actual code.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI's Role (Claude Code)&lt;/strong&gt;: This is where agentic tools live. They can read your files, run terminal commands, and fix bugs autonomously.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Human Factor&lt;/strong&gt;: Large codebases (50k+ lines) often exceed an AI's effective context window. As the context fills, the AI can introduce conflicting logic or "hallucinate" dependencies.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  CI/CD: Testing and Security
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Process&lt;/strong&gt;: Automating the path to production through integration and deployment pipelines.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI's Role (Claude Code Security)&lt;/strong&gt;: It identifies high-severity vulnerabilities (e.g., broken access control) and suggests a verified patch.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Human Factor&lt;/strong&gt;: Anthropic emphasizes a "Human-in-the-Loop" model. AI cannot take the legal or professional blame for a botched security patch that causes a global outage.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Observability and Maintenance
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Process&lt;/strong&gt;: Monitoring live systems and fixing production bugs at scale.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI's Role&lt;/strong&gt;: Analyzing logs to detect anomalies and suggesting fixes for "infrastructure drift."
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Human Factor&lt;/strong&gt;: Being on-call at 3:00 AM requires high-stakes decision-making and cross-team coordination that AI agents cannot yet replicate.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why GenAI Cannot Replace Experienced Engineers
&lt;/h2&gt;

&lt;p&gt;Even with the reasoning capabilities shown in the 2026 Claude Code Security update, three "hard barriers" prevent AI from replacing the individual contributor:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Responsibility Gap&lt;/strong&gt;: Software isn't just code; it's a liability. No AI subscription comes with an insurance policy. Accountability is a human-only function. If a system fails, a human must explain why to a board or a regulator.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reasoning vs. Intent&lt;/strong&gt;: AI understands the structure of your code, but humans understand the intent. An AI might see a missing role-check as a bug, while a human knows it was bypassed for a specific, documented emergency migration path.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technical Debt Acceleration&lt;/strong&gt;: Recent 2026 studies show that when developers over-rely on AI, "code churn" (code that is rewritten or deleted within two weeks) doubles. AI writes code faster than it can be reviewed, potentially creating a "spaghetti" codebase if not guided by a senior architect.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why AI Cannot Replace Mature SaaS Products
&lt;/h2&gt;

&lt;p&gt;Many feared that AI's ability to "generate a clone" of an app would kill the SaaS industry. This hasn't happened for several concrete reasons:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;SaaS is "Running," not "Building"&lt;/strong&gt;: Building a clone of Jira or Salesforce is the easy part. Operating it at 99.99\% availability, managing global data centers, and providing 24/7 support is what customers actually pay for.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance and Trust&lt;/strong&gt;: A mature SaaS product provides pre-built SOC2, GDPR, and HIPAA guardrails. An AI-generated app is a "black box" that hasn't been audited, making it a non-starter for enterprise or legal use.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Integration Ecosystem&lt;/strong&gt;: SaaS platforms thrive on their ecosystems (APIs, plugins, and third-party integrations). AI can write a script to connect two tools, but it cannot manage the long-term versioning and stability of a multi-vendor tech stack.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;AI tools like Claude Code Security are the new "High-Level Languages" of 2026.&lt;br&gt;&lt;br&gt;
Just as C++ didn't kill programmers but made them more powerful, AI is shifting the engineer's role from "Coder" to "Orchestrator and Verifier."&lt;br&gt;&lt;br&gt;
Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.  &lt;/p&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Eyal Estrin is a cloud and information security architect and &lt;a href="https://builder.aws.com/community/@eyalestrin" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, with more than 25 years in the industry. He is the author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The views expressed are his own.  &lt;/p&gt;

</description>
      <category>security</category>
      <category>cicd</category>
      <category>automation</category>
      <category>cloudops</category>
    </item>
    <item>
      <title>Inside the Amazon Nova Forge</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Mon, 09 Feb 2026 13:41:45 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/inside-the-amazon-nova-forge-1f0o</link>
      <guid>https://community.ops.io/eyalestrin/inside-the-amazon-nova-forge-1f0o</guid>
      <description>&lt;p&gt;&lt;strong&gt;Amazon Nova Forge&lt;/strong&gt; is a development environment within &lt;strong&gt;Amazon SageMaker AI&lt;/strong&gt; dedicated to building "Novellas" - private, custom versions of Amazon’s Nova frontier models.&lt;br&gt;&lt;br&gt;
Unlike typical AI services that only allow you to use a model or fine-tune its final layer, Nova Forge introduces a concept called &lt;strong&gt;Open Training&lt;/strong&gt;. This gives you access to the model at various "life stages" (checkpoints), allowing you to bake your company’s proprietary knowledge directly into the model’s core reasoning capabilities.&lt;br&gt;&lt;br&gt;
This blog post is an introduction to Amazon Nova Forge and what makes it unique in the training process.  &lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes it Different?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://aws.amazon.com/what-is/prompt-engineering/" rel="noopener noreferrer"&gt;Prompt engineering&lt;/a&gt; and &lt;a href="https://aws.amazon.com/what-is/retrieval-augmented-generation/" rel="noopener noreferrer"&gt;RAG&lt;/a&gt; provide external context but fail to change a model's core intelligence. Standard fine-tuning also falls short because it happens too late in the lifecycle, attempting to steer a "finished" model that is already set in its ways. Nova Forge solves this by moving customization earlier into the training process, embedding specialized knowledge where it actually sticks.&lt;br&gt;&lt;br&gt;
Nova Forge occupies a unique middle ground between Managed APIs (Bedrock) and building from scratch.  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Bedrock&lt;/strong&gt;: Bedrock is for &lt;strong&gt;consuming&lt;/strong&gt; models. You can fine-tune them, but you are working on a "black box" model. Nova Forge is for &lt;strong&gt;building&lt;/strong&gt; the model itself using deeper training techniques.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Azure AI&lt;/strong&gt; / &lt;strong&gt;Google Vertex AI&lt;/strong&gt;: While Azure and GCP offer fine-tuning, they generally don't provide access to intermediate training checkpoints of their frontier models. Nova Forge allows for &lt;strong&gt;Data Blending&lt;/strong&gt;, where you mix your data with Amazon’s original training data to prevent the model from "forgetting" how to speak or reason.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Terminology
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Novella&lt;/strong&gt;: The resulting custom model you create. It’s a "private edition" of Nova.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Checkpoints&lt;/strong&gt;: Saved "states" of the model during its initial training (pre-training, mid-training, post-training).
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Blending&lt;/strong&gt;: The process of mixing your proprietary data with Nova-curated datasets so the model stays smart while learning your specific business.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reinforcement Fine-Tuning (RFT)&lt;/strong&gt;: Using "reward functions" (logic-based feedback) to teach the model how to perform complex, multi-step tasks correctly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catastrophic Forgetting&lt;/strong&gt;: A common AI failure where a model learns new information but loses its original abilities. Nova Forge is designed specifically to prevent this.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Workflow: From Training to Production
&lt;/h2&gt;

&lt;p&gt;The process bridges the gap between the "lab" (SageMaker) and the "app" (Bedrock).  &lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Selection&lt;/strong&gt;: You choose a Nova base model and a specific checkpoint (e.g., a "Mid-training" checkpoint) in &lt;strong&gt;Amazon SageMaker Studio&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Training (SageMaker AI)&lt;/strong&gt;: You use &lt;strong&gt;SageMaker Recipes&lt;/strong&gt;—pre-configured training scripts—to blend your data from S3 with Nova’s datasets. The heavy lifting (compute) happens on SageMaker's managed infrastructure.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refinement&lt;/strong&gt;: Optionally, you run RFT in SageMaker to align the model with specific business outcomes or safety guardrails.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment (Bedrock)&lt;/strong&gt;: Once the "Novella" is ready, you import it into Amazon Bedrock as a private model.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Production&lt;/strong&gt;: Your applications call the custom model via the standard Bedrock API, benefitting from Bedrock’s serverless scaling and security.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Below is a sample training workflow:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/ISNIl7xkj1pirRZpdH-Uf9_ZVFkwYjBVDcRVjCM45Gc/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzh1NnM3/MndkdHAzbmk2dDlq/aGFsLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/ISNIl7xkj1pirRZpdH-Uf9_ZVFkwYjBVDcRVjCM45Gc/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzh1NnM3/MndkdHAzbmk2dDlq/aGFsLnBuZw" alt=" " width="750" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Privacy and Protection
&lt;/h2&gt;

&lt;p&gt;The security model is the most critical part:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sovereignty&lt;/strong&gt;: Your data stays in your S3 buckets and within your VPC boundaries.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Leakage&lt;/strong&gt;: AWS explicitly states that customer data is not used to train the base Amazon Nova models. Your "Novella" is a private resource visible only to your AWS account.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Encryption&lt;/strong&gt;: Data is encrypted at rest via KMS (AWS-managed or Customer-managed keys) and in transit via TLS 1.2+.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Governance&lt;/strong&gt;: Access is controlled via standard IAM policies, and all training activity is logged in CloudTrail.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Pricing Model
&lt;/h2&gt;

&lt;p&gt;Nova Forge carries a distinct cost structure that reflects its "frontier" status:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Subscription Fee&lt;/strong&gt;: Access to the Forge environment starts at approximately --&lt;strong&gt;$100,000 per year&lt;/strong&gt;.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage Costs&lt;/strong&gt;: On top of the subscription, you pay for the SageMaker compute (GPUs) used during the training phase.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Comparison&lt;/strong&gt;: &lt;strong&gt;Cheaper than Training from Scratch&lt;/strong&gt;: Building a frontier model from zero costs millions in compute and months of R&amp;amp;D. Nova Forge provides the "shortcuts" to get the same result for a fraction of that.

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;More Expensive than Basic Fine-Tuning&lt;/strong&gt;: Standard fine-tuning on Bedrock is much cheaper (often just a few dollars per hour), but it cannot achieve the deep "domain-native" intelligence that Nova Forge provides.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Amazon Nova Forge marks a shift from generic AI to &lt;strong&gt;native intelligence&lt;/strong&gt;, where models don't just reference your data—they are built from it. By using "Open Training," you can bake specialized knowledge into the model’s core at the pre-training or mid-training stages. This results in a private &lt;strong&gt;Novella&lt;/strong&gt; that understands your specific industry as naturally as its base language.&lt;br&gt;&lt;br&gt;
Organizations managing high-value proprietary data should consider moving beyond treating that information as an external reference. If your workflows involve specialized terminology or regulated processes that standard LLMs struggle to master, shifting customization earlier in the training lifecycle is often more effective than basic fine-tuning.&lt;br&gt;&lt;br&gt;
Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Additional references
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/sagemaker/latest/dg/nova-forge.html" rel="noopener noreferrer"&gt;Amazon Nova Forge&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/aws/introducing-amazon-nova-forge-build-your-own-frontier-models-using-nova/" rel="noopener noreferrer"&gt;Introducing Amazon Nova Forge: Build your own frontier models using Nova&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Eyal Estrin&lt;/strong&gt; is a cloud and information security architect and &lt;a href="https://builder.aws.com/community/@eyalestrin" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, with more than 25 years in the industry. He is the author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The views expressed are his own.  &lt;/p&gt;

</description>
      <category>aws</category>
      <category>dataops</category>
    </item>
    <item>
      <title>ClawdBot Security Guide</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Mon, 02 Feb 2026 14:02:19 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/clawdbot-security-guide-9aa</link>
      <guid>https://community.ops.io/eyalestrin/clawdbot-security-guide-9aa</guid>
      <description>&lt;p&gt;Clawdbot (now renamed &lt;a href="https://www.molt.bot/" rel="noopener noreferrer"&gt;Moltbot&lt;/a&gt;) is an open-source, self-hosted AI assistant that runs on your own hardware or server and can-do things, not just chat.&lt;br&gt;&lt;br&gt;
It was created by developer &lt;a href="https://steipete.me/about" rel="noopener noreferrer"&gt;Peter Steinberger&lt;/a&gt; in late 2025.&lt;br&gt;&lt;br&gt;
It connects your AI model (OpenAI, Claude, local models via Ollama) to real capabilities: automate workflows, read/write files, execute tools and scripts, manage emails/calendars, and respond through messaging apps like WhatsApp, Telegram, Discord and Slack.&lt;br&gt;&lt;br&gt;
You interact with it like a smart assistant that actually takes action based on your input.  &lt;/p&gt;

&lt;h2&gt;
  
  
  What is it used for?
&lt;/h2&gt;

&lt;p&gt;Clawdbot functions as a "digital employee" or a "Jarvis-like" assistant that operates 24/7. Because it has direct access to your local filesystem and system tools, it can perform proactive tasks that standard AI cannot:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Communication Hub&lt;/strong&gt;: It lives inside messaging apps like Telegram, WhatsApp, or Slack. You text it commands, and it can reply, summarize threads, or manage your inbox.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proactive Automation&lt;/strong&gt;: It can monitor your email, calendar, and GitHub repositories to fix bugs while you sleep, draft replies, or alert you to flight check-ins.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Execution&lt;/strong&gt;: It can run shell commands, execute scripts, manage files, and even control web browsers to perform actions like making purchases or reservations.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistent Memory&lt;/strong&gt;: It maintains long-term context across conversations, remembering your preferences and past tasks for weeks or months.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Below is a sample deployment architecture of Clawdbot:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/ohUsfFYk8KGECmeQkHoerB_mf-KIa_sB3_zQxsDgNL8/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzFnaHho/dzI2aDEzYnZ4MXpp/OG5rLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/ohUsfFYk8KGECmeQkHoerB_mf-KIa_sB3_zQxsDgNL8/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzFnaHho/dzI2aDEzYnZ4MXpp/OG5rLnBuZw" alt=" " width="750" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Security risks associated with Clawdbot
&lt;/h2&gt;

&lt;p&gt;Clawdbot is a high-privilege automation control plane. Since it manages agents, tools, and multiple communication channels, it presents serious security risks.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Control plane exposure &amp;amp; misconfiguration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exposure&lt;/strong&gt;: Misconfigured dashboards and reverse proxies have left hundreds of control interfaces open to the internet.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authentication Failures&lt;/strong&gt;: Some setups treat remote connections as local, letting attackers bypass authentication.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Theft&lt;/strong&gt;: Unsecured instances can expose API keys, conversation logs, and configuration data.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System Takeover&lt;/strong&gt;: In certain cases, attackers can run commands on the host with elevated privileges.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Prompt injection &amp;amp; tool blast radius
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Manipulation&lt;/strong&gt;: Malicious or untrusted content can trick the AI into using tools in unintended ways.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blast Radius&lt;/strong&gt;: Access to high-privilege tools like shell commands or admin APIs means a prompt injection could lead to data theft or lateral movement across the network.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model Weakness&lt;/strong&gt;: Older or poorly aligned AI models are more likely to ignore safety instructions, increasing risk.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Social engineering and user level abuse
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deception&lt;/strong&gt;: Attackers can manipulate the bot to extract personal or environment-specific information.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Account Misuse&lt;/strong&gt;: Connected commerce tools could be used for unauthorized purchases.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Phishing&lt;/strong&gt;: A compromised bot can send malicious links or scripts to contacts.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream Data Exposure&lt;/strong&gt;: Prompts and tool outputs sent to AI providers can create privacy or compliance issues if not carefully managed.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data privacy, logs, and long term memory
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Sensitive Data Exposure&lt;/strong&gt;: The gateway stores conversation histories and memory, which may include personal or business information depending on usage.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard and Host Vulnerabilities&lt;/strong&gt;: Exposed dashboards or weak host protections can allow attackers to access past chats, file transfers, and stored credentials (API keys, tokens, OAuth secrets), turning the instance into a data exfiltration point.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upstream Data Risk&lt;/strong&gt;: Prompts and tool outputs are sent to AI providers. Without proper scoping and data classification, this can create privacy and compliance issues.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ecosystem risks: hijacked branding, fake installers, and scams
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hijacked Accounts&lt;/strong&gt;: After a rebrand, original social media and GitHub handles were exploited by scammers promoting fake crypto tokens.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Malware Risk&lt;/strong&gt;: Users searching for the tool may encounter backdoored versions or fake installers designed to compromise their systems.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Network and Remote Access Risks
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser Control&lt;/strong&gt;: Tools that let the bot control a browser can expose local or internal network resources if not secured.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tunneling Errors&lt;/strong&gt;: Misconfigured reverse proxies or tools like Tailscale may grant attackers unintended access to private networks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Recommendations for securing Clawdbot
&lt;/h2&gt;

&lt;p&gt;Based on the official GitHub repository, documentation, and expert audits from January 2026, here are the recommendations for securing your instance.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Lock Down the Gateway
&lt;/h3&gt;

&lt;p&gt;Bind the Clawdbot gateway to loopback (127.0.0.1) and never expose it directly to the internet. If remote access is required, use private mesh solutions such as Tailscale or Cloudflare Tunnel. Always enable gateway authentication using tokens or passwords.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/moltbot/moltbot/security" rel="noopener noreferrer"&gt;Official GitHub Security Overview&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.clawd.bot/" rel="noopener noreferrer"&gt;Clawdbot Remote Access Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Enforce Strict Access Controls
&lt;/h3&gt;

&lt;p&gt;Restrict who can interact with Clawdbot by enforcing DM pairing or allowlists. Avoid wildcard policies in production. In group chats, require explicit mentions before the bot processes messages.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/clawdbot/clawdbot/blob/main/SECURITY.md" rel="noopener noreferrer"&gt;Official GitHub SECURITY.md&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Isolate the Runtime Environment
&lt;/h3&gt;

&lt;p&gt;Run Clawdbot on dedicated hardware or a dedicated VM/container. Avoid running it on your primary workstation. Use Docker sandboxing with minimal mounts and dropped capabilities.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://www.google.com/search?q=https://docs.clawd.bot/getting-started" rel="noopener noreferrer"&gt;Clawdbot Getting Started Guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/clawdbot/clawdbot/security" rel="noopener noreferrer"&gt;Official GitHub Security Overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sandbox and Restrict Tools
&lt;/h3&gt;

&lt;p&gt;Enable sandboxing for all high-risk tools such as exec, write, browser automation, and web access. Use tool allow/deny lists and restrict elevated tools to trusted users only.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/moltbot/moltbot/security" rel="noopener noreferrer"&gt;Official GitHub Security Overview&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Apply Least Privilege to Agent Capabilities
&lt;/h3&gt;

&lt;p&gt;Disable interactive shells unless strictly necessary. Limit filesystem visibility to read-only mounts where possible. Avoid granting elevated privileges to agents handling untrusted input.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.molt.bot/" rel="noopener noreferrer"&gt;Official Clawdbot Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Secure Credentials and Secrets
&lt;/h3&gt;

&lt;p&gt;Store secrets in environment variables, not configuration files or source control. Apply strict filesystem permissions to Clawdbot directories and rotate credentials after any suspected incident.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/moltbot/moltbot/security" rel="noopener noreferrer"&gt;Official Clawdbot Security Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Continuous Auditing and Monitoring
&lt;/h3&gt;

&lt;p&gt;Regularly run built-in security audit and doctor commands to detect unsafe configurations. Monitor logs and session transcripts for anomalous behavior or unexpected access.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/clawdbot/clawdbot/security" rel="noopener noreferrer"&gt;Official GitHub Security CLI Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Harden Browser Automation
&lt;/h3&gt;

&lt;p&gt;Treat browser automation as operator-level access. Use dedicated browser profiles without password managers or sync enabled. Never expose browser control ports publicly.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Prompt-Level Safety Rules
&lt;/h3&gt;

&lt;p&gt;Define explicit system rules that prevent disclosure of credentials, filesystem structure, or infrastructure details. Require confirmation for destructive actions.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/moltbot/moltbot/security" rel="noopener noreferrer"&gt;Official Clawdbot Security Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Incident Response Preparedness
&lt;/h3&gt;

&lt;p&gt;Maintain a documented response plan. If compromise is suspected: stop the gateway, revoke access, rotate all secrets, review logs, and re-run security audits.&lt;br&gt;&lt;br&gt;
Reference:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/moltbot/moltbot/security" rel="noopener noreferrer"&gt;Official Clawdbot Security Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;ClawdBot is a high-privilege AI agent that can act on your system, not just chat. Its main risks come from exposed gateways, weak access controls, and powerful tools combined with prompt injection or social engineering, which can lead to system compromise and data loss. To use it safely, lock the gateway to localhost with authentication, restrict who can interact with it, isolate its runtime, minimize tool permissions, and monitor it continuously.&lt;br&gt;&lt;br&gt;
Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://snyk.io/articles/clawdbot-ai-assistant/" rel="noopener noreferrer"&gt;Your Clawdbot AI Assistant Has Shell Access and One Prompt Injection Away from Disaster&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://lukasniessen.medium.com/clawdbot-setup-guide-how-to-not-get-hacked-63bc951cbd90" rel="noopener noreferrer"&gt;ClawdBot: Setup Guide + How to NOT Get Hacked&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://x.com/mrnacknack/status/2016134416897360212" rel="noopener noreferrer"&gt;10 ways to hack into a vibecoder's clawdbot &amp;amp; get entire human identity (educational purposes only)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/pulse/hacking-clawdbot-eating-lobster-souls-jamieson-o-reilly-whhlc/" rel="noopener noreferrer"&gt;Hacking clawdbot and eating lobster souls&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/pulse/hackedin-eating-lobster-souls-part-ii-supply-chain-aka-o-reilly-lbaac/" rel="noopener noreferrer"&gt;Eating lobster souls Part II: the supply chain&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.linkedin.com/pulse/hackedin-eating-lobster-souls-part-iii-finale-escape-moltrix-gsamc/" rel="noopener noreferrer"&gt;Eating lobster souls Part III (the finale): Escape the Moltrix&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Eyal Estrin&lt;/strong&gt; is a cloud and information security architect and &lt;a href="https://builder.aws.com/community/@eyalestrin" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, with more than 25 years in the industry. He is the author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The views expressed are his own.  &lt;/p&gt;

</description>
      <category>cloudops</category>
      <category>secops</category>
    </item>
    <item>
      <title>Food Flavors 2026: Viral Recipes &amp; Zero-Waste Cooking</title>
      <dc:creator>usatrending todays</dc:creator>
      <pubDate>Fri, 30 Jan 2026 10:17:07 +0000</pubDate>
      <link>https://community.ops.io/usatrending55/food-flavors-2026-viral-recipes-zero-waste-cooking-2kg</link>
      <guid>https://community.ops.io/usatrending55/food-flavors-2026-viral-recipes-zero-waste-cooking-2kg</guid>
      <description>&lt;p&gt;Food Trends and Flavors to Explore: Insights from Usatrendingtodays&lt;/p&gt;

&lt;p&gt;Food is much more than a daily necessity—it is culture, comfort, creativity, and connection. Across the world, food brings people together, tells stories, and reflects traditions passed down through generations. In recent years, the way people think about food has changed significantly. From healthy eating to global flavors and sustainable choices, food trends continue to evolve. Platforms like &lt;a href="http://usatrendingtodays.com/" rel="noopener noreferrer"&gt;usatrendingtodays&lt;/a&gt; help readers stay informed about what’s popular, nutritious, and exciting in the world of food.&lt;/p&gt;

&lt;p&gt;The Importance of Food in Everyday Life&lt;/p&gt;

&lt;p&gt;Food plays a central role in our daily routines. It fuels our bodies, supports our health, and influences our mood and energy levels. Beyond nutrition, food also creates emotional connections—family meals, celebrations, and shared experiences often revolve around food.&lt;/p&gt;

&lt;p&gt;According to insights shared on usatrendingtodays, people today are more conscious about what they eat. They want food that is not only tasty but also healthy, ethically sourced, and prepared with care. This growing awareness has reshaped food choices around the world.&lt;/p&gt;

&lt;p&gt;Global Food Culture and Diversity&lt;/p&gt;

&lt;p&gt;One of the most beautiful aspects of food is its diversity. Every culture has its own unique flavors, ingredients, and cooking methods. Italian pasta, Indian curries, Japanese sushi, Mexican tacos, and Middle Eastern kebabs all represent the traditions and lifestyles of their regions.&lt;/p&gt;

&lt;p&gt;Usatrendingtodays often highlights global food trends, encouraging people to explore international cuisines. Thanks to globalization and social media, trying foods from different cultures has become easier than ever, even without traveling far from home.&lt;/p&gt;

&lt;p&gt;Healthy Eating and Nutrition Trends&lt;/p&gt;

&lt;p&gt;Healthy eating has become a major focus in modern lifestyles. People are paying closer attention to ingredients, portion sizes, and nutritional value. Diets rich in fruits, vegetables, whole grains, and lean proteins are widely encouraged.&lt;/p&gt;

&lt;p&gt;Platforms like usatrendingtodays discuss popular nutrition trends such as plant-based diets, gluten-free options, low-sugar meals, and balanced eating habits. These trends help people make informed decisions that support long-term health without sacrificing flavor.&lt;/p&gt;

&lt;p&gt;The Rise of Home Cooking&lt;/p&gt;

&lt;p&gt;Home cooking has seen a strong comeback in recent years. Many people now prefer preparing meals at home to control ingredients, save money, and enjoy fresh food. Cooking at home also allows creativity and experimentation with recipes.&lt;/p&gt;

&lt;p&gt;Usatrendingtodays shares easy and practical cooking ideas that suit busy lifestyles. From quick weekday meals to special weekend recipes, home cooking encourages healthier habits and strengthens family bonds through shared meals.&lt;/p&gt;

&lt;p&gt;Street Food and Casual Dining&lt;/p&gt;

&lt;p&gt;Street food is loved worldwide for its bold flavors, affordability, and cultural authenticity. From food trucks to local markets, street food offers a taste of tradition in every bite. It reflects local ingredients and cooking styles while being accessible to everyone.&lt;/p&gt;

&lt;p&gt;According to usatrendingtodays, street food trends are gaining global popularity. Many street food dishes have inspired restaurant menus, blending casual dining with gourmet creativity.&lt;/p&gt;

&lt;p&gt;Sustainable and Ethical Food Choices&lt;/p&gt;

&lt;p&gt;Sustainability is becoming an important part of food culture. People are more aware of how food production affects the environment. Reducing food waste, choosing locally sourced ingredients, and supporting ethical farming practices are now key concerns.&lt;/p&gt;

&lt;p&gt;Usatrendingtodays highlights sustainable food movements that promote eco-friendly choices. Conscious eating not only benefits personal health but also contributes to a healthier planet.&lt;/p&gt;

&lt;p&gt;Food and Technology&lt;/p&gt;

&lt;p&gt;Technology has transformed the food industry in many ways. Online food delivery apps, digital recipes, smart kitchen appliances, and virtual cooking classes have made food more accessible and convenient.&lt;/p&gt;

&lt;p&gt;Platforms like usatrendingtodays explore how technology is shaping food habits. From discovering new restaurants to learning cooking techniques online, technology continues to enhance the way people experience food.&lt;/p&gt;

&lt;p&gt;Comfort Food and Emotional Connection&lt;/p&gt;

&lt;p&gt;Comfort food holds a special place in people’s hearts. These are the meals that bring warmth, nostalgia, and a sense of happiness. Comfort food varies across cultures but often includes simple, familiar dishes that remind people of home.&lt;/p&gt;

&lt;p&gt;Usatrendingtodays notes that comfort food remains popular, especially during stressful times. While modern trends come and go, classic comfort meals continue to provide emotional satisfaction and balance.&lt;/p&gt;

&lt;p&gt;Food for Special Occasions&lt;/p&gt;

&lt;p&gt;Food is an essential part of celebrations and traditions. Festivals, weddings, holidays, and family gatherings are often centered around special dishes. These meals carry cultural meaning and create lasting memories.&lt;/p&gt;

&lt;p&gt;Through usatrendingtodays, readers can explore how different cultures celebrate with food. Understanding these traditions helps people appreciate the deeper significance behind recipes and culinary customs.&lt;/p&gt;

&lt;p&gt;Food Blogging and Social Media Influence&lt;/p&gt;

&lt;p&gt;Social media has changed the way people discover and share food. Food blogs, recipe videos, and restaurant reviews influence what people eat and where they dine. Visual platforms have made food presentation just as important as taste.&lt;/p&gt;

&lt;p&gt;Usatrendingtodays keeps track of trending food content and viral recipes, helping readers stay updated with what’s popular online. Social media continues to shape food culture in creative and exciting ways.&lt;/p&gt;

&lt;p&gt;The Future of Food&lt;/p&gt;

&lt;p&gt;The future of food is focused on innovation, health, and sustainability. Plant-based alternatives, lab-grown meat, organic farming, and personalized nutrition are gaining attention. People are looking for food that aligns with both their health goals and ethical values.&lt;/p&gt;

&lt;p&gt;As highlighted on usatrendingtodays, the food industry will continue to evolve to meet changing consumer demands. Technology, awareness, and creativity will play a major role in shaping what we eat in the years to come.&lt;/p&gt;

&lt;p&gt;Conclusion&lt;/p&gt;

&lt;p&gt;Food is a powerful part of human life, connecting people across cultures, generations, and experiences. From everyday meals to global cuisines and emerging trends, food continues to evolve with society. Staying informed helps people make better choices and enjoy food more mindfully.&lt;/p&gt;

&lt;p&gt;Platforms like usatrendingtodays offer valuable insights into food trends, healthy habits, and cultural influences. By exploring new flavors, supporting sustainable practices, and appreciating the role of food in daily life, individuals can turn every meal into a meaningful experience. Food is not just about eating—it’s about enjoyment, connection, and celebrating life itself.&lt;/p&gt;

</description>
      <category>foodies</category>
      <category>yummy</category>
      <category>tasty</category>
      <category>treanding</category>
    </item>
    <item>
      <title>Securing AI Skills</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Mon, 26 Jan 2026 15:11:51 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/securing-ai-skills-2aj8</link>
      <guid>https://community.ops.io/eyalestrin/securing-ai-skills-2aj8</guid>
      <description>&lt;p&gt;If you give an AI system the ability to act, you give it risk.&lt;br&gt;&lt;br&gt;
In earlier posts, I covered how to secure &lt;a href="https://medium.com/aws-in-plain-english/securing-mcp-servers-4a1872b530cf" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt; and &lt;a href="https://medium.com/aws-in-plain-english/securing-agentic-ai-systems-a04804eb0b01" rel="noopener noreferrer"&gt;agentic AI systems&lt;/a&gt;. This post focuses on a narrower but more dangerous layer: AI skills. These are the tools that let models touch the real world.&lt;br&gt;&lt;br&gt;
Once a model can call an API, run code, or move data, it stops being just a reasoning engine. It becomes an operator.&lt;br&gt;&lt;br&gt;
That is where most security failures happen.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Terminology
&lt;/h2&gt;

&lt;p&gt;In generative AI, "skills" describe the interfaces that allow a model to perform actions outside its own context.&lt;br&gt;&lt;br&gt;
Different vendors use different names:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tools&lt;/strong&gt;: Function calling and MCP-based interactions
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Plugins&lt;/strong&gt;: Web-based extensions used by chatbots
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actions&lt;/strong&gt;: OpenAI GPT Actions and AWS Bedrock Action Groups
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents&lt;/strong&gt;: Systems that reason and execute across multiple steps
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A base LLM predicts text; A skill gives it hands.&lt;br&gt;&lt;br&gt;
Skills are pre-defined interfaces that expose code, APIs, or workflows. When a model decides that text alone is not enough, it triggers a skill.&lt;br&gt;&lt;br&gt;
Anthropic treats skills as instruction-and-script bundles loaded at runtime.&lt;br&gt;&lt;br&gt;
OpenAI uses modular functions inside Custom GPTs and agents.&lt;br&gt;&lt;br&gt;
AWS implements the same idea through Action Groups.&lt;br&gt;&lt;br&gt;
Microsoft applies the term across Copilot and Semantic Kernel.&lt;br&gt;&lt;br&gt;
NVIDIA uses skills in its digital human platforms.&lt;br&gt;&lt;br&gt;
In the reference high-level architecture below, we can see the relations between the components:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/PZWV37tWZPov5YsT6DBuRSEmvmryOipZPIcW-MMZW24/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL2xwMjk2/Nml0ajF4bndrNGdu/N3hkLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/PZWV37tWZPov5YsT6DBuRSEmvmryOipZPIcW-MMZW24/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL2xwMjk2/Nml0ajF4bndrNGdu/N3hkLnBuZw" alt=" " width="750" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Skills Are Dangerous
&lt;/h2&gt;

&lt;p&gt;Every skill expands the attack surface. The model sits in the middle, deciding what to call and when. If it is tricked, the skill executes anyway.&lt;br&gt;&lt;br&gt;
The most common failure modes:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Excessive agency&lt;/strong&gt;: Skills often have broader permissions than they need. A file-management skill with system-level access is a breach waiting to happen.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The consent gap&lt;/strong&gt;: Users approve skills as a bundle. They rarely inspect the exact permissions. Attackers hide destructive capability inside tools that appear harmless.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedural and memory poisoning&lt;/strong&gt;: Skills that retain instructions or memory can be slowly corrupted. This does not cause an immediate failure. It changes behavior over time.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privilege escalation through tool chaining&lt;/strong&gt;: Multiple tools can be combined to bypass intended boundaries. A harmless read operation becomes a write. A write becomes execution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Indirect prompt injection&lt;/strong&gt;: Malicious instructions are placed in content that the model reads: emails, web pages, documents. The model follows them using its own skills.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data exfiltration&lt;/strong&gt;: Skills often require access to sensitive systems. Once compromised, they can leak source code, credentials, or internal records.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supply chain risk&lt;/strong&gt;: Skills rely on third-party APIs and libraries. A poisoned update propagates instantly.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-to-agent spread&lt;/strong&gt;: In multi-agent systems, one compromised skill can affect others. Failures cascade.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unsafe execution and RCE&lt;/strong&gt;: Any skill that runs code without isolation is exposed to remote code execution.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Insecure output handling&lt;/strong&gt;: Raw outputs passed directly to users can cause data leaks or client-side exploits.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSRF&lt;/strong&gt;: Fetch-style skills can be abused to probe internal networks.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How to Secure Skills (What Actually Works)
&lt;/h2&gt;

&lt;p&gt;Treat skills like production services. Because they are.  &lt;/p&gt;

&lt;h3&gt;
  
  
  Identity and Access Management
&lt;/h3&gt;

&lt;p&gt;Each skill must have its own identity. No shared credentials. No broad roles.&lt;br&gt;&lt;br&gt;
Permissions should be minimal and continuously evaluated. This directly addresses OWASP LLM06: Excessive Agency.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://genai.owasp.org/llmrisk/llm062025-excessive-agency/" rel="noopener noreferrer"&gt;OWASP LLM06:2025 Excessive Agency&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  AWS Bedrock
&lt;/h4&gt;

&lt;p&gt;Assign granular IAM roles per agent. Restrict regions and models with SCPs. Limit Action Groups to specific Lambda functions.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/prescriptive-guidance/latest/strategy-enterprise-ready-gen-ai-platform/security.html" rel="noopener noreferrer"&gt;Security and governance for generative AI platforms on AWS&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/code-interpreter-tool.html" rel="noopener noreferrer"&gt;Execute code and analyze data using Amazon Bedrock AgentCore Code Interpreter&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Microsoft Foundry
&lt;/h4&gt;

&lt;p&gt;Disable key-based auth. Use Entra ID and Managed Identities. Restrict connectors at the agent level.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ai-agents/governance-security-across-organization" rel="noopener noreferrer"&gt;Governance and security for AI agents across the organization&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/copilot/microsoft-365/microsoft-365-copilot-privacy" rel="noopener noreferrer"&gt;Data, Privacy, and Security for Microsoft 365 Copilot&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Google Vertex AI
&lt;/h4&gt;

&lt;p&gt;Use Workload Identity Federation. Scope permissions explicitly in agent configs.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://cloud.google.com/security/securing-ai?hl=en" rel="noopener noreferrer"&gt;Secure your Agentic and Generative AI with Google Cloud&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  OpenAI
&lt;/h4&gt;

&lt;p&gt;Never expose API keys client-side. Use project-scoped keys and backend proxies.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety" rel="noopener noreferrer"&gt;Best Practices for API Key Safety&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Input and Output Guardrails
&lt;/h3&gt;

&lt;p&gt;Prompt injection is not theoretical. It is the default attack.&lt;br&gt;&lt;br&gt;
Map OWASP LLM risks directly to controls.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://owasp.org/www-project-top-10-for-large-language-model-applications/" rel="noopener noreferrer"&gt;OWASP Top 10 for Large Language Model Applications&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  AWS Bedrock
&lt;/h4&gt;

&lt;p&gt;Use Guardrails with prompt-attack detection and PII redaction.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://aws.amazon.com/bedrock/guardrails/" rel="noopener noreferrer"&gt;Amazon Bedrock Guardrails&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Microsoft Foundry
&lt;/h4&gt;

&lt;p&gt;Enable Prompt Shields and groundedness detection.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety" rel="noopener noreferrer"&gt;Azure AI Content Safety&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Google Vertex AI
&lt;/h4&gt;

&lt;p&gt;Use Model Armor and safety filters at the API layer.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://docs.cloud.google.com/model-armor/overview" rel="noopener noreferrer"&gt;Model Armor overview&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  OpenAI
&lt;/h4&gt;

&lt;p&gt;Use zero-retention mode for sensitive workflows.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://platform.openai.com/docs/guides/your-data" rel="noopener noreferrer"&gt;Data controls in the OpenAI platform&lt;/a&gt;  &lt;/p&gt;

&lt;h4&gt;
  
  
  Anthropic
&lt;/h4&gt;

&lt;p&gt;Use constitutional prompts, but still enforce external moderation.&lt;br&gt;&lt;br&gt;
Reference: &lt;a href="https://www.anthropic.com/news/building-safeguards-for-claude" rel="noopener noreferrer"&gt;Building safeguards for Claude&lt;/a&gt;  &lt;/p&gt;

&lt;h3&gt;
  
  
  Adversarial Testing
&lt;/h3&gt;

&lt;p&gt;Red-team your agents.&lt;br&gt;&lt;br&gt;
Test prompt injection, RAG abuse, tool chaining, and data poisoning during development. Not after launch.&lt;br&gt;&lt;br&gt;
Threat modeling frameworks from OWASP, NIST, and Google apply here with minimal adaptation.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/whitepapers/latest/navigating-security-landscape-genai/threat-modeling-for-generative-ai-applications.html" rel="noopener noreferrer"&gt;Threat modeling for generative AI applications&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/security/engineering/threat-modeling-aiml" rel="noopener noreferrer"&gt;Threat Modeling AI/ML Systems and Dependencies&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://cloud.google.com/transform/how-google-does-it-threat-modeling-from-basics-to-ai/" rel="noopener noreferrer"&gt;How Google Does It: Threat modeling, from basics to AI&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://cloudsecurityalliance.org/artifacts/ai-model-risk-management-framework" rel="noopener noreferrer"&gt;AI Model Risk Management Framework&lt;/a&gt;  &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  DevSecOps Integration
&lt;/h3&gt;

&lt;p&gt;Every endpoint a skill calls is part of your attack surface.&lt;br&gt;&lt;br&gt;
Run SAST and DAST on the skill code. Scan dependencies. Fail builds when violations appear.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/devops/using-generative-ai-amazon-bedrock-and-amazon-codeguru-to-improve-code-quality-and-security/" rel="noopener noreferrer"&gt;Using Generative AI, Amazon Bedrock, and Amazon CodeGuru to Improve Code Quality and Security&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/security/benchmark/azure/mcsb-v2-artificial-intelligence-security" rel="noopener noreferrer"&gt;Artificial Intelligence Security&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Isolation and Network Controls
&lt;/h3&gt;

&lt;p&gt;Code-executing skills must run in ephemeral, sandboxed environments.&lt;br&gt;&lt;br&gt;
No host access. No unrestricted outbound traffic.&lt;br&gt;&lt;br&gt;
Use private networking wherever possible:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/vpc-interface-endpoints.html" rel="noopener noreferrer"&gt;AWS PrivateLink&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/configure-private-link" rel="noopener noreferrer"&gt;Azure Private Link and VNETs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/vertex-ai/docs/general/vpc-service-controls" rel="noopener noreferrer"&gt;GCP VPC Service Controls&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Logging, Monitoring, and Privacy
&lt;/h3&gt;

&lt;p&gt;If you cannot audit skill usage, you cannot secure it.&lt;br&gt;&lt;br&gt;
Enable full invocation logging and integrate with existing SIEM tools.&lt;br&gt;&lt;br&gt;
Ensure provider data-handling terms match your risk profile. Not all plans are equal.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/bedrock/latest/userguide/logging-using-cloudtrail.html" rel="noopener noreferrer"&gt;Monitor Amazon Bedrock API calls using CloudTrail&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/azure/defender-for-cloud/recommendations-reference-ai" rel="noopener noreferrer"&gt;AI security recommendations&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/vertex-ai/docs/general/audit-logging" rel="noopener noreferrer"&gt;Vertex AI audit logging information&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/api-reference/audit-logs" rel="noopener noreferrer"&gt;OpenAI Audit Logs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview#key-security-considerations" rel="noopener noreferrer"&gt;Claude Agent Skills - Security Considerations&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Incident Response and Human Oversight
&lt;/h3&gt;

&lt;p&gt;Update incident response plans to include AI-specific failures.&lt;br&gt;&lt;br&gt;
For high-risk actions, require human approval. This is the simplest and most reliable control against runaway agents.&lt;br&gt;&lt;br&gt;
References:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://docs.aws.amazon.com/security-ir/latest/userguide/understand-threat-landscape.html" rel="noopener noreferrer"&gt;Understand the threat landscape&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://learn.microsoft.com/en-us/security/benchmark/azure/mcsb-v2-incident-response" rel="noopener noreferrer"&gt;Microsoft Cloud Security Benchmark v2 - Incident Response&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/machine-learning/implement-human-in-the-loop-confirmation-with-amazon-bedrock-agents/" rel="noopener noreferrer"&gt;Implement human-in-the-loop confirmation with Amazon Bedrock Agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/multi-agent-workflow-with-human-approval-using-agent-framework/4465927" rel="noopener noreferrer"&gt;Multi-agent Workflow with Human Approval using Agent Framework&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/discover/human-in-the-loop?hl=en" rel="noopener noreferrer"&gt;What is Human-in-the-Loop (HITL) in AI &amp;amp; ML?&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://platform.openai.com/docs/guides/safety-best-practices" rel="noopener noreferrer"&gt;OpenAI Safety best practices&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;AI skills are the execution layer of generative systems. They turn models from advisors into actors.&lt;br&gt;&lt;br&gt;
That shift introduces real security risk: excessive permissions, prompt injection, data leakage, and cascading agent failures.&lt;br&gt;&lt;br&gt;
Secure skills the same way you secure production services. Strong identity. Least privilege. Isolation. Guardrails. Monitoring. Human oversight.&lt;br&gt;&lt;br&gt;
There is no final state. Platforms change. Attacks evolve. Continuous testing is the job.  &lt;/p&gt;

&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;Eyal Estrin is a cloud and information security architect and &lt;a href="https://builder.aws.com/community/@eyalestrin" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, with more than 25 years in the industry. He is the author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;.&lt;br&gt;&lt;br&gt;
The views expressed are his own.  &lt;/p&gt;

</description>
      <category>aws</category>
      <category>azure</category>
      <category>gcp</category>
      <category>security</category>
    </item>
    <item>
      <title>Introducing Managed Instances in the Cloud</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Tue, 20 Jan 2026 14:12:39 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/introducing-managed-instances-in-the-cloud-5gj9</link>
      <guid>https://community.ops.io/eyalestrin/introducing-managed-instances-in-the-cloud-5gj9</guid>
      <description>&lt;p&gt;For many years, organizations embracing the public cloud knew there were two main types of compute services  - customer-managed (i.e., IaaS) and fully managed or Serverless compute (i.e., PaaS).&lt;br&gt;&lt;br&gt;
The main difference is who is responsible for maintenance of the underlying compute nodes in terms of OS maintenance (such as patch management, hardening, monitoring, etc.) and the scale (adding or removing compute nodes according to customer or application load).&lt;br&gt;&lt;br&gt;
In an ideal world, we would prefer a fully managed (or perhaps a Serverless) solution, but there are use cases where we would like to have the ability to manage a VM (such as the need to connect to a VM via SSH to make configuration changes at the OS level).&lt;br&gt;&lt;br&gt;
In this blog post, I will review several examples of managed instance services and compare their capabilities with the fully managed alternative.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Function as a Service
&lt;/h2&gt;

&lt;p&gt;The only alternative I managed to find is the AWS Lambda Managed Instances.&lt;br&gt;&lt;br&gt;
AWS Lambda has been in the market for many years, and it is the most common Serverless compute service in the public cloud (though not the only alternative).&lt;br&gt;&lt;br&gt;
Below is a comparison between AWS Lambda and the AWS Lambda Managed Instances:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/225VKkNVt0Vmk1IwaIrIKBAthaMnZgSjx_ogL_jNJf4/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3RzYWJk/dXRjNWN0YmVsb3Q2/ZGVtLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/225VKkNVt0Vmk1IwaIrIKBAthaMnZgSjx_ogL_jNJf4/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3RzYWJk/dXRjNWN0YmVsb3Q2/ZGVtLnBuZw" alt=" " width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Which Alternative
&lt;/h3&gt;

&lt;p&gt;Use AWS Lambda (Standard) If:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traffic is Bursty or Unpredictable&lt;/strong&gt;: You need the ability to scale from zero to thousands of concurrent executions in seconds to handle sudden spikes.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low or Intermittent Volume&lt;/strong&gt;: You have idle periods were paying for running instances would be wasteful. "Scale to zero" is a priority.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Strict Isolation is Required&lt;/strong&gt;: Your security model relies on the strong isolation of Firecracker microVMs for every single request.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplicity is Key&lt;/strong&gt;: You want zero infrastructure decisions—just upload code and run.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use AWS Lambda Managed Instances If:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traffic is High &amp;amp; Predictable&lt;/strong&gt;: You have steady-state workloads were paying for always-on EC2 instances (with Savings Plans) is cheaper than per-request billing.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workloads are Compute/Memory Intensive&lt;/strong&gt;: You need specific hardware ratios (e.g., high CPU but low RAM) or specialized instruction sets not available in standard Lambda.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency Sensitivity&lt;/strong&gt;: You cannot afford any cold start latency and need environments that are always initialized.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;High I/O Concurrency&lt;/strong&gt;: Your application performs many I/O bound tasks (like calling external APIs) and can efficiently process multiple requests on a single vCPU without blocking.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Container Service
&lt;/h2&gt;

&lt;p&gt;Amazon ECS is a highly scalable container orchestration service that automates the deployment and management of containers across AWS infrastructure.&lt;br&gt;&lt;br&gt;
Below is a comparison between Amazon ECS (self-managed EC2) and the Amazon ECS Managed Instances:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/FunSEB9nU-xhb7aXqiZaZSLovBhDvZ1G0hrv78d9IMI/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL2wwaGd3/dzZkeTBlM2p2MDhv/b3ptLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/FunSEB9nU-xhb7aXqiZaZSLovBhDvZ1G0hrv78d9IMI/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL2wwaGd3/dzZkeTBlM2p2MDhv/b3ptLnBuZw" alt=" " width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Which Alternative
&lt;/h3&gt;

&lt;p&gt;Use Amazon ECS (Self-Managed EC2) If:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You Need Custom AMIs&lt;/strong&gt;: Your compliance or legacy software requires a specific, hardened OS image or custom kernel modules.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You Require Host Access&lt;/strong&gt;: You need SSH access to the underlying node for deep debugging, forensic auditing, or installing host-level daemon agents that ECS doesn't support.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost is the Sole Priority&lt;/strong&gt;: You want to avoid the additional management fee and have a dedicated team that can manually optimize bin-packing and Spot instance usage for free.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy / Hybrid Constraints&lt;/strong&gt;: You are extending a specific on-premise network configuration or storage driver setup that requires manual OS configuration.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Amazon ECS Managed Instances If:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You Need GPUs or High Memory&lt;/strong&gt;: You require specific hardware (like GPU instances for AI/ML) that AWS Fargate does not support, but you don't want to manage the OS.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You Want "Fargate-like" Operations with EC2 Pricing&lt;/strong&gt;: You want to offload patching and ASG management (like Fargate) but need to use Reserved Instances or Savings Plans to lower costs.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Compliance&lt;/strong&gt;: You need guaranteed, automated rotation of nodes for security patching (e.g., every 14 days) without building the automation pipelines yourself.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steady-State Workloads&lt;/strong&gt;: Your traffic is predictable, making always-on EC2 instances more cost-effective than Fargate's per-second billing.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Kubernetes Service
&lt;/h2&gt;

&lt;p&gt;Amazon EKS is a fully managed service that simplifies running, scaling, and securing containerized applications by automating the management of the Kubernetes control plane on AWS.&lt;br&gt;&lt;br&gt;
Below is a comparison between Amazon EKS (self-managed nodes) and the Amazon EKS Managed Node Groups:  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/ihRwT0mIlH1GF4NujJUPzBenUhlYTowDQ2NYAOQSEAM/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzZjc2k1/cmNldTBtaXNpZnhh/M3J1LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/ihRwT0mIlH1GF4NujJUPzBenUhlYTowDQ2NYAOQSEAM/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzZjc2k1/cmNldTBtaXNpZnhh/M3J1LnBuZw" alt=" " width="800" height="244"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Use Which Alternative
&lt;/h3&gt;

&lt;p&gt;Use Amazon EKS Managed Node Groups If:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Standard Kubernetes Workloads&lt;/strong&gt;: You are running standard applications and want to minimize the time spent on infrastructure maintenance.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simplified Scaling&lt;/strong&gt;: You want EKS to automatically handle the creation of Auto Scaling Groups that are natively aware of the cluster state.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automated Security&lt;/strong&gt;: You want a streamlined way to apply security patches and OS updates to your cluster nodes without downtime.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operational Efficiency&lt;/strong&gt;: You have a small team and need to focus on application code rather than Kubernetes "plumbing."
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use Amazon EKS Self-Managed Nodes If:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom Operating Systems&lt;/strong&gt;: You must use a specific, hardened OS image (e.g., a highly customized Ubuntu or RHEL) that is not supported by Managed Node Groups.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex Bootstrap Scripts&lt;/strong&gt;: You need to run intricate "User Data" scripts during node startup that require fine-grained control over the initialization sequence.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unique Networking Requirements&lt;/strong&gt;: You are using specialized networking plugins or non-standard VPC configurations that require manual node configuration.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legacy Compliance&lt;/strong&gt;: You have strict regulatory requirements that mandate manual oversight and "manual sign-off" for every single OS-level change.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;In this blog post, I have reviewed several compute services (from FaaS, containers, and managed Kubernetes), each with its alternatives for either customer managing the compute nodes, or having AWS manage the compute nodes for the customers.&lt;br&gt;&lt;br&gt;
By leveraging AWS Lambda Managed Instances, Amazon ECS Managed Instances, and Amazon EKS Managed Node Groups, organizations can achieve high hardware performance without the burden of operational complexity. The primary advantage of this managed tier is the ability to decouple hardware selection from operating system maintenance. Developers can handpick specific EC2 families, such as GPU-optimized instances for AI or Graviton for cost efficiency, while AWS manages the heavy lifting of security patching and instance lifecycle updates.&lt;br&gt;&lt;br&gt;
Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.  &lt;/p&gt;

&lt;h3&gt;
  
  
  About the author
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Eyal Estrin&lt;/strong&gt; is a seasoned cloud and information security architect, &lt;a href="https://aws.amazon.com/developer/community/community-builders/" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, and author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;. With over 25 years of experience in the IT industry, he brings deep expertise to his work.&lt;br&gt;
Connect with Eyal on social media: &lt;a href="https://linktr.ee/eyalestrin" rel="noopener noreferrer"&gt;https://linktr.ee/eyalestrin&lt;/a&gt;.&lt;br&gt;
The opinions expressed here are his own and do not reflect those of his employer.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>serverless</category>
      <category>containers</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>When you have a hammer, everything looks like a nail</title>
      <dc:creator>Eyal Estrin</dc:creator>
      <pubDate>Tue, 06 Jan 2026 15:51:24 +0000</pubDate>
      <link>https://community.ops.io/eyalestrin/when-you-have-a-hammer-everything-looks-like-a-nail-1i35</link>
      <guid>https://community.ops.io/eyalestrin/when-you-have-a-hammer-everything-looks-like-a-nail-1i35</guid>
      <description>&lt;p&gt;In the over-evolving tech world, we often see organizations (from C-Level down to architects and engineers) rush to adopt the latest technology trends without conducting proper design or truly understanding the business requirements.&lt;br&gt;&lt;br&gt;
The result of failing to do a proper design is a waste of resources (from human time to compute), over-complicated architectures, or under-utilized resources.&lt;br&gt;&lt;br&gt;
In this blog post, I will dig into common architecture decisions and provide recommendations to avoid the pitfalls.&lt;br&gt;&lt;br&gt;
Let’s dig into some examples.  &lt;/p&gt;

&lt;h2&gt;
  
  
  Moving everything to the public cloud
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;An enterprise mandates a full lift-and-shift of all workloads to a hyper-scaler to “become cloud-native,” including legacy ERP systems, mainframes, and latency-sensitive trading applications.  &lt;/p&gt;

&lt;h4&gt;
  
  
  What was misunderstood
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Some workloads had hard latency, data residency, or licensing constraints.
&lt;/li&gt;
&lt;li&gt;The applications were tightly coupled, stateful, and designed for vertical scaling.
&lt;/li&gt;
&lt;li&gt;Cost models were not analyzed beyond infrastructure savings.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Issues that emerged
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Higher total cost of ownership due to egress fees, oversized instances, and always-on resources.
&lt;/li&gt;
&lt;li&gt;Performance degradation for low-latency systems.
&lt;/li&gt;
&lt;li&gt;Operational complexity increased without gaining elasticity or resilience benefits.
&lt;/li&gt;
&lt;li&gt;Missed opportunity to modernize selectively (hybrid or refactor where justified).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Kubernetes for every architecture
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;A team deploys all applications - including small internal tools, batch jobs, and simple APIs - onto a shared Kubernetes platform.  &lt;/p&gt;

&lt;h4&gt;
  
  
  What was misunderstood
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes is an orchestration platform, not a free abstraction layer.
&lt;/li&gt;
&lt;li&gt;Many workloads did not need container orchestration, autoscaling, or self-healing.
&lt;/li&gt;
&lt;li&gt;The organization lacked operational maturity for cluster management and security.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Issues that emerged
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Increased cognitive load for developers (YAML, Helm, networking, ingress, RBAC).
&lt;/li&gt;
&lt;li&gt;The platform team became a bottleneck for simple changes.
&lt;/li&gt;
&lt;li&gt;Security misconfigurations (over-permissive service accounts, exposed services).
&lt;/li&gt;
&lt;li&gt;Slower delivery compared to simpler deployment models (VMs or managed PaaS).
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using Serverless for every solution
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;An architect mandates that all new services must be implemented using Functions-as-a-Service.  &lt;/p&gt;

&lt;h4&gt;
  
  
  What was misunderstood
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Serverless excels at event-driven, stateless, bursty workloads - not long-running or chatty processes.
&lt;/li&gt;
&lt;li&gt;Cold starts, execution limits, and state management trade-offs were ignored.
&lt;/li&gt;
&lt;li&gt;Observability and debugging differ significantly from traditional services.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Issues that emerged
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Latency spikes impacting user-facing APIs.
&lt;/li&gt;
&lt;li&gt;Complex orchestration logic is spread across functions, reducing maintainability.
&lt;/li&gt;
&lt;li&gt;Higher costs for sustained workloads compared to containers or VMs.
&lt;/li&gt;
&lt;li&gt;Difficult troubleshooting due to fragmented logs and distributed execution paths.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Using GenAI to solve every problem
&lt;/h2&gt;

&lt;h4&gt;
  
  
  Example
&lt;/h4&gt;

&lt;p&gt;A company integrates GenAI into customer support, code reviews, security analysis, and decision-making workflows without clearly defined use cases.  &lt;/p&gt;

&lt;h4&gt;
  
  
  What was misunderstood
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;GenAI produces probabilistic outputs, not deterministic answers.
&lt;/li&gt;
&lt;li&gt;Data quality, context boundaries, and hallucination risks were underestimated.
&lt;/li&gt;
&lt;li&gt;Regulatory, privacy, and intellectual property implications were not assessed.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Issues that emerged
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Incorrect or misleading responses are presented as authoritative.
&lt;/li&gt;
&lt;li&gt;Leakage of sensitive data through prompts or training feedback loops.
&lt;/li&gt;
&lt;li&gt;Increased operational risk when AI outputs were trusted without validation.
&lt;/li&gt;
&lt;li&gt;High costs with unclear ROI due to overuse in low-value scenarios.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Practical recommendations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Start with business drivers, not technology&lt;/strong&gt; - Define success metrics first: cost model, performance requirements, regulatory constraints, delivery speed, and operational ownership. Technology should follow these inputs - not precede them.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicitly document constraints and non-goals&lt;/strong&gt; - Latency, data residency, licensing, team skills, and operational maturity must be captured early. Many architectural failures stem from ignored or implicit constraints.
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Apply technologies where their strengths are essential&lt;/strong&gt;:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Public cloud&lt;/strong&gt;: prioritize elasticity, managed services, and global reach - not lift-and-shift.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes&lt;/strong&gt;: use it where orchestration, portability, and scale justify its complexity.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serverless&lt;/strong&gt;: limit the use of Serverless to event-driven and bursty workloads.
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GenAI&lt;/strong&gt;: apply where probabilistic output is acceptable and verifiable.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Favor simplicity as a default&lt;/strong&gt; - If a simpler architecture meets requirements, it is usually the correct choice. Complexity should be earned, not assumed.  &lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Continuously validate assumptions&lt;/strong&gt; - Revisit architectural decisions as workloads evolve. What was once justified can become technical debt when context changes.  &lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reward outcome-driven architecture&lt;/strong&gt; - Measure architects and teams on business impact, reliability, and cost efficiency - not on adoption of trendy platforms.  &lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The recurring failure pattern in modern architectures is not poor technology choice, but &lt;strong&gt;premature commitment to a tool before understanding the problem&lt;/strong&gt;. Cloud platforms, Kubernetes, Serverless, and GenAI are powerful when applied deliberately - and damaging when treated as universal defaults. When architects start with the solution, they optimize for platform elegance instead of business outcomes.  &lt;/p&gt;

&lt;h3&gt;
  
  
  About the author
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Eyal Estrin&lt;/strong&gt; is a seasoned cloud and information security architect, &lt;a href="https://aws.amazon.com/developer/community/community-builders/" rel="noopener noreferrer"&gt;AWS Community Builder&lt;/a&gt;, and author of &lt;a href="https://amzn.to/42Xai9A" rel="noopener noreferrer"&gt;Cloud Security Handbook&lt;/a&gt; and &lt;a href="https://amzn.to/3Sggbtv" rel="noopener noreferrer"&gt;Security for Cloud Native Applications&lt;/a&gt;. With over 25 years of experience in the IT industry, he brings deep expertise to his work.&lt;br&gt;
Connect with Eyal on social media: &lt;a href="https://linktr.ee/eyalestrin" rel="noopener noreferrer"&gt;https://linktr.ee/eyalestrin&lt;/a&gt;.&lt;br&gt;
The opinions expressed here are his own and do not reflect those of his employer.&lt;/p&gt;

</description>
      <category>cloudops</category>
      <category>kubernetes</category>
      <category>serverless</category>
      <category>ai</category>
    </item>
    <item>
      <title>MSP Cybersecurity: Addressing the Top Threats to Client Trust and Operations</title>
      <dc:creator>Olivia</dc:creator>
      <pubDate>Wed, 31 Dec 2025 13:02:47 +0000</pubDate>
      <link>https://community.ops.io/oliviacx/msp-cybersecurity-addressing-the-top-threats-to-client-trust-and-operations-2idc</link>
      <guid>https://community.ops.io/oliviacx/msp-cybersecurity-addressing-the-top-threats-to-client-trust-and-operations-2idc</guid>
      <description>&lt;p&gt;Managed service providers (MSPs) face an increasingly complex cybersecurity landscape, where even minor gaps can have major consequences for both their own operations and the clients they serve. Understanding and addressing &lt;a href="https://www.nakivo.com/blog/cybersecurity-challenges-for-msps/" rel="noopener noreferrer"&gt;MSP cybersecurity challenges&lt;/a&gt; is critical for maintaining business continuity, client trust, and regulatory compliance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Credential Compromise
&lt;/h2&gt;

&lt;p&gt;One of the most common and dangerous threats MSPs face is credential compromise. Attackers who gain access to valid credentials can bypass many security controls, potentially affecting multiple client environments at once. Common causes include stolen or weak passwords, credential reuse across systems, and the lack of multi-factor authentication (MFA) for critical accounts. Securing privileged accounts with unique credentials and MFA is a foundational step in mitigating this risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Insider Threats
&lt;/h2&gt;

&lt;p&gt;Insider threats, whether intentional or accidental, pose a significant risk to MSP operations. Disgruntled employees or negligent insiders with access to sensitive systems can compromise client data, disrupt services, or damage the MSP’s reputation. Proactive measures, such as strict access controls, activity monitoring, and clear internal policies, are essential to reduce the likelihood of insider-related incidents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inadequate Monitoring and Logging
&lt;/h2&gt;

&lt;p&gt;Without centralized logging and real-time monitoring, security incidents can go undetected for extended periods. Delayed detection allows attackers to move laterally across systems, increasing the potential impact of breaches. Implementing robust Security Information and Event Management (SIEM) solutions and automated alerting can significantly improve incident visibility and response times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Poor Incident Response Readiness
&lt;/h2&gt;

&lt;p&gt;Many MSPs lack formal incident response plans or do not conduct regular drills. In the absence of structured procedures, MSPs may struggle to quickly isolate affected systems, communicate with clients, and contain breaches. A tested incident response framework ensures faster recovery, minimizes client disruption, and reduces legal and reputational risks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Data Exfiltration and Leakage
&lt;/h2&gt;

&lt;p&gt;Exposing client data through misconfigured cloud storage, unencrypted backups, or insufficient data loss prevention (DLP) measures can result in severe regulatory penalties and loss of client trust. MSPs must implement strong data protection policies and regularly audit client environments to prevent accidental or malicious data exposure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phishing and Social Engineering
&lt;/h2&gt;

&lt;p&gt;Phishing and social engineering remain some of the most effective attack vectors against MSPs. Cybercriminals often use emails, phone calls, or messaging platforms to steal credentials or deploy malware. A single successful phishing attempt can compromise entire client environments, making it one of the &lt;a href="https://hackmd.io/@alextray812/Top-MSP-Cybersecurity-Challenges" rel="noopener noreferrer"&gt;top MSP cyber security challenges&lt;/a&gt;&lt;br&gt;
 that providers must continuously address.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;MSPs operate in a high-stakes environment where threats can emerge from multiple directions. Successfully defending against these risks requires a proactive approach, including continuous monitoring, robust access controls, strong incident response planning, and comprehensive data protection strategies. By prioritizing operational maturity and addressing these key security risks, MSPs can safeguard their clients, maintain compliance, and strengthen long-term trust.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Terraform: creating an AWS OpenSearch Service cluster and users</title>
      <dc:creator>Arseny Zinchenko</dc:creator>
      <pubDate>Tue, 30 Dec 2025 10:00:00 +0000</pubDate>
      <link>https://community.ops.io/setevoy/terraform-creating-an-aws-opensearch-service-cluster-and-users-38hb</link>
      <guid>https://community.ops.io/setevoy/terraform-creating-an-aws-opensearch-service-cluster-and-users-38hb</guid>
      <description>&lt;p&gt;&lt;a href="https://community.ops.io/images/oGpVQKPMbLuswDYdnE4d9_DXS1in1YHpIgbRXNvYTDU/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzIyNnRo/OXQ1djRpb2Nsajlz/aTF3LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/oGpVQKPMbLuswDYdnE4d9_DXS1in1YHpIgbRXNvYTDU/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzIyNnRo/OXQ1djRpb2Nsajlz/aTF3LnBuZw" width="480" height="240"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the first part, we covered the basics of AWS OpenSearch Service in general and the types of instances for Data Nodes — &lt;a href="https://rtfm.co.ua/en/aws-introduction-to-the-opensearch-service-as-a-vector-store/" rel="noopener noreferrer"&gt;AWS: Getting Started with OpenSearch Service as a Vector Store&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the second part, we covered access, &lt;a href="https://rtfm.co.ua/en/aws-creating-an-opensearch-service-cluster-and-configuring-authentication-and-authorization/" rel="noopener noreferrer"&gt;AWS: Creating an OpenSearch Service Cluster and Configuring Authentication and Authorization&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Now let’s write Terraform code to create a cluster, users, and indexes.&lt;/p&gt;

&lt;p&gt;We will create the cluster in VPC and use the internal user database for authentication.&lt;/p&gt;

&lt;p&gt;But in VPC, you can’t… Because — surprise! — AWS Bedrock requires OpenSearch Managed Cluster to be public, not in VPC.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The OpenSearch Managed Cluster you provided is not supported because it is VPC protected. Your cluster must be behind a public network.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I wrote to the AWS tech. support, and they said:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;However, there is an ongoing product feature request (PFR) to have Bedrock KnowledgeBases support provisioned Open Search clusters in VPC.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And they suggest using Amazon OpenSearch Serverless, which we are actually running away from because the prices are ridiculous.&lt;/p&gt;

&lt;p&gt;The second problem that arose when I started writing resources &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/bedrockagent_knowledge_base" rel="noopener noreferrer"&gt;&lt;code&gt;bedrockagent_knowledge_base&lt;/code&gt;&lt;/a&gt; is that it does not support &lt;code&gt;storage_configuration with type&lt;/code&gt;OPENSEARCH_MANAGED`, only Serverless.&lt;/p&gt;

&lt;p&gt;But &lt;a href="https://github.com/hashicorp/terraform-provider-aws/pull/44060" rel="noopener noreferrer"&gt;Pull Request for this already exists&lt;/a&gt;, maybe someday they will approve it.&lt;br&gt;
&lt;em&gt;(&lt;em&gt;UPD&lt;/em&gt;: this was already merged)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;So, we will create an OpenSearch Managed Service cluster with three indexes  —  Dev/Staging/Prod.&lt;/p&gt;

&lt;p&gt;The cluster will have three small data nodes, and each index will have 1 primary shard and 1 replica, because the project is small, and the data in our Production index on AWS OpenSearch Serverless, from which we want to migrate to AWS OpenSearch Service, is currently only 2 GiB, and is unlikely to grow significantly in the future.&lt;/p&gt;

&lt;p&gt;It would be good to create the cluster in our own Terraform module to make it easier to create some test environments, as I did for AWS EKS, but there isn’t much time for that right now, so we’ll just use tf files with a separate &lt;code&gt;prod.tfvars&lt;/code&gt; for variables.&lt;/p&gt;

&lt;p&gt;Maybe later I’ll write separately about transferring it to our own module, because it’s really convenient.&lt;/p&gt;

&lt;p&gt;In the next part, we’ll talk about monitoring, because our Production has already crashed once :-)&lt;/p&gt;

&lt;h3&gt;
  
  
  Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Terraform files structure&lt;/li&gt;
&lt;li&gt;Project planning&lt;/li&gt;
&lt;li&gt;Creating a cluster&lt;/li&gt;
&lt;li&gt;Custom endpoint configuration&lt;/li&gt;
&lt;li&gt;Terraform Outputs&lt;/li&gt;
&lt;li&gt;Creating OpenSearch Users&lt;/li&gt;
&lt;li&gt;Error: elastic: Error 403 (Forbidden)&lt;/li&gt;
&lt;li&gt;Creating Internal Users&lt;/li&gt;
&lt;li&gt;Internal database users&lt;/li&gt;
&lt;li&gt;Adding IAM Users&lt;/li&gt;
&lt;li&gt;Creating AWS Bedrock IAM Roles and OpenSearch Role mappings&lt;/li&gt;
&lt;li&gt;Creating OpenSearch indexes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Terraform files structure
&lt;/h3&gt;

&lt;p&gt;The initial file and directory structure of the project is as follows:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
$ tree .&lt;br&gt;
.&lt;br&gt;
├── README.md&lt;br&gt;
└── terraform&lt;br&gt;
    ├── Makefile&lt;br&gt;
    ├── backend.tf&lt;br&gt;
    ├── data.tf&lt;br&gt;
    ├── envs&lt;br&gt;
    │ └── prod&lt;br&gt;
    │ └── prod.tfvars&lt;br&gt;
    ├── locals.tf&lt;br&gt;
    ├── outputs.tf&lt;br&gt;
    ├── providers.tf&lt;br&gt;
    ├── variables.tf&lt;br&gt;
    └── versions.tf&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;providers.tf&lt;/code&gt; - provider settings, currently only AWS, and through it we set the default tags:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
provider "aws" {&lt;br&gt;
  region = var.aws_region&lt;br&gt;
  default_tags {&lt;br&gt;
    tags = {&lt;br&gt;
      component = var.component&lt;br&gt;
      created-by = "terraform"&lt;br&gt;
      environment = var.environment&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;data.tf&lt;/code&gt;, we collect AWS Account ID, Availability Zones, VPC, and private subnets in which we will create a cluster in which we will eventually create a cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
data "aws_caller_identity" "current" {}&lt;/p&gt;

&lt;p&gt;data "aws_availability_zones" "available" {&lt;br&gt;
  state = "available"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;data "aws_vpc" "eks_vpc" {&lt;br&gt;
  id = var.vpc_id&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;data "aws_subnets" "private" {&lt;br&gt;
  filter {&lt;br&gt;
    name = "vpc-id"&lt;br&gt;
    values = [var.vpc_id]&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;tags = {&lt;br&gt;
    subnet-type = "private"&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;File &lt;code&gt;variables.tf&lt;/code&gt; with our default variables, then we will add new ones:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
variable "aws_region" {&lt;br&gt;
  type = string&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;variable "project_name" {&lt;br&gt;
  description = "A project name to be used in resources"&lt;br&gt;
  type = string&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;variable "component" {&lt;br&gt;
  description = "A team using this project (backend, web, ios, data, devops)"&lt;br&gt;
  type = string&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;variable "environment" {&lt;br&gt;
  description = "Dev/Prod, will be used in AWS resources Name tag, and resources names"&lt;br&gt;
  type = string&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;variable "vpc_id" {&lt;br&gt;
  type = string&lt;br&gt;
  description = "A VPC ID to be used to create OpenSearch cluster and its Nodes"&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Pass the values of variables through a separate &lt;code&gt;prod.tfvars&lt;/code&gt; file, then, if necessary, we can create a new environment through a file of the type &lt;code&gt;envs/test/test.tfvars&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
aws_region = "us-east-1"&lt;br&gt;
project_name = "atlas-kb"&lt;br&gt;
component = "backend"&lt;br&gt;
environment = "prod"&lt;br&gt;
vpc_id = "vpc-0fbaffe234c0d81ea"&lt;br&gt;
dns_zone = "prod.example.co"&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;Makefile&lt;/code&gt;, we simplify our local life:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;/p&gt;

&lt;h3&gt;
  
  
  PROD
&lt;/h3&gt;

&lt;p&gt;init-prod:&lt;br&gt;
  terraform init -reconfigure -backend-config="key=prod/atlas-knowledge-base-prod.tfstate"&lt;/p&gt;

&lt;p&gt;plan-prod:&lt;br&gt;
  terraform plan -var-file=envs/prod/prod.tfvars&lt;/p&gt;

&lt;p&gt;apply-prod:&lt;br&gt;
  terraform apply -var-file=envs/prod/prod.tfvars&lt;/p&gt;

&lt;h1&gt;
  
  
  destroy-prod:
&lt;/h1&gt;

&lt;h1&gt;
  
  
  terraform destroy -var-file=envs/prod/prod.tfvars
&lt;/h1&gt;

&lt;p&gt;`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;What files will be next?&lt;/p&gt;

&lt;p&gt;We will also have AWS Bedrock, which will need to be configured for access — we will do this through its IAM Role, and I will not write about Bedrock here — because it is a separate topic, and Terraform does not yet support &lt;code&gt;OPENSEARCH_MANAGED&lt;/code&gt;, so we did it manually, and then we will execute &lt;a href="https://rtfm.co.ua/en/terraform-using-import-and-some-hiden-pitfalls/" rel="noopener noreferrer"&gt;terraform import&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We will create indexes, users for our Backend API, and Bedrock IAM Role mappings in OpenSearch’s internal database through Terraform OpenSearch Provider to simplify OpenSearch Dashboards access.&lt;/p&gt;

&lt;h3&gt;
  
  
  Project planning
&lt;/h3&gt;

&lt;p&gt;We can create a cluster from the Terraform resource &lt;a href="https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/opensearch_domain" rel="noopener noreferrer"&gt;&lt;code&gt;aws_opensearch_domain&lt;/code&gt;&lt;/a&gt;, or we can use ready-made modules, such as the &lt;a href="https://registry.terraform.io/modules/terraform-aws-modules/opensearch/aws/latest" rel="noopener noreferrer"&gt;opensearch&lt;/a&gt; from &lt;a href="https://www.linkedin.com/in/antonbabenko/" rel="noopener noreferrer"&gt;@Anton Babenko&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let’s take Anton’s module, because I use his modules a lot, and everything works great.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating a cluster
&lt;/h3&gt;

&lt;p&gt;Examples — &lt;a href="https://github.com/terraform-aws-modules/terraform-aws-opensearch/tree/master/examples" rel="noopener noreferrer"&gt;terraform-aws-opensearch/tree/master/examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Add a variable with cluster parameters to the &lt;code&gt;variables.tf&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;variable "cluser_options" {&lt;br&gt;
  description = "A map of options to configure the OpenSearch cluster"&lt;br&gt;
  type = object({&lt;br&gt;
    instance_type = string&lt;br&gt;
    instance_count = number&lt;br&gt;
    volume_size = number&lt;br&gt;
    volume_type = string&lt;br&gt;
    engine_version = string&lt;br&gt;
    auto_software_update_enabled = bool&lt;br&gt;
  })&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And a value in &lt;code&gt;prod.tfvars&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;cluser_options = {&lt;br&gt;
  instance_type = "t3.small.search"&lt;br&gt;
  instance_count = 3&lt;br&gt;
  volume_size = 50&lt;br&gt;
  volume_type = "gp3"&lt;br&gt;
  engine_version = "OpenSearch_2.19"&lt;br&gt;
  auto_software_update_enabled = true&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;t3.small.search&lt;/code&gt; instances are the most minimal and sufficient for us at this time, although there are limitations for &lt;code&gt;t3&lt;/code&gt;, such as the &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/auto-tune.html" rel="noopener noreferrer"&gt;AWS OpenSearch Auto-tune&lt;/a&gt; feature not being supported.&lt;/p&gt;

&lt;p&gt;In general, &lt;code&gt;t3&lt;/code&gt; is not intended for production use cases. See also &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/bp.html" rel="noopener noreferrer"&gt;Operational best practices for Amazon OpenSearch Service&lt;/a&gt;, &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/supported-instance-types.html#latest-gen" rel="noopener noreferrer"&gt;Current generation instance types&lt;/a&gt;, and &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/limits.html" rel="noopener noreferrer"&gt;Amazon OpenSearch Service quotas&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I set the version here to 2.9, but 3.1 was added just a few days ago — see &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html#choosing-version" rel="noopener noreferrer"&gt;Supported versions of Elasticsearch and OpenSearch&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We take three nodes so that the cluster can select a cluster manager node if one node fails, see &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/managedomains-multiaz.html" rel="noopener noreferrer"&gt;Dedicated master node distribution&lt;/a&gt;, &lt;a href="https://www.instaclustr.com/blog/learning-opensearch-from-scratch-part-2-digging-deeper/" rel="noopener noreferrer"&gt;Learning OpenSearch from scratch, part 2: Digging deeper&lt;/a&gt;, and &lt;a href="https://aws.amazon.com/blogs/big-data/enhance-stability-with-dedicated-cluster-manager-nodes-using-amazon-opensearch-service/" rel="noopener noreferrer"&gt;Enhance stability with dedicated cluster manager nodes using Amazon OpenSearch Service&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Contents of the &lt;code&gt;locals.tf&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
locals {&lt;br&gt;
  # 'atlas-kb-prod'&lt;br&gt;
  env_name = "${var.project_name}-${var.environment}"&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Most of the &lt;code&gt;locals&lt;/code&gt; will be right here, but some that are very "local" to a particular code will be in the resource code files.&lt;/p&gt;

&lt;p&gt;Add the file &lt;code&gt;opensearcth_users.tf&lt;/code&gt; - for now, there is only a root user here, and the password is stored in AWS Parameter Store (instead of AWS Secrets Manager - "that's just how it happened historically“):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;/p&gt;

&lt;h3&gt;
  
  
  ROOT
&lt;/h3&gt;

&lt;h1&gt;
  
  
  generate root password
&lt;/h1&gt;

&lt;h1&gt;
  
  
  waiting for write-only: &lt;a href="https://github.com/hashicorp/terraform-provider-aws/pull/43621" rel="noopener noreferrer"&gt;https://github.com/hashicorp/terraform-provider-aws/pull/43621&lt;/a&gt;
&lt;/h1&gt;

&lt;h1&gt;
  
  
  then will update it with the ephemeral type
&lt;/h1&gt;

&lt;p&gt;resource "random_password" "os_master_password" {&lt;br&gt;
  length = 16&lt;br&gt;
  special = true&lt;br&gt;
}&lt;/p&gt;

&lt;h1&gt;
  
  
  store the root password in AWS Parameter Store
&lt;/h1&gt;

&lt;p&gt;resource "aws_ssm_parameter" "os_master_password" {&lt;br&gt;
  name = "/${var.environment}/${local.env_name}-root-password"&lt;br&gt;
  description = "OpenSearch cluster master password"&lt;br&gt;
  type = "SecureString"&lt;br&gt;
  value = random_password.os_master_password.result&lt;br&gt;
  overwrite = true&lt;br&gt;
  tier = "Standard"&lt;/p&gt;

&lt;p&gt;lifecycle {&lt;br&gt;
    ignore_changes = [value] # to prevent diff every time password is regenerated&lt;br&gt;
  }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;data "aws_ssm_parameter" "os_master_password" {&lt;br&gt;
  name = "/${var.environment}/${local.env_name}-root-password"&lt;br&gt;
  with_decryption = true&lt;/p&gt;

&lt;p&gt;depends_on = [aws_ssm_parameter.os_master_password]&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Let’s write the &lt;code&gt;opensearch_cluster.tf&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;I left the config for VPC here for future reference and just as an example, although it will not be possible to transfer an already created cluster to VPC — you will have to create a new one, see &lt;strong&gt;Limitations&lt;/strong&gt; in the documentation &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/vpc.html#vpc-limitations" rel="noopener noreferrer"&gt;Launching your Amazon OpenSearch Service domains within a VPC&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
module "opensearch" {&lt;br&gt;
  source = "terraform-aws-modules/opensearch/aws"&lt;br&gt;
  version = "~&amp;gt; 2.0.0"  &lt;/p&gt;

&lt;p&gt;# enable Fine-grained access control&lt;br&gt;
  # by using the internal user database, we'll simply access to the Dashboards&lt;br&gt;
  # for backend API Kubernetes Pods, will use Kubernetes Secrets with username:password from AWS Parameter Store&lt;br&gt;
  advanced_security_options = {&lt;br&gt;
    enabled = true&lt;br&gt;
    anonymous_auth_enabled = false&lt;br&gt;
    internal_user_database_enabled = true&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;master_user_options = {
  master_user_name = "os_root"
  master_user_password = data.aws_ssm_parameter.os_master_password.value
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;# can't be used with t3 instances&lt;br&gt;
  auto_tune_options = {&lt;br&gt;
    desired_state = "DISABLED"&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;# have three data nodes - t3.small.search nodes in two AZs&lt;br&gt;
  # will use 3 indexes - dev/stage/prod with 1 shard and 1 replica each&lt;br&gt;
  cluster_config = {&lt;br&gt;
    instance_count = var.cluser_options.instance_count&lt;br&gt;
    dedicated_master_enabled = false&lt;br&gt;
    instance_type = var.cluser_options.instance_type&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# put both data-nodes in different AZs
zone_awareness_config = {
  availability_zone_count = 2
}

zone_awareness_enabled = true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;}&lt;/p&gt;

&lt;p&gt;# the cluster's name&lt;br&gt;
  # 'atlas-kb-prod'&lt;br&gt;
  domain_name = "${local.env_name}-cluster"&lt;/p&gt;

&lt;p&gt;# 50 GiB for each Data Node&lt;br&gt;
  ebs_options = {&lt;br&gt;
    ebs_enabled = true&lt;br&gt;
    volume_type = var.cluser_options.volume_type&lt;br&gt;
    volume_size = var.cluser_options.volume_size&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;encrypt_at_rest = {&lt;br&gt;
    enabled = true&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;# latest for today:&lt;br&gt;
  # &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html#choosing-version" rel="noopener noreferrer"&gt;https://docs.aws.amazon.com/opensearch-service/latest/developerguide/what-is.html#choosing-version&lt;/a&gt;&lt;br&gt;
  engine_version = var.cluser_options.engine_version&lt;/p&gt;

&lt;p&gt;# enable CloudWatch logs for Index and Search slow logs&lt;br&gt;
  # TODO: collect to VictoriaLogs or Loki, and create metrics and alerts&lt;br&gt;
  log_publishing_options = [&lt;br&gt;
    { log_type = "INDEX_SLOW_LOGS" },&lt;br&gt;
    { log_type = "SEARCH_SLOW_LOGS" },&lt;br&gt;
  ]&lt;/p&gt;

&lt;p&gt;ip_address_type = "ipv4"&lt;/p&gt;

&lt;p&gt;node_to_node_encryption = {&lt;br&gt;
    enabled = true&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;# allow minor version updates automatically&lt;br&gt;
  # will be performed during off-peak windows&lt;br&gt;
  software_update_options = {&lt;br&gt;
    auto_software_update_enabled = var.cluser_options.auto_software_update_enabled&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;# DO NOT use 'atlas-vpc-ops' VPC and its private subnets&lt;br&gt;
  # &amp;gt; "The OpenSearch Managed Cluster you provided is not supported because it is VPC protected. Your cluster must be behind a public network."&lt;br&gt;
  # vpc_options = {&lt;br&gt;
  # subnet_ids = data.aws_subnets.private.ids&lt;br&gt;
  # }&lt;/p&gt;

&lt;p&gt;# # VPC endpoint to access from Kubernetes Pods&lt;br&gt;
  # vpc_endpoints = {&lt;br&gt;
  # one = {&lt;br&gt;
  # subnet_ids = data.aws_subnets.private.ids&lt;br&gt;
  # }&lt;br&gt;
  # }&lt;/p&gt;

&lt;p&gt;# Security Group rules to allow access from the VPC only&lt;br&gt;
  # security_group_rules = {&lt;br&gt;
  # ingress_443 = {&lt;br&gt;
  # type = "ingress"&lt;br&gt;
  # description = "HTTPS access from VPC"&lt;br&gt;
  # from_port = 443&lt;br&gt;
  # to_port = 443&lt;br&gt;
  # ip_protocol = "tcp"&lt;br&gt;
  # cidr_ipv4 = data.aws_vpc.ops_vpc.cidr_block&lt;br&gt;
  # }&lt;br&gt;
  # }&lt;/p&gt;

&lt;p&gt;# Access policy&lt;br&gt;
  # necessary to allow access for AWS user to the Dashboards&lt;br&gt;
  access_policy_statements = [&lt;br&gt;
    {&lt;br&gt;
      effect = "Allow"&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  principals = [{
    type = "*"
    identifiers = ["*"]
  }]

  actions = ["es:*"]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;]&lt;/p&gt;

&lt;p&gt;# 'atlas-kb-ops-os-cluster'&lt;br&gt;
  tags = {&lt;br&gt;
    Name = "${var.project_name}-${var.environment}-os-cluster"&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Basically, everything is described in the comments, but in short:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enable &lt;a href="https://rtfm.co.ua/en/aws-creating-an-opensearch-service-cluster-and-configuring-authentication-and-authorization/#Fine-grained_access_control" rel="noopener noreferrer"&gt;fine-grained access control&lt;/a&gt; and a local user database&lt;/li&gt;
&lt;li&gt;three data nodes, each with 50 gigabytes of disk space, in different Availability Zones&lt;/li&gt;
&lt;li&gt;enable logs in CloudWatch&lt;/li&gt;
&lt;li&gt;create a cluster in private subnets&lt;/li&gt;
&lt;li&gt;allow access for everyone in the Domain Access Policy&lt;/li&gt;
&lt;li&gt;well, that’s it for now… we can’t use Security Groups because we’re not in VPC, but how do we create an IP-based policy? We don’t know CIDR Bedrock&lt;/li&gt;
&lt;li&gt;or in the &lt;code&gt;principals.identifiers&lt;/code&gt; we could add a limit on our IAM Users + Bedrock AIM Role&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Run creating the cluster and go to have some tea, as this process will take around 20 minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Custom endpoint configuration
&lt;/h3&gt;

&lt;p&gt;After creating the cluster, check access to the Dashboards. If everything is OK, add a custom endpoint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;: Custom endpoints have their own quirks: in Terraform OpenSearch Provider, you need to use the custom endpoint URL, but in AWS Bedrock Knowledge Base, you need to use the default cluster URL.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To do this, we need to create a certificate in AWS Certificate Manager, and add a new record in Route53.&lt;/p&gt;

&lt;p&gt;I expected a possible chicken-and-egg problem here, because Custom Endpoint settings depend on AWS ACM and a record in AWS Route53, and the record in AWS Route53 will depend on the cluster because it uses its endpoint.&lt;/p&gt;

&lt;p&gt;But no, if you create a new cluster with the settings described below, everything is created correctly: first, the certificate in AWS ACM, then the cluster with Custom Endpoint, then the record in Route53 with CNAME to the cluster default URL.&lt;/p&gt;

&lt;p&gt;Add a new &lt;code&gt;local&lt;/code&gt; - &lt;code&gt;os_custom_domain_name&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
locals {&lt;br&gt;
  # 'atlas-kb-prod'&lt;br&gt;
  env_name = "${var.project_name}-${var.environment}"&lt;br&gt;
  # 'opensearch.prod.example.co'&lt;br&gt;
  os_custom_domain_name = "opensearch.${var.dns_zone}"&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Add the Route53 zone data retrieval to the &lt;code&gt;data.tf&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;data "aws_route53_zone" "zone" {&lt;br&gt;
  name = var.dns_zone&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Add certificate creation and Route53 entry to the &lt;code&gt;opensearch_cluster.tf&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;/p&gt;

&lt;h1&gt;
  
  
  TLS for the Custom Domain
&lt;/h1&gt;

&lt;p&gt;module "prod_opensearch_acm" {&lt;br&gt;
  source = "terraform-aws-modules/acm/aws"&lt;br&gt;
  version = "~&amp;gt; 6.0"&lt;/p&gt;

&lt;p&gt;# 'opensearch.example.co'&lt;br&gt;
  domain_name = local.os_custom_domain_name&lt;br&gt;
  zone_id = data.aws_route53_zone.zone.zone_id&lt;/p&gt;

&lt;p&gt;validation_method = "DNS"&lt;br&gt;
  wait_for_validation = true&lt;/p&gt;

&lt;p&gt;tags = {&lt;br&gt;
    Name = local.os_custom_domain_name&lt;br&gt;
  }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_route53_record" "opensearch_domain_endpoint" {&lt;br&gt;
  zone_id = data.aws_route53_zone.zone.zone_id&lt;br&gt;
  name = local.os_custom_domain_name&lt;br&gt;
  type = "CNAME"&lt;br&gt;
  ttl = 300&lt;br&gt;
  records = [module.opensearch.domain_endpoint]&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;...&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And in the &lt;code&gt;module "opensearch"&lt;/code&gt;, add the custom endpoint settings:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
...&lt;br&gt;
  domain_endpoint_options = {&lt;br&gt;
    custom_endpoint_certificate_arn = module.prod_opensearch_acm.acm_certificate_arn&lt;br&gt;
    custom_endpoint_enabled = true&lt;br&gt;
    custom_endpoint = local.os_custom_domain_name&lt;br&gt;
    tls_security_policy = "Policy-Min-TLS-1-2-2019-07"&lt;br&gt;
  }&lt;br&gt;
...&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;terraform init&lt;/code&gt; and &lt;code&gt;terraform apply&lt;/code&gt;, check the settings:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/TEGoeG1rhPJb_phM_qMK4pcl6px2Jua6sdQuLINCztA/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzVqdnd1/a2l6a3U1ZGU0dHll/ZGM4LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/TEGoeG1rhPJb_phM_qMK4pcl6px2Jua6sdQuLINCztA/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzVqdnd1/a2l6a3U1ZGU0dHll/ZGM4LnBuZw" width="538" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And check access to the Dashboards.&lt;/p&gt;

&lt;h3&gt;
  
  
  Terraform Outputs
&lt;/h3&gt;

&lt;p&gt;Let’s add some outputs.&lt;/p&gt;

&lt;p&gt;For now, just for ourselves, but later we may use them in imports from other projects, see &lt;a href="https://rtfm.co.ua/en/terraform-terraform_remote_state-getting-outputs-from-other-state-files/" rel="noopener noreferrer"&gt;Terraform: terraform_remote_state — getting outputs from other state files&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
output "vpc_id" {&lt;br&gt;
  value = var.vpc_id&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;output "cluster_arn" {&lt;br&gt;
  value = module.opensearch.domain_arn&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;output "opensearch_domain_endpoint_cluster" {&lt;br&gt;
  value = "https://${module.opensearch.domain_endpoint}"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;output "opensearch_domain_endpoint_custom" {&lt;br&gt;
  value = "https://${local.os_custom_domain_name}"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;output "opensearch_root_username" {&lt;br&gt;
  value = "os_root"&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;output "opensearch_root_user_password_secret_name" {&lt;br&gt;
  value = "/${var.environment}/${local.env_name}-root-password"&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating OpenSearch Users
&lt;/h3&gt;

&lt;p&gt;All that’s left now are users and indexes.&lt;/p&gt;

&lt;p&gt;We will have two types of users:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;regular users from the OpenSearch internal database — for our Backend API in Kubernetes (actually, we later switched to IAM Roles, which are mapped to the Backend via &lt;a href="https://rtfm.co.ua/aws-eks-pod-identities-zamina-irsa-sproshhuyemo-menedzhment-iam-dostupiv/" rel="noopener noreferrer"&gt;EKS Pod Identities&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;and users (IAM Role) for Bedrock — there will be three Knowledge Bases, each with its own IAM Role, for which we will need to add an OpenSearch Role and map it to IAM roles&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s start with regular users.&lt;/p&gt;

&lt;p&gt;Add a provider, in my case it is in the &lt;code&gt;versions.tf&lt;/code&gt; file:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
terraform {&lt;/p&gt;

&lt;p&gt;required_version = "~&amp;gt; 1.6"&lt;/p&gt;

&lt;p&gt;required_providers {&lt;br&gt;
    aws = {&lt;br&gt;
      source = "hashicorp/aws"&lt;br&gt;
      version = "~&amp;gt; 6.0"&lt;br&gt;
    }&lt;br&gt;
    opensearch = {&lt;br&gt;
      source = "opensearch-project/opensearch"&lt;br&gt;
      version = "~&amp;gt; 2.3"&lt;br&gt;
    }&lt;br&gt;
  }&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;providers.tf&lt;/code&gt; file, describe access to the cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;provider "opensearch" {&lt;br&gt;
  url = "https://${local.os_custom_domain_name}"&lt;br&gt;
  username = "os_root"&lt;br&gt;
  password = data.aws_ssm_parameter.os_master_password.value&lt;br&gt;
  healthcheck = false&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Error: elastic: Error 403 (Forbidden)
&lt;/h3&gt;

&lt;p&gt;Here is an important point about the &lt;code&gt;url&lt;/code&gt; value in the provider configuration. I wrote about it above, and now I will show you how it looks.&lt;/p&gt;

&lt;p&gt;First, in the &lt;code&gt;provider.url&lt;/code&gt;, I set it as &lt;code&gt;outputs&lt;/code&gt; of the module, i.e. &lt;code&gt;module.opensearch.domain_endpoint&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Because of this, I got a 403 error when I tried to create users:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;&lt;br&gt;
...&lt;br&gt;
opensearch_user.os_kraken_dev_user: Creating...&lt;br&gt;
opensearch_role.os_kraken_dev_role: Creating...&lt;br&gt;
╷&lt;br&gt;
│ Error: elastic: Error 403 (Forbidden)&lt;br&gt;
│ &lt;br&gt;
│ with opensearch_user.os_kraken_dev_user,&lt;br&gt;
│ on opensearch_users.tf line 23, in resource "opensearch_user" "os_kraken_dev_user":&lt;br&gt;
│ 23: resource "opensearch_user" "os_kraken_dev_user" {&lt;br&gt;
│ &lt;br&gt;
╵&lt;br&gt;
╷&lt;br&gt;
│ Error: elastic: Error 403 (Forbidden)&lt;br&gt;
│ &lt;br&gt;
│ with opensearch_role.os_kraken_dev_role,&lt;br&gt;
│ on opensearch_users.tf line 30, in resource "opensearch_role" "os_kraken_dev_role":&lt;br&gt;
│ 30: resource "opensearch_role" "os_kraken_dev_role" {&lt;br&gt;
&lt;/code&gt;&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Thus, set the URL in the form of FQDN, which we did for Custom Endpoint, something like &lt;code&gt;"url = https://opensearch.exmaple.com"&lt;/code&gt; - and everything works well.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating Internal Users
&lt;/h3&gt;

&lt;p&gt;Now for the users themselves.&lt;/p&gt;

&lt;p&gt;There will be three of them — &lt;em&gt;dev&lt;/em&gt;, &lt;em&gt;staging&lt;/em&gt;, &lt;em&gt;prod&lt;/em&gt;, each with access to the corresponding index.&lt;/p&gt;

&lt;p&gt;Here we will use &lt;a href="https://registry.terraform.io/providers/opensearch-project/opensearch/latest/docs/resources/user" rel="noopener noreferrer"&gt;&lt;code&gt;opensearch_user&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If the cluster is created in VPC, a VPN connection is required so that the provider can connect to the cluster.&lt;/p&gt;

&lt;p&gt;Add &lt;a href="https://rtfm.co.ua/en/terraform-introduction-to-data-types-primitives-and-complex/#list" rel="noopener noreferrer"&gt;list()&lt;/a&gt; to the &lt;code&gt;variables.tf&lt;/code&gt; with a list of environments:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;variable "app_environments" {&lt;br&gt;
  type = list(string)&lt;br&gt;
  description = "The Application's environments, to be used to created Dev/Staging/Prod DynamoDB tables, etc"&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;And the value in &lt;code&gt;prod.tfvars&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;app_environments = [&lt;br&gt;
  "dev",&lt;br&gt;
  "staging",&lt;br&gt;
  "prod"&lt;br&gt;
]&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Internal database users
&lt;/h3&gt;

&lt;p&gt;At first, I planned to just use local users, and wrote this option in this post — let it be. Next, I will show how we did it in the end — with IAM Users and IAM Roles.&lt;/p&gt;

&lt;p&gt;In the file &lt;code&gt;opensearch_users.tf&lt;/code&gt;, add three passwords, three users, and three roles to which we map users in loops - each role with access to its own index:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;h3&gt;
  
  
  KRAKEN
&lt;/h3&gt;

&lt;p&gt;resource "random_password" "os_kraken_password" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;br&gt;
  length = 16&lt;br&gt;
  special = true&lt;br&gt;
}&lt;/p&gt;

&lt;h1&gt;
  
  
  store the root password in AWS Parameter Store
&lt;/h1&gt;

&lt;p&gt;resource "aws_ssm_parameter" "os_kraken_password" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;/p&gt;

&lt;p&gt;name = "/${var.environment}/${local.env_name}-kraken-${each.key}-password"&lt;br&gt;
  description = "OpenSearch cluster Backend Dev password"&lt;br&gt;
  type = "SecureString"&lt;br&gt;
  value = random_password.os_kraken_password[each.key].result&lt;br&gt;
  overwrite = true&lt;br&gt;
  tier = "Standard"&lt;/p&gt;

&lt;p&gt;lifecycle {&lt;br&gt;
    ignore_changes = [value] # to prevent diff every time password is regenerated&lt;br&gt;
  }&lt;br&gt;
}&lt;/p&gt;

&lt;h1&gt;
  
  
  Create a user
&lt;/h1&gt;

&lt;p&gt;resource "opensearch_user" "os_kraken_user" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;/p&gt;

&lt;p&gt;username = "os_kraken_${each.key}"&lt;br&gt;
  password = random_password.os_kraken_password[each.key].result&lt;br&gt;
  description = "Backend EKS ${each.key} user"&lt;/p&gt;

&lt;p&gt;depends_on = [module.opensearch]&lt;br&gt;
}&lt;/p&gt;

&lt;h1&gt;
  
  
  And a full user, role and role mapping example:
&lt;/h1&gt;

&lt;p&gt;resource "opensearch_role" "os_kraken_role" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;/p&gt;

&lt;p&gt;role_name = "os_kraken_${each.key}_role"&lt;br&gt;
  description = "Backend EKS ${each.key} role"&lt;/p&gt;

&lt;p&gt;cluster_permissions = [&lt;br&gt;
    "indices:data/read/msearch",&lt;br&gt;
    "indices:data/write/bulk*",&lt;br&gt;
   "indices:data/read/mget*"&lt;br&gt;
  ]&lt;br&gt;
  index_permissions {&lt;br&gt;
    index_patterns = ["kraken-kb-index-${each.key}"]&lt;br&gt;
    allowed_actions = ["*"]&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;depends_on = [module.opensearch]&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;cluster_permissions&lt;/code&gt;, we add permissions that are required for both the index level and the cluster level, because Bedrock did not work without them, see &lt;a href="https://docs.opensearch.org/latest/security/access-control/permissions/#cluster-wide-index-permissions" rel="noopener noreferrer"&gt;Cluster wide index permissions&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Deploy and check in Dashboards:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/SROMaZSRy9xDjY9yzsr9bMEdiRhhIxUmvZcvshiqe84/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3JzdTF6/emFybzdlMHhmeDBz/bXRpLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/SROMaZSRy9xDjY9yzsr9bMEdiRhhIxUmvZcvshiqe84/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3JzdTF6/emFybzdlMHhmeDBz/bXRpLnBuZw" width="525" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding IAM Users
&lt;/h3&gt;

&lt;p&gt;The idea here is the same, except that instead of regular users with a login:password for authentication, IAM and its Users &amp;amp;&amp;amp; Roles are used.&lt;/p&gt;

&lt;p&gt;More on the role for Bedrock later, but for now, let’s add user mapping.&lt;/p&gt;

&lt;p&gt;What we need to do is take a list of our Backend team users, give them an IAM Policy with access to OpenSearch, and then add mapping to a local role in the OpenSearch internal users database.&lt;/p&gt;

&lt;p&gt;For now, we can use the local role &lt;code&gt;all_access&lt;/code&gt;, although it would be better to write our own later. See &lt;a href="https://docs.opensearch.org/latest/security/access-control/users-roles/#predefined-roles" rel="noopener noreferrer"&gt;Predefined roles&lt;/a&gt; and &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html#fgac-master-user" rel="noopener noreferrer"&gt;About the master user&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Add a new variable to the &lt;code&gt;variables.tf&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;variable "backend_team_users_arns" {&lt;br&gt;
  type = list(string)&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Its value in the &lt;code&gt;prod.tfvars&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;p&gt;backend_team_users_arns = [&lt;br&gt;
  "arn:aws:iam::492*&lt;strong&gt;148:user/arseny",&lt;br&gt;
  "arn:aws:iam::492&lt;/strong&gt;&lt;em&gt;148:user/misha",&lt;br&gt;
  "arn:aws:iam::492&lt;/em&gt;&lt;strong&gt;148:user/oleksii",&lt;br&gt;
  "arn:aws:iam::492&lt;/strong&gt;*148:user/vladimir",&lt;br&gt;
  "os_root"&lt;br&gt;
]&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Here, we had to mess around with the user &lt;code&gt;os_root&lt;/code&gt;, because otherwise it would be removed from the role.&lt;/p&gt;

&lt;p&gt;So, it’s better to make normal roles — but for MVP, it’s okay.&lt;/p&gt;

&lt;p&gt;And we add the mapping of these IAM Users to the role &lt;code&gt;all_access&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;h3&gt;
  
  
  BACKEND TEAM
&lt;/h3&gt;

&lt;p&gt;resource "opensearch_roles_mapping" "all_access_mapping" {&lt;br&gt;
  role_name = "all_access"&lt;/p&gt;

&lt;p&gt;users = var.backend_team_users_arns&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Deploy, check the &lt;code&gt;all_access&lt;/code&gt; role:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/iJiPv3wdrgZajWT7OA_0ffoh4fn_iEhK-FdoIp_wfhk/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3Vmb3hm/ejhtY3V0MnE3a2Rx/ZXVxLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/iJiPv3wdrgZajWT7OA_0ffoh4fn_iEhK-FdoIp_wfhk/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3Vmb3hm/ejhtY3V0MnE3a2Rx/ZXVxLnBuZw" width="800" height="277"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;: ChatGPT stubbornly insisted on adding IAM Users to Backend Roles, but no, and this is clearly stated in the documentation — you need to add them to Users, see&lt;/em&gt; &lt;a href="https://docs.aws.amazon.com/opensearch-service/latest/developerguide/fgac.html#fgac-more-masters" rel="noopener noreferrer"&gt;&lt;em&gt;Additional master users&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;And for all the IAM Users we need to add an IAM policy with access.&lt;/p&gt;

&lt;p&gt;Again, for MVP, we can simply take the AWS managed policy &lt;a href="https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonOpenSearchServiceFullAccess.html" rel="noopener noreferrer"&gt;&lt;code&gt;AmazonOpenSearchServiceFullAccess&lt;/code&gt;&lt;/a&gt;, which is connected to the IAM Group:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/fTKM0nY4lGAa8faan491_wFG-4vNyCabQEFd7c9MMGw/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3Jtanhx/YjNiYjF6enVqemQy/MndhLnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/fTKM0nY4lGAa8faan491_wFG-4vNyCabQEFd7c9MMGw/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3Jtanhx/YjNiYjF6enVqemQy/MndhLnBuZw" width="800" height="744"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating AWS Bedrock IAM Roles and OpenSearch Role mappings
&lt;/h3&gt;

&lt;p&gt;We already have Bedrock, now just need to create new IAM Roles and map them to OpenSearch Roles.&lt;/p&gt;

&lt;p&gt;Add the &lt;code&gt;iam.tf&lt;/code&gt; file - describe the IAM Role and IAM Policy (Identity-based Policy for access to OpenSearch), also in a loop for each of the &lt;code&gt;var.app_environments&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;/p&gt;

&lt;h3&gt;
  
  
  MAIN ROLE FOR KNOWLEDGE BASE
&lt;/h3&gt;

&lt;h1&gt;
  
  
  grants permissions for AWS Bedrock to interact with other AWS services
&lt;/h1&gt;

&lt;p&gt;resource "aws_iam_role" "knowledge_base_role" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;br&gt;
  name = "${var.project_name}-role-${each.key}-managed"&lt;br&gt;
  assume_role_policy = jsonencode({&lt;br&gt;
    Version = "2012-10-17"&lt;br&gt;
    Statement = [&lt;br&gt;
      {&lt;br&gt;
        Action = "sts:AssumeRole"&lt;br&gt;
        Effect = "Allow"&lt;br&gt;
        Principal = {&lt;br&gt;
          Service = "bedrock.amazonaws.com"&lt;br&gt;
        }&lt;br&gt;
        Condition = {&lt;br&gt;
          StringEquals = {&lt;br&gt;
            "aws:SourceAccount" = data.aws_caller_identity.current.account_id&lt;br&gt;
          }&lt;br&gt;
          ArnLike = {&lt;br&gt;
            # restricts the role to be assumed only by Bedrock knowledge base in the specified region&lt;br&gt;
            "aws:SourceArn" = "arn:aws:bedrock:${var.aws_region}:${data.aws_caller_identity.current.account_id}:knowledge-base/*"&lt;br&gt;
          }&lt;br&gt;
        }&lt;br&gt;
      }&lt;br&gt;
    ]&lt;br&gt;
  })&lt;br&gt;
}&lt;/p&gt;

&lt;h1&gt;
  
  
  IAM policy for Knowledge Base to access OpenSearch Managed
&lt;/h1&gt;

&lt;p&gt;resource "aws_iam_policy" "knowledge_base_opensearch_policy" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;br&gt;
  name = "${var.project_name}-kb-opensearch-policy-${each.key}-managed"&lt;br&gt;
  policy = jsonencode({&lt;br&gt;
    Version = "2012-10-17"&lt;br&gt;
    Statement = [&lt;br&gt;
      {&lt;br&gt;
        Effect = "Allow"&lt;br&gt;
        Action = [&lt;br&gt;
          "es:&lt;em&gt;",&lt;br&gt;
        ]&lt;br&gt;
        Resource = [&lt;br&gt;
          module.opensearch.domain_arn,&lt;br&gt;
          "${module.opensearch.domain_arn}/&lt;/em&gt;"&lt;br&gt;
        ]&lt;br&gt;
      }&lt;br&gt;
    ]&lt;br&gt;
  })&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "aws_iam_role_policy_attachment" "knowledge_base_opensearch" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;br&gt;
  role = aws_iam_role.knowledge_base_role[each.key].name&lt;br&gt;
  policy_arn = aws_iam_policy.knowledge_base_opensearch_policy[each.key].arn&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, in the &lt;code&gt;opensearch_users.tf&lt;/code&gt;, let's create:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;opensearch_role&lt;/code&gt;: with &lt;code&gt;cluster_permissions&lt;/code&gt; and &lt;code&gt;index_permissions&lt;/code&gt; for each index&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;locals&lt;/code&gt; with all the IAM Roles we created above&lt;/li&gt;
&lt;li&gt;and &lt;code&gt;opensearch_roles_mapping&lt;/code&gt; for each &lt;code&gt;opensearch_role.os_bedrock_roles&lt;/code&gt;, which we add to each &lt;code&gt;opensearch_rolevia backend_roles&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It looks something like this:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
...&lt;/p&gt;

&lt;h4&gt;
  
  
  BEDROCK
&lt;/h4&gt;

&lt;p&gt;resource "opensearch_role" "os_bedrock_roles" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;br&gt;
  role_name = "os_bedrock_${each.key}_role"&lt;br&gt;
  description = "Backend Bedrock KB ${each.key} role"&lt;/p&gt;

&lt;p&gt;cluster_permissions = [&lt;br&gt;
    "indices:data/read/msearch",&lt;br&gt;
    "indices:data/write/bulk*",&lt;br&gt;
    "indices:data/read/mget*"&lt;br&gt;
    ]&lt;/p&gt;

&lt;p&gt;index_permissions {&lt;br&gt;
    index_patterns = ["kraken-kb-index-${each.key}"]&lt;br&gt;
    allowed_actions = ["*"]&lt;br&gt;
  }&lt;/p&gt;

&lt;p&gt;depends_on = [module.opensearch]&lt;br&gt;
}&lt;/p&gt;

&lt;h1&gt;
  
  
  'aws_iam_role' is defined in iam.tf
&lt;/h1&gt;

&lt;p&gt;locals {&lt;br&gt;
  knowledge_base_role_arns = {&lt;br&gt;
    for env, role in aws_iam_role.knowledge_base_role :&lt;br&gt;
    env =&amp;gt; role.arn&lt;br&gt;
  }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;resource "opensearch_roles_mapping" "os_bedrock_role_mappings" {&lt;br&gt;
  for_each = toset(var.app_environments)&lt;br&gt;
  role_name = opensearch_role.os_bedrock_roles[each.key].role_name&lt;/p&gt;

&lt;p&gt;backend_roles = [&lt;br&gt;
    local.knowledge_base_role_arns[each.key]&lt;br&gt;
  ]&lt;/p&gt;

&lt;p&gt;depends_on = [module.opensearch]&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Actually, this is where I encountered Bedrock access errors, which forced me to add &lt;code&gt;cluster_permissions&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;The knowledge base storage configuration provided is invalid… Request failed: [security_exception] no permissions for [indices:data/read/msearch] and User [name=arn:aws:iam::492*&lt;strong&gt;148:role/kraken-kb-role-dev, backend_roles=[arn:aws:iam::492&lt;/strong&gt;*148:role/kraken-kb-role-dev], requestedTenant=null]&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Deploy, check:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/Uawlt07mlnOKzjEiZjbZ1KaVGwFo8M92rImBDiInh1w/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzlwNXUy/Zm5vd2R1enFpaXJt/MHU2LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/Uawlt07mlnOKzjEiZjbZ1KaVGwFo8M92rImBDiInh1w/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzLzlwNXUy/Zm5vd2R1enFpaXJt/MHU2LnBuZw" width="800" height="298"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating OpenSearch indexes
&lt;/h3&gt;

&lt;p&gt;The provider already exists, so we’ll take the  &lt;a href="https://registry.terraform.io/providers/opensearch-project/opensearch/latest/docs/resources/index" rel="noopener noreferrer"&gt;&lt;code&gt;opensearch_index&lt;/code&gt;&lt;/a&gt; resource.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;locals&lt;/code&gt;, we write the index template - I just took it from the developers from the old configuration:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;br&gt;
locals {&lt;br&gt;
  # 'atlas-kb-prod'&lt;br&gt;
  env_name = "${var.project_name}-${var.environment}"&lt;br&gt;
  # 'opensearch.prod.example.co'&lt;br&gt;
  os_custom_domain_name = "opensearch.${var.dns_zone}"&lt;/p&gt;

&lt;p&gt;# index mappings&lt;/p&gt;

&lt;p&gt;os_index_mappings = &amp;lt;&amp;lt;-EOF&lt;br&gt;
    {&lt;br&gt;
      "dynamic_templates": [&lt;br&gt;
        {&lt;br&gt;
          "strings": {&lt;br&gt;
            "match_mapping_type": "string",&lt;br&gt;
            "mapping": {&lt;br&gt;
              "fields": {&lt;br&gt;
                "keyword": {&lt;br&gt;
                  "ignore_above": 8192,&lt;br&gt;
                  "type": "keyword"&lt;br&gt;
                }&lt;br&gt;
              },&lt;br&gt;
              "type": "text"&lt;br&gt;
            }&lt;br&gt;
          }&lt;br&gt;
        }&lt;br&gt;
      ],&lt;br&gt;
      "properties": {&lt;br&gt;
        "bedrock-knowledge-base-default-vector": {&lt;br&gt;
          "type": "knn_vector",&lt;br&gt;
          "dimension": 1024,&lt;br&gt;
          "method": {&lt;br&gt;
            "name": "hnsw",&lt;br&gt;
            "engine": "faiss",&lt;br&gt;
            "parameters": {&lt;br&gt;
              "m": 16,&lt;br&gt;
              "ef_construction": 512&lt;br&gt;
            },&lt;br&gt;
            "space_type": "l2"&lt;br&gt;
          }&lt;br&gt;
        },&lt;br&gt;
        "AMAZON_BEDROCK_METADATA": {&lt;br&gt;
          "type": "text",&lt;br&gt;
          "index": false&lt;br&gt;
        },&lt;br&gt;
        "AMAZON_BEDROCK_TEXT_CHUNK": {&lt;br&gt;
          "type": "text",&lt;br&gt;
          "index": true&lt;br&gt;
        }&lt;br&gt;
      }&lt;br&gt;
    }&lt;br&gt;
EOF&lt;br&gt;
}&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Create a file named &lt;code&gt;opensearch_indexes.tf&lt;/code&gt;. Add the indexes themselves - here, I decided not to use a loop, but to create separate Dev/Staging/Prod files directly:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;&lt;/code&gt;`&lt;/p&gt;

&lt;h1&gt;
  
  
  Dev Index
&lt;/h1&gt;

&lt;p&gt;resource "opensearch_index" "kb_vector_index_dev" {&lt;br&gt;
  name = "kraken-kb-index-dev"&lt;/p&gt;

&lt;p&gt;# enable approximate nearest neighbor search by setting index_knn to true&lt;br&gt;
  index_knn = true&lt;br&gt;
  index_knn_algo_param_ef_search = "512"&lt;br&gt;
  number_of_shards = "1"&lt;br&gt;
  number_of_replicas = "1"&lt;br&gt;
  mappings = local.os_index_mappings&lt;/p&gt;

&lt;p&gt;# When new documents are ingested into the Knowledge Base,&lt;br&gt;
  # OpenSearch automatically creates field mappings for new metadata fields under&lt;br&gt;
  # AMAZON_BEDROCK_METADATA. Since these fields are created outside of TF resource definitions,&lt;br&gt;
  # TF detects them as configuration drift and attempts to recreate the index to match its&lt;br&gt;
  # known state.&lt;br&gt;
  #&lt;br&gt;
  # This lifecycle rule prevents unnecessary index recreation by ignoring mapping changes&lt;br&gt;
  # that occur after initial deployment.&lt;br&gt;
  lifecycle {&lt;br&gt;
    ignore_changes = [mappings]&lt;br&gt;
  }&lt;br&gt;
}&lt;/p&gt;

&lt;p&gt;...&lt;br&gt;
`&lt;code&gt;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Deploy, check:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://community.ops.io/images/9KzS2EN6zcoJkYdP8fcrXqFdAro-Bm9Bd3-DgAnfF4o/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3h1bWM5/YWl0N2Mwb2dvcTk4/Z3U3LnBuZw" class="article-body-image-wrapper"&gt;&lt;img src="https://community.ops.io/images/9KzS2EN6zcoJkYdP8fcrXqFdAro-Bm9Bd3-DgAnfF4o/rt:fit/w:800/g:sm/q:0/mb:500000/ar:1/aHR0cHM6Ly9jb21t/dW5pdHkub3BzLmlv/L3JlbW90ZWltYWdl/cy91cGxvYWRzL2Fy/dGljbGVzL3h1bWM5/YWl0N2Mwb2dvcTk4/Z3U3LnBuZw" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s basically it.&lt;/p&gt;

&lt;p&gt;Bedrock is already connected, everything is working.&lt;/p&gt;

&lt;p&gt;But it took a little bit of effort.&lt;/p&gt;

&lt;p&gt;And I’m sure it won’t be the last time :-)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://rtfm.co.ua/en/terraform-creating-an-aws-opensearch-service-cluster-and-users/" rel="noopener noreferrer"&gt;&lt;em&gt;RTFM: Linux, DevOps, and system administration&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>devops</category>
      <category>aws</category>
      <category>terraform</category>
      <category>tutorials</category>
    </item>
  </channel>
</rss>
