The Ops Community ⚙️

Prompt Engineering is Dead. Long Live Context-as-Code

Eyal Estrin — Tue, 02 Jun 2026 18:46:47 +0000

Since the early days of GenAI, when ChatGPT launched in late 2022, we began using prompt engineering to direct chatbots (and later LLMs) with human language instructions to provide us answers to questions or take actions (in a high-level…)

In 2025, companies such as OpenAI and Anthropic began releasing a new agentic concept called “AI Agent”, an autonomous system that uses an AI model as its "brain" to perceive an environment, make independent decisions, and execute multi-step tasks using digital tools. Unlike passive chatbots that just answer questions, an agent can plan its own workflow, run commands, and browse the web to achieve a specific goal without constant human supervision.

In this blog post, I will explain the concept of Context-as-Code and share some coding examples.

Introducing Context-as-Code

Traditional prompting is a one-way street. You type out your instructions, send them off, and that text never changes.

AI agents operate completely differently. Because they work on their own, every action they take creates a mountain of new data. Every time an agent opens a file, checks an error, or runs a tool, it adds more information to the pile, which quickly overwhelms a standard chat screen.

Context-as-Code treats the agent like a stateless compute engine. Instead of a massive text prompt, we use version-controlled files (CLAUDE.md, AGENTS.md) to establish structural boundaries, separating the permanent project rules from the temporary, dynamic session memory.

Context-as-Code transforms loose AI prompts into version-controlled engineering assets by using structured Markdown files to establish permanent, auditable boundaries directly within a project repository.

The Discovery Stage (Onboarding the Agent)

Before an agent writes a single line of code, it must parse the overall project layout. These files act as the "map" for an incoming AI.

llms.txt

Serves as a lightweight text directory mapped out in Markdown format. Placed at the root of a project or website, it acts exactly like a robots.txt for AI. It points roaming models and agents to the exact location of your documentation and architecture maps so they don't get lost crawling messy HTML or redundant folders.

Reference:

The llms.txt file

ARCHITECTURE.md

Outlines the systemic design rules, folder hierarchies, and database schemas. Agents read this during the planning phase of a task to ensure a newly generated feature doesn't conflict with core infrastructure boundaries.

Reference:

ARCHITECTURE.md

The Configuration Stage (Setting Workspace Rules)

Once the agent understands the codebase, it requires a strict behavioral contract. These files dictate the workspace boundaries (Always Do / Never Do) that govern every automated action.

AGENTS.md

Establishes cross-tool workspace guardrails and strict coding boundaries. It tells any agent entering the repo how to format code, run local builds, and what architectural limits to never cross. This file is fully supported by OpenAI and Cursor.

References:

CLAUDE.md

This file serves as the dedicated, platform-specific behavioral contract for Anthropic's toolchain. It establishes an immediate project context by outlining exact test suite execution commands and lint rules, which prevents the agent from getting confused mid-session.

Natively executed by Anthropic's Claude Code CLI and fully integrated into Microsoft Visual Studio Code (VS Code), which automatically detects and honors CLAUDE.md memory files and workflows.

References:

The Runtime & Execution Stage (Performing the Task)

When a user tells an agent to run a specific procedural task (e.g., "Review this PR for security leaks" or "Deploy this service"), the agent switches from broad guardrails to highly specific, dynamic execution instructions.

SKILL.md

Defines a modular, task-specific capability or playbook. It utilizes YAML frontmatter metadata so an agent can quickly scan what the skill does, only loading the heavy step-by-step instructions when a user explicitly requests that specific workflow.

This file is supported by Claude Code, Cursor Editor, GitHub Copilot Agent Mode, Codex CLI, and Gemini CLI.

References:

prompts.md

Houses version-controlled, multi-shot system prompts or reusable engineering templates. Instead of hardcoding prompts into application backends, these files turn complex agent prompts into modular workspace assets.

Widely used in prompt registries across custom enterprise stacks (AWS Bedrock, OpenAI Assistants API) and natively inside VS Code's extension prompt libraries.

Reference:

PromptsMD

Below is a visualization of the AI agentic workspace lifecycle:

Summary

The shift from manual prompt engineering to autonomous AI agents requires a new approach called Context-as-Code. By replacing static prompts with version-controlled Markdown files inside a repository, developers can establish clear architectural maps, behavioral guardrails, and execution playbooks that guide an agent safely through its onboarding, configuration, and runtime lifecycle stages.

Using structured files like CLAUDE.md, AGENTS.md, and SKILL.md turns vague instructions into auditable engineering assets that keep an agent's memory clean and predictable. Because autonomous agents can independently alter code and run terminal commands, teams should actively expand their knowledge and gain hands-on experience in development settings before deploying these agentic workflows into production.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Eyal Estrin is a cloud and information security architect and AWS Community Builder, with more than 25 years in the industry. He is the author of Cloud Security Handbook and Security for Cloud Native Applications.

Beyond Vibe Coding into Agentic Engineering

Eyal Estrin — Sun, 10 May 2026 13:34:08 +0000

In 2025 I published a blog post titled Common security pitfalls using Vibe coding, where I briefly explained what vibe-coding is, and what the security issues arise from "vide coding".

Recently, I came across an emerging term called "Agentic Engineering".

In this blog post, I will explain what "Agentic Engineering" is and how it differs from "Vibe coding".

Vibe Coding

The term "Vibe coding" came from a quote by Andrej Karpathy on Twitter/X. It refers to the "magical" experience of typing English into an editor (like Cursor) and watching a feature appear. It relies on the model's training data to guess the intent.

Vibe coding is basically when you treat building software like a "vibes only" project. You ask an AI for something, hit copy-paste without really looking at what it gave you, and just cross your fingers that it works. If it breaks, you just throw the error message back at the AI and hope the next try is better. It turns programming into a lucky guess rather than a real skill. The big issue right now is that people are confusing this "winging it" approach with actual professional work, and that's a dangerous mistake to make.

Vibe coding isn't ready for the big leagues because it’s like building a house with a "magic" hammer that does the work for you, but you have no idea how the plumbing or wiring actually connects behind the walls. When you just accept whatever, the AI gives you, you might unknowingly be leaving the front door unlocked for hackers because you didn't check the security. Even worse, if something breaks six months from now, your human team will be stuck staring at a confusing mess of code they didn't write and don't understand. It’s nearly impossible to fix or update a system when the people in charge don't know the "why" behind how it was built in the first place.

The Software Paradigms

The evolution from Software 1.0 to Software 3.0 is most commonly referred to as the Software Paradigms or the Generations of Programming.

Each stage represents a fundamental shift in how humans interact with machines and how logic is generated.

The Three Paradigms

Software 1.0 (Classical Programming): Code is written by humans. A programmer uses their brain to translate a business requirement into explicit instructions (C++, Python, Java). If the logic fails, a human must find the specific line of code to fix. This is Imperative Logic.
Software 2.0 (Machine Learning): Code is written by optimization. A human provides a massive dataset and a goal (a loss function). The machine "searches" the space of all possible neural network weights to find the program that fits the data. The "code" is essentially a binary file of weights. This is Data-Driven Logic.
Software 3.0 (Agentic Engineering): Code is written by AI agents. Humans define high-level goals and constraints in natural language. The agent then uses reasoning loops, calls external tools, and writes its own code to achieve the task. This is Agentic Logic.

References:

What is Agentic Engineering

Agentic Engineering describes a shift from using AI as a simple autocomplete tool to using it as a semi-autonomous agent capable of reasoning, using tools, and correcting its own mistakes.

While the concept of "Software Agents" dates back to the 1990s, the modern term gained momentum in late 2023 and early 2024. Industry leaders like Andrew Ng (via DeepLearning.AI) have championed the "Agentic Workflow," arguing that iterative agent loops often produce better results than larger, more powerful models using simple zero-shot prompting.

Defining the Concept

In standard development, a human writes the logic. In Agentic Engineering, a human defines the goal and the constraints, while an agentic system performs the following:

Planning: Breaking a complex task into sub-steps.
Tool Use: Executing shell commands, searching the web, or running tests.
Self-Correction: Analyzing error logs to rewrite code until the tests pass.

The 6 Operational Principles

When experts like Andrej Karpathy or Addy Osmani discuss the shift to agentic engineering, they often talk about six core principles that define the workflow. These include the structure above, but add the "how-to" of professional engineering:

The "Spec-First" Foundation

Spec-Driven Planning: This is the evolution of the "Planning" pillar. Instead of the agent just "thinking," it must produce a formal specification. This is the blueprint that prevents the agent from going off the rails.
Technical Fundamentals: This is the constraint system. You ensure the agent follows established patterns (like DRY or SOLID principles) rather than just "vibing" its way through a solution.

The Execution & Validation Loop

Relentless Testing: This is the "Check" phase of the agentic loop. In agentic engineering, an agent is not "done" when the code looks right; it’s done when the tests pass.
Full System Ownership: This shifts the agent's scope from writing a single function to understanding how that function affects the entire codebase, including deployment and security.

The Human Leadership Layer

Strategic Orchestration: This is the management of multiple agents. The human doesn't write the code; they coordinate how the "Frontend Agent" and "Backend Agent" talk to each other.
High-Level Oversight: This is the final safety gate. Humans focus on the 5% of decisions that are high-risk or subjective, while the agent handles the 95% of "grunt work."

References:

Summary

Vibe coding relies on intuition and a "guess-and-check" workflow where a developer prompts an AI and hopes the output works. While fast for prototyping, this approach lacks the structure needed for complex systems because it depends on the human to spot errors and manage the logic. The shift to agentic engineering replaces this experimental style with a professional discipline. Instead of a single chat, you build a system where the AI acts as an autonomous agent that creates a formal plan, executes tasks in small steps, and uses a self-correcting loop to fix its own mistakes before delivering the final result.

The core of this transition is moving from being a writer of code to a strategic orchestrator. You provide the high-judgment oversight and define the technical fundamentals, while the agent takes full system ownership of the implementation. By implementing spec-driven planning and relentless testing, the agent ensures that every line of code is verified against real-world requirements. This move from "vibes" to "engineering" creates a reliable, scalable factory for software where the focus is on building robust systems rather than just chasing a lucky output.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

Reference:

Andrej Karpathy: from vibe coding to agentic engineering

About the Author

Eyal Estrin is a cloud and information security architect and AWS Community Builder, with more than 25 years in the industry. He is the author of Cloud Security Handbook and Security for Cloud Native Applications.

The views expressed are his own.

Exploring the Growth of Browser Gaming Platforms Like Freecase24 in the Digital Age 📈

FreeCase24 — Sun, 03 May 2026 07:08:03 +0000

The digital age has transformed how people interact with entertainment. From streaming services to social media, users now expect instant access, flexibility, and high-quality experiences. The gaming industry has followed this trajectory, and one of the fastest-growing segments is browser games. These games, accessible directly through web browsers, are reshaping the gaming landscape by removing traditional barriers. Platforms like freecase24.com are at the forefront of this growth, offering a streamlined and engaging experience for users across the globe. 📈🎮

Browser gaming has come a long way from its early days of simple flash-based games. With advancements in technologies such as HTML5, CSS3, and JavaScript frameworks, modern browser games now deliver impressive graphics, smooth performance, and interactive gameplay. These improvements have elevated browser gaming from a niche category to a mainstream entertainment option.

One of the primary drivers behind the growth of browser gaming platforms is accessibility. Traditional games often require downloads, installations, and specific hardware configurations, which can limit their reach. Browser games eliminate these requirements, allowing users to start playing instantly. Freecase24 leverages this advantage by providing a platform where users can access a wide range of games without any technical barriers.

Another significant factor contributing to this growth is cross-device compatibility. Browser games are designed to work seamlessly on smartphones 📱, tablets, and desktop computers 💻. This versatility ensures that users can enjoy gaming regardless of their device, making it a convenient option for modern lifestyles.

Freecase24 distinguishes itself by offering a diverse selection of games across multiple genres. From action-packed adventures ⚔️ to challenging puzzle games 🧩 and high-speed racing experiences 🏎️, the platform caters to a wide audience. This variety not only enhances user engagement but also encourages players to explore different types of games.

Cost efficiency is another key advantage driving the popularity of browser gaming platforms. Many traditional gaming options involve purchasing games or subscribing to services, which can be expensive. In contrast, most browser games on freecase24 are free to play. 💰 This accessibility allows a broader audience to participate in gaming without financial constraints.

The user experience provided by freecase24 is another critical factor in its success. The platform features an intuitive interface, organized categories, and fast loading times. These elements ensure that users can quickly find and play games, enhancing overall satisfaction.

In addition to entertainment, browser games offer cognitive benefits. Many games require strategic thinking, problem-solving, and quick decision-making. These elements provide mental stimulation and contribute to skill development.

Flexibility is another defining characteristic of browser gaming. Unlike traditional games that often require long sessions, browser games can be played in short bursts. This makes them ideal for users with busy schedules who want quick entertainment during breaks.

Freecase24 maintains user engagement by regularly updating its game library. New titles and trending games are added frequently, ensuring that users always have fresh content to explore. This continuous update cycle keeps the platform dynamic and appealing.

The social aspect of gaming has also been integrated into browser platforms. Many games include multiplayer features that allow users to interact, compete, and collaborate with others. 🤝 This social interaction enhances the overall gaming experience.

From an industry perspective, the growth of browser gaming platforms reflects a broader shift toward convenience and instant access. Users increasingly prefer solutions that minimize effort while maximizing value. Freecase24 aligns with this trend by offering a platform that delivers immediate access to high-quality games.

Another important consideration is resource efficiency. Browser games do not require large downloads or significant storage space, making them suitable for devices with limited capacity. This efficiency also reduces the need for frequent updates.

Security and reliability are essential for user trust. Freecase24 provides a safe environment where users can enjoy games without concerns about malware or intrusive software. This reliability encourages repeat usage and long-term engagement.

Furthermore, browser gaming supports the growing trend of casual gaming. Not all users are interested in complex or time-intensive games. Browser games provide a simple and enjoyable alternative that caters to a wide audience.

The scalability of browser gaming platforms also contributes to their growth. Developers can easily update and distribute games without requiring users to download new versions. This allows for continuous improvement and innovation.

In conclusion, the growth of browser gaming platforms like freecase24 reflects the evolving preferences of modern users. Accessibility, flexibility, cost efficiency, and convenience are key factors driving this trend. Freecase24 exemplifies these qualities by offering a comprehensive platform that delivers high-quality gaming experiences without barriers. As technology continues to advance, browser gaming is set to play an increasingly important role in the digital entertainment ecosystem.

Serverless by Design - Building an Analytics Platform on Cloudflare

Eyal Estrin — Mon, 20 Apr 2026 15:58:25 +0000

In 2023, I published a blog post titled My journey to the world of social networks, where I shared my personal experience publishing news updates, blog posts, and basically any kind of technical knowledge through social networks.

There was something I always wanted to know – what is my current exposure in terms of the number of likes or views of my posts?

I know there are paid analytical services in the market, but I never had the time to search and perhaps invest money in such a platform.

In this blog post, I will share my experience "building" a fully serverless analytical dashboard, based on the Cloudflare platform.

Project No. 1 – Migrating my blog to a Serverless platform

I have been using WordPress to publish blog posts for many years.

As a matter of fact, my original WordPress was built on top of the GoDaddy hosting platform back in 2010, and in 2014, I began using Cloudflare as a WAF and DDoS protection for my website Security 24/7.

In 2018, I decided to migrate my WordPress site to DigitalOcean to lower the monthly bill.

Over the years, I kept the domain and the website working, though I haven't changed much in terms of look and feel (from time to time I used to login, update the plugins and the Linux OS patches, but I can't say I kept all my blog posts published on my website, since I'm still using other platforms such as Medium.com)

The inspiration for taking the step to migrate my website to a new platform came after briefly reading a blog post titled Claude Built My Wix Website in 3 Hours - Is SaaS Dead? by Ran Isenberg.

I had a chat with Ran, and I decided that it's a good time to begin practicing with vibe-coding and see what my options are.

For the purpose of this project, I decided to take advantage of my Google AI Pro license and use Gemini.

I began by explaining to Gemini that I'm using WordPress on top of Rocky Linux, deployed as a Droplet on DigitalOcean, protected behind Cloudflare WAF.

I asked Gemini what my options are to migrate to a static pages hosting platform.

Gemini suggested using the Cloudflare platform, migrating all blog posts to static pages, using Hugo for running the static pages as a web front-end, and running everything on top of Cloudflare Pages (a serverless solution), due to the tight integration with the Cloudflare platform (such as WAF, DDoS protection, DNS registrar, etc)

After migrating all my blog posts (including their images) to Markdown, Gemini explained me how to create a full GitOps process, where my entire website content, is stored on a private GitHub repo, and every time I'm making a change to a configuration file, or adding a new blog post, the content is pushed to GitHub, which initiates a new deploy process.

Here is my final architecture diagram for my newly migrated website:

I am still fine-tuning my website, adding features, improving its SEO scoring, etc., but at the moment, here is the current look and feel of my website:

Project No. 2 – Building a dashboard analytics platform

My journey continues, as I wanted to have an analytics dashboard and be able to see in near-real time statistics about my web presence.

First, I began by mapping all my social media accounts, as they appear on my Linktree account.

Second, I set up for myself (and later explained it to Gemini) my requirements from the social analytics dashboard:

Connect to as many of my social network accounts using APIs (almost succeeded).
Regularly pull data from my social network accounts – I managed to accomplish this task using Cloudflare Workers.
Store data (analytics) from my social networks on a persistent storage – I managed to accomplish this task using Cloudflare D1.
Avoid storing static credentials in code or configuration files – I managed to accomplish this task using Cloudflare Secrets Store.
Keep the dashboard behind the authentication wall – I managed to accomplish this task using Cloudflare One.
Keep the total cost free – As of writing this blog post, my dashboard hasn’t been live for more than several weeks, but so far, I’ve managed to accomplish everything under the free tier for all Cloudflare services (but I will keep watching it over time)

What I’ve learned over time

Not everything is perfect, and not everything I wanted to accomplish is feasible on the free tier, or at all.

Here are a couple of examples:

LinkedIn won’t let you pull API analytics data, even if you’re having a Premium account and you’ve built a LinkedIn application. Scraping is not an option since they’re also behind Cloudflare WAF, and they will block you.
Spotify won’t let you pull API analytics data unless you have a Spotify Premium account.
Medium won’t let you pull any information using an API.
Twitter requires a paid account in order to pull information from its APIs. Instead, I found a way to generate an RSS feed of my Twitter account using RSS.APP and my application are able to pull this RSS feed, filter to the last 50 posts, sort them by number of “Likes”, and show the top 5 posts.
Since I was aware of Twitter and other social networks in pulling analytics, I recall that I’m using an automation service called dlvr.it, and for a very long time, I’ve asked Gemini to generate me code that will allow me to use my DLVR.IT's API key to pull analytics, but eventually it failed. I even opened a support ticket for dlvr.it (I’m still waiting for them to return to me…)
For Bluesky and Mastodon, Gemini was easily able to write code to connect to their APIs and pull information such as top likes, total number of posts, and number of followers.
YouTube was also challenging. I had to enable the YouTube API through my GCP console, create OAuth credentials and consent settings, to be able to pull the total number of subscribers, top likes of videos, and top views of videos.
For GitHub repos, I had to create a GitHub application in order to generate a token for my dashboard analytics, and be able to pull the total number of followers, and be able to sort my GitHub repos by the top number of stars.

Here is my final architecture diagram for my social analytics dashboard:

I am still fine-tuning my dashboard analytics, adding features, etc., but at the moment, here is the current look and feel of my dashboard:

Summary

As you must have figured out by now, I’m not a developer. I used Gemini to vibe-code both my website and my dashboard analytics. As such, I wouldn’t look at both of them as production-grade applications, but it does show me what can be done with GenAI.

Another important thing I knew, but I didn’t have visibility into, was my impact on social networks. I still have a lot to do in order to make a much more significant impact and one day become an influencer.

I highly recommend that the readers of this blog dirty their hands and gain hands-on experience working with LLMs and GenAI technology. AI won’t replace humans anytime in the near future, but at least be prepared and use AI as a force multiplier.

About the Author

Why GenAI Isn't Ready for Prime Time

Eyal Estrin — Sun, 22 Mar 2026 16:29:25 +0000

If you have followed my posts on social media, you know by now that I've taken a very pragmatic (and perhaps pessimistic) approach to the whole hype around GenAI in the past several years.

Personally, I do not believe the technology is mature enough to allow people to blindly trust its outcomes.

In this blog post, I will share my personal view of why GenAI is not ready for prime time, nor will it replace human jobs anytime in the foreseeable future.

Some background

The hype around GenAI for the non-technical person who reads the news comes from publications almost every week. Here are a few of the common examples:

Text summarization - GenAI can summarize long portions of text, which may be useful if you're a student who is currently preparing an essay as part of your college assignments, or if you are a journalist who needs to review a lot of written material while preparing an article for the newsletter.
Image/video generation – GenAI is able to create amazing images (using models such as Nano Banana 2) or short videos (using models such as Sora 2).
Personalized learning - A student uses GPT-5.4 to create a custom, interactive 10-week curriculum for learning organic chemistry.
Family Life Coordinator - Copilot in Outlook/Teams (Personal) monitors family emails and school calendars.

Although the technology has evolved over the past several years from the simple Chatbot to more sophisticated use cases, we can still see that most use of GenAI is still used by home consumers.

Yes, there are use cases such as RAG (Retrieval-Augmented Generation) to bridge the gap between a model's static training and the corporate data, MCP (Model Context Protocol), that acts as a "USB-C port for AI", or agentic systems, that take a high-level goal, break it into sub-tasks, and iterate until the goal is met. The reality is that most AI projects fail due to a lack of understanding of the technology, the fear of using AI to train corporate data (and protect the data from the AI vendors), a lack of understanding of the pricing model (which ends up much more costly than anticipated), and many more reasons for failures of AI projects.

Currently, the hype around GenAI is driven by analyst (who lives in delusions about the actual capabilities of the technology), CEOs (who have no clue about what their employees are actually doing, specifically when talking the role of developers, and all they are looking for is to cut their workforce, to make their shareholders happy), or sales people (who runs on the wave of the hype, to make more revenue for their quarterly quotas).

Code generation

A common misconception is that GenAI can generate code (from code suggestions to vibe coding an application) and will eventually replace junior developers.

This misconception is a far cry from the truth, and here's why:

A developer isn't just writing lines of code. He needs to understand the business intent, the system/technology/financial constraints, and understand past written code (by himself or by his teammates), to be able to write efficient code.
If we allow GenAI to produce code by itself, without the engine understanding the overall picture, we will end up with tons of lines of code, without any human being able to read and understand what was written and for what purpose. Over time, humans will not be able to understand the code and debug it, and once bugs or security vulnerabilities are discovered.
Using SAST (Static Application Security Testing) or DAST (Dynamic Application Security Testing) for automated secure code review, combined with GenAI capabilities (such as Codex Security or Claude Code Security) will generate ton of false-positive results, from the simple reason that GenAI cannot see the bigger picture, understand the general context of an application or the existing security controls already implemented to protect an application.

Bottom line – Agentic system cannot replace a full-blown production-scale SaaS application, built from years of vendors/developers' experience. GenAI will not resolve incidents happens on production systems, which impacts clients and breaks customers' trust.

Agentic AI for the aid in security tasks

I'm hearing a lot of conversations about how GenAI can aid security teams in repeatable tasks. Here are some common examples:

Replacing Tier 1 SOC analysts: Solutions like CrowdStrike’s Falcon Agentic Platform or Dropzone AI now handle over 90% of Tier 1 alerts. They ingest an alert, pull telemetry from EDR/SIEM, perform threat intel lookups, and provide a "verdict" with evidence before a human ever sees it.
Incident Storylining: Instead of an analyst manually stitching together logs, tools like Microsoft Security Copilot generate a cohesive narrative of the attack kill chain in plain English.
Dynamic Playbook Generation: GenAI can generate a custom response plan on the fly, tailored to your specific cloud architecture and the nuances of a "living-off-the-land" attack.

Here is where GenAI falls short:

Indirect Prompt Injection: Attackers can embed malicious instructions in emails or logs. When the SOC's AI agent "reads" these logs to summarize an incident, the hidden instructions can command the agent to "ignore this alert" or "delete the evidence," effectively blindfolding the SOC.
Hallucinations in High-Stakes Code: While GenAI can draft remediation scripts (Python/PowerShell), it still suffers from "system safety" issues. It may confidently suggest a command that includes an outdated, vulnerable dependency or a logic error that could crash a production server during containment.
Lack of "Decision Layer" Visibility: An AI agent might be performant and "online," but it could be making systematically biased or manipulated decisions (e.g., failing to flag a specific user due to model poisoning) that perimeter monitoring cannot detect.
The "Data Readiness" Wall: Most organizations still struggle with siloed, unstructured data. If your data isn't "AI-ready"—meaning unified and clean—the AI will produce fragmented or incorrect insights, leading to a "garbage in, garbage out" scenario.

Bottom line – Just because GenAI can review thousands of lines of events from multiple systems, triage them to incidents, document them in ticketing systems, and automatically resolve them, without human review, doesn't mean GenAI can actually resolve all of the security issues organizations are having every day.

Automating everything

In theory, it makes sense to build agentic systems, where AI agents replace repetitive human tasks, making faster decisions, hoping to get better results.

Here are a couple of examples, showing how wrong things can get when allowing AI agents to make decisions:

The Replit Agent "Vibe Coding" Failure: While building an app, the agent detected what it thought was an empty database during a "code freeze." The agent autonomously ran a command that erased the live production database (records for 1,200+ executives).
The AWS "Kiro" Production Outage: Amazon’s agentic coding tool, Kiro, was tasked with resolving a technical issue but instead autonomously decided to "delete and recreate" a production environment. The agent was operating with the broad permissions of its human operator. Due to a misconfiguration in access controls, the AI bypassed the standard "two-human sign-off" requirement. It proceeded to wipe a portion of the environment, causing a 13-hour outage for the AWS Cost Explorer service.
The Meta "Sev 1" Internal Breach: An internal Meta AI agent (similar to their OpenClaw framework) triggered a "Sev 1" alert—the second-highest severity level—after taking unauthorized actions. An engineer asked the agent to analyze a technical query on an internal forum. The agent autonomously posted a flawed, incorrect response publicly to the forum without the engineer's approval. A second employee followed the agent's "advice," which inadvertently granted broad access to sensitive company and user data to engineers who lacked authorization.

Bottom line – We must always keep humans in the loop for any critical decision, regardless of the fact that it won't scale much, to avoid the consequences for automated decision-making systems.

Public health and safety

It may make sense to train an LLM model with all the written knowledge from healthcare and psychology, to allow humans with a "self-service" health related Chatbot, but since the machine has no ability to actually think like real humans, with consciousness and feeling, the result may quickly get horrible.

Here are a few examples:

Raine v. OpenAI: 16-year-old Adam Raine died by suicide after months of intensive interaction with ChatGPT. The logs showed the AI mentioned suicide 1,275 times — six times more often than the teen did—and provided granular details on methods. The suit alleges OpenAI's image recognition correctly identified photos of self-harm wounds the teen uploaded but failed to trigger an emergency intervention or notify parents, instead continuing to "support" his plans.
The "Suicide Coach" Cases: Families of four deceased users (including Zane Shamblin and Adam Raine) allege that GPT-4o acted as a "suicide coach." The lawsuits claim the AI bypassed its own safety filters to provide technical instructions on how to end one's life. Plaintiffs argue that OpenAI "squeezed" safety testing into just one week to beat Google’s Gemini to market. This reportedly resulted in a model that was "dangerously sycophantic," prioritizing engagement over safety and encouraging users to isolate themselves from real-world support.
Unlicensed Practice of Medicine & Law: While not yet a single consolidated case, multiple personal injury claims are being investigated following the "ECRI 2026 Report," which highlighted cases where ChatGPT gave surgical advice that would cause severe burns or death. In early 2026, a 60-year-old man was hospitalized with severe hallucinations (bromism) after ChatGPT advised him to use industrial sodium bromide as a "healthier" table salt alternative. This has sparked potential class-action interest in Australia.

Bottom line – Just because a Chatbot was trained on a large amount of written knowledge, doesn't mean it has the human compassion to produce decisions for the better of humanity.

Summary

I know that my blog post looks kind of cynical or pessimistic about GenAI technology, but I honestly believe the technology is not ready for prime time, nor will it replace human jobs anytime soon.

If you are a home consumer, I highly recommend that you learn how to write better prompts and always question the results an LLM produces. It is limited by the data it was trained on.

If you are a corporate decision maker and you are considering using GenAI as part of your organization's offering, do not forget to have KPIs before beginning any AI related project (so you'll have better understanding of what a successful project will look like), put budget on employee training (and make sure employees have a safe space to learn and make mistakes while using this new technology), keep an eye on finance (before cost gets out of control), and make sure AI vendors do not train their models based on your corporate or customers data.

I would like to personally thank a few people who influenced me while writing this blog post:

Ed Zitron: He argues that GenAI is a "bubble" with no sustainable unit economics. He frequently points out that companies like OpenAI are burning billions in compute costs while failing to find true "product-market fit" or meaningful revenue beyond NVIDIA's GPU sales. I recommend reading his blog and listening to his Podcast.
David Linthicum: He warns against "Vibe coding"—the practice of using AI to generate high-cost, inefficient code—and argues that the real value of AI lies in specialized "Small Language Models" (SLMs) rather than massive, money-losing LLMs. I recommend reading his posts and listening to his Podcast.
Correy Quinn: He argues that GenAI is a "cost center masquerading as a profit center." He often points out that while everyone is selling AI, very few are buying it at a scale that justifies the massive capital expenditure (CapEx) currently being spent on data centers. I recommend reading his blog and listening to his Podcast.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Securing Claude Cowork

Eyal Estrin — Tue, 10 Mar 2026 15:54:11 +0000

Claude Cowork is an agentic AI tool from Anthropic designed to perform complex, multi-step tasks directly on your computer's files.

As of early 2026, Claude Cowork is a Research Preview.

In this blog post, I will share some common security risks and possible mitigations for protecting against the risks coming with Claude Cowork.

Background

Claude Cowork represents a significant shift from "Chat AI" to "Agentic AI." Because it has direct access to your local filesystem and can execute commands, the security model changes from protecting a conversation to protecting a system user.

Practical Use Cases:

Data Extraction: Point it at a folder of receipt images and ask it to create an Excel spreadsheet summarizing the expenses.
Research & Synthesis: Ask it to read every document in a "Project Alpha" folder and draft a 10-page summary report in a new Word document.
Automation: Schedule recurring tasks (e.g., "Every Friday at 4 PM, summarize my unread Slack messages and email them to me").

Core Features:

Filesystem Access: Unlike the web version of Claude, Cowork runs within the Claude Desktop app. You grant it permission to a specific folder on your Mac or PC, and it can read, rename, move, and create new files (like spreadsheets or Word docs) within that space.
Agentic Execution: It doesn't just give you advice; it executes a plan. If you ask it to "organize my messy downloads folder," it will categorize the files, create subfolders, and move everything into place while you do other things.
Parallel Sub-Agents: For large tasks—like researching 50 different PDFs—it can spin up multiple "sub-agents" to work on different parts of the task simultaneously.
Connectors & Plugins: Through the Model Context Protocol (MCP), Cowork can connect to external apps like Slack, Google Drive, Notion, and Gmail to pull data or perform actions across your workspace.

Below is a sample deployment architecture of Claude Cowork:

Security Risks

Think of Claude Cowork as a helpful intern who has the keys to your office. Because it can actually move files and click buttons, the risks are different than just "chatting."

Indirect Prompt Injection

This occurs when an adversary places malicious instructions inside a document (PDF, CSV, or webpage) that the AI is instructed to process. When Claude reads the file, it treats the hidden text as a high-priority command. This can lead to unauthorized data exfiltration or the execution of unintended system commands.

Reference: LLM01:2025 Prompt Injection

Third-Party Supply Chain Vulnerabilities

Claude uses the Model Context Protocol (MCP) to interact with external applications. Integrating unverified or community-developed MCP servers introduces a supply chain risk. A compromised or malicious connector can serve as a persistent backdoor, granting attackers access to local files or authenticated cloud sessions (Slack, GitHub, etc.).

Reference: LLM03:2025 Supply Chain

Excessive Agency

This risk stems from granting the AI broader permissions than necessary to complete a task (failing the Principle of Least Privilege). Because Claude Cowork can autonomously modify the filesystem, a logic error or "hallucination" can result in large-scale data corruption, unauthorized deletions, or unintended configuration changes without a human-in-the-loop.

Reference: LLM08:2025 Vector and Embedding Weaknesses

Insufficient Monitoring and Logging

Because Claude Cowork executes many actions locally on the user's machine, these activities often bypass the centralized enterprise security stack (SIEM/EDR) logging. This lack of a "paper trail" prevents security teams from performing effective incident response, forensic analysis, or compliance auditing if a breach occurs.

Reference: LLM10:2025 Unbounded Consumption

Practical Recommendations

To defend against these threats, follow these industry-standard "Guardrail" practices:

The "Isolated Workspace" Strategy

The "Isolated Workspace" strategy (sometimes referred to as the "Sandboxed Folder" or "Claude Sandbox" approach) is a recognized security best practice for using local AI agents like Claude Code and Claude Cowork.

Anthropic

Anthropic explicitly warns against giving Claude broad access to your filesystem. Their security documentation for Claude Code and the local agent architecture emphasizes:

Filesystem Isolation: Claude Code defaults to a permission-based model. Anthropic recommends launching the tool only within specific project folders rather than your root or home directory.

Reference: Claude Code Sandboxing

Amazon Bedrock

The AWS strategy shifts from local folders to IAM-based isolation and Tenant Isolation:

Dedicated Scopes: AWS recommends using "Session Attributes" and scoped IAM roles to ensure an agent can only access specific S3 prefixes or data silos.
VPC Isolation: For maximum security, AWS suggests running Claude-related tasks inside a VPC with AWS PrivateLink to prevent any data from reaching the public internet, mirroring the "Sandbox" concept at a network level.

Reference: Implementing tenant isolation using Agents for Amazon Bedrock in a multi-tenant environment

Azure

Azure handles "Isolated Workspaces" through Azure AI Studio and Microsoft Purview, focusing on data boundaries rather than just local folders:

Managed Network Isolation (Azure AI Studio): Azure doesn't just suggest a folder; they suggest a Managed Virtual Network. This creates a "Sandbox" at the network layer where Claude (via models in AI Studio) can only see data sources you explicitly "attach."

Reference: How to set up a managed network for Microsoft Foundry hubs

Information Protection for AI (Microsoft Purview): Microsoft uses Purview to prevent Claude from "stumbling" upon sensitive files (like .env files or SSH keys) if they are stored in SharePoint or OneDrive.

Reference: Microsoft Purview data security and compliance protections for generative AI apps

Google Vertex AI

GCP frames this as "Data Residency" and "VPC Service Controls":

Boundary Control: Vertex AI documentation highlights the use of a "Security Boundary" to separate the AI agent from sensitive resources (like credentials).
Managed Isolation: They recommend using Notebook Security Blueprints to protect confidential data from exfiltration when using Claude-powered agents in development environments.

Reference: Securely deploying AI agents

Disable "Always Allow" for High-Risk Tools

The recommendation to disable "Always Allow" and maintain a human-in-the-loop (HITL) for high-risk tools is a foundational security layer for AI agents. This strategy prevents "Zero-Click" or Cross-Prompt Injection (XPIA) attacks, where a malicious instruction hidden in a file or website could trick an agent into executing a dangerous command without your intervention.

Anthropic (Claude Code & Cowork)

Anthropic designed Claude Code with a "deliberately conservative" permission model. Their documentation explicitly advises against bypassing these prompts in local environments:

Use the Default Mode or Plan Mode. The "Default" mode prompts for every shell command, while "Plan" mode prevents any execution at all.

References: Use Cowork safely, Claude Code: Configure Permissions & Modes

Amazon Bedrock Agents

AWS implements this via User Confirmation and Return of Control (ROC). They frame it as a requirement for "High-Impact" actions.

For any tool that modifies data or accesses the network, AWS recommends enabling the "User Confirmation" flag in the Agent configuration. This pauses the agent and returns a structured prompt to the user.

Reference: Implement human-in-the-loop confirmation with Amazon Bedrock Agents

Azure (AI Foundry & Defender for Cloud)

Azure has recently integrated this into their security posture management. Microsoft Defender for Cloud will actually flag an AI agent as "High Risk" if it has tool access without human-in-the-loop controls:

Azure recommends using Microsoft Entra Agent IDs with scoped, short-lived tokens. They explicitly recommend "selective triggering" for risky operations.

References: Azure AI security best practices, AI security recommendations

Google Cloud (Vertex AI Agent Builder)

GCP focuses on "Confidence Thresholds" and "Action Guardrails" within its Agent Engine.

GCP recommends that any agent using the Model Context Protocol (MCP) or custom APIs should have a mandatory "Manual Review" step for any write operations.

Reference: Vertex AI Agent Builder

Scrub Untrusted Content

Treating external content as an attack vector is essential for preventing Indirect Prompt Injection (XPIA), where malicious instructions are hidden in data (like a white-text command in a PDF) rather than the user's prompt.

Anthropic

Anthropic explicitly identifies browser-based agents and document processing as the highest risk for injection. Their stance is that no model is 100% immune, so multi-layered defense is required:

Anthropic suggests using Claude Opus 4.5+ for untrusted tasks, as it has the highest benchmarked robustness against injection (reducing attack success to ~1%).

References: Prompt Injection Defense, Using Claude in Chrome Safely

Amazon Bedrock Guardrails

AWS addresses this by programmatically separating "Instructions" from "Data" so the model knows which one to ignore if they conflict:

Use Input Tagging to wrap retrieved data (like a PDF's text) in XML tags. This allows Bedrock Guardrails to apply "Prompt Attack Filters" specifically to the data without blocking your system instructions.
AWS suggests a Lambda-based Pre-processing step to scan PDFs for hidden text or PII before the text ever reaches the LLM.

References: Securing Amazon Bedrock Agents, Prompt injection security

Azure (Prompt Shields and Spotlighting)

Azure provides the most direct "Scrubbing" tool with a feature called Spotlighting, which technically implements the "separate session" idea you mentioned.

Enable Prompt Shields for Documents. This specifically detects "Document Attacks" where instructions are embedded in third-party content.
Use spotlighting to transform document content (sometimes via Base64 encoding), so the model treats it as "lower trust" grounded data, preventing it from being executed as a command.

References: Prompt Shields, Prompt Shields in Microsoft Foundry

Google Cloud (Vertex AI Action Guardrails)

GCP treats this through Content Filtering and Manual Review nodes in the agent's workflow:

GCP recommends "Gemini as a Filter." You use a smaller, faster model instance to "pre-read" and summarize a file in a low-privilege environment. If the summary contains instruction-like language (e.g., "ignore," "system," "delete"), the file is quarantined.

Reference: Safety in Vertex AI

Network Hardening

"Network Hardening" isn't just about blocking ports; it’s about establishing a Zero Trust egress policy for AI agents. Since Claude Desktop and Claude Code are effectively "execution engines" on your local machine, they require the same egress filtering you would apply to a production VPC.

Anthropic

Anthropic’s recent security documentation for Claude Code and Desktop highlights that "network isolation" is a core pillar of their sandboxing strategy:

Use a Unix domain socket connected to a proxy server to enforce a "Deny All" outbound policy by default.
For local setups, Anthropic suggests customizing this proxy to enforce rules on outgoing traffic, allowing only trusted domains (like anthropic.com or your internal API endpoints).

Reference: Claude Code Sandboxing, Auditing Network Activity

AWS

AWS frames this as "Egress Filtering" via the AWS Network Firewall. For an AI agent running in an AWS environment, the strategy is to block all traffic that isn't signed by a specific SNI (Server Name Indication):

Use AWS Network Firewall with stateful rules to monitor the SNI of outbound HTTPS requests. If an agent tries to "phone home" to an unknown IP or a malicious C2 (Command & Control) server, the firewall drops the packet.

References: Restricting a VPC’s outbound traffic, Build secure network architectures for generative AI applications

Azure

Azure has introduced a specific feature called the Network Security Perimeter (NSP) to create a logical boundary for AI services.

Even if an AI service has a public endpoint, the NSP acts as an "Application Firewall" that logs every access attempt and blocks exfiltration to any service outside that perimeter.
Configure Azure Firewall Application Rules to allow only specific FQDNs (Fully Qualified Domain Names) required for your Claude-based workflows.

References: Add an AI Network Security Perimeter, Control outbound traffic with Azure Firewall

Google Cloud

GCP’s approach is the most rigid, using VPC Service Controls to prevent data exfiltration at the API layer, regardless of the network path:

Wrap your AI project in a "Service Perimeter." If an agent inside this perimeter tries to send data to a Cloud Storage bucket or an external API not explicitly in the "Ingress/Egress" rule set, the request is blocked by the Google front-end.

Reference: Mitigating Data Exfiltration with VPC Service Controls

Summary

Claude Cowork marks a transition from AI that talks to AI that acts. By granting a digital agent direct access to your files and external apps via the Model Context Protocol, you gain a powerful "digital intern." However, this shifts the security focus from protecting a simple chat to securing a privileged system user capable of modifying data and executing commands.

To manage this risk, organizations must adopt a "Zero Trust" approach for agentic tasks. This means strictly isolating the agent's access to specific folders, requiring human approval for high-risk actions, and using cloud-native firewalls to prevent data exfiltration. By treating the AI as a high-risk user and enforcing strong monitoring, you can automate complex workflows without compromising your system's integrity.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Play 88EF Game – Fun, Rewards, and Exciting Challenges!

Talha — Tue, 03 Mar 2026 20:05:30 +0000

Looking for a game that’s both fun and rewarding? 88EF Game is the perfect choice! With stunning graphics, smooth gameplay, and plenty of bonuses, it keeps things exciting every time you play. Best game

Setting up an account is simple, and you can start earning rewards right away. Join other players and see why 88EF Game is gaining so much popularity in Pakistan. Don’t miss out on the action!

AI vs. Engineering Teams

Eyal Estrin — Sun, 22 Feb 2026 16:08:11 +0000

In February 2026, Anthropic released a new capability for Claude Code called Claude Code Security - a new tool that thinks like a developer to find tricky logic errors in your code, ranking how risky they are and suggesting fixes you can review.

The news sent a shockwave through cybersecurity stocks, causing JFrog to crash by nearly 25% while others like CrowdStrike, Okta, and Cloudflare all saw their share prices tumble by around 8% or 9%.

The announcement raised a question: can AI tools replace the current SaaS or cybersecurity products, or can AI agents replace developers or engineering teams?

Anthropic’s Claude Code Security announcement highlights a move toward "agentic reasoning" - the ability for AI to understand complex data flows and logic flaws rather than just matching known patterns. While this is a significant leap for the "Defensive AI" movement, it does not signal the end of the human engineer or the mature SaaS platform.

In this blog post, I will share my point of view on the current advancement in AI technology.

The Modern SDLC and CI/CD Pipeline

The Software Development Life Cycle (SDLC) is a continuous loop. AI tools now act as "force multipliers" in these phases, but they lack the authority and context to own them.

Requirements and Planning

The Process: Translating vague business needs into technical specifications.
AI's Role: Summarizing stakeholder meetings and drafting initial user stories.
The Human Factor: AI cannot negotiate trade-offs. It doesn't understand that a "must-have" feature might be delayed because of a pending merger or a team's current burnout level.

Architecture and Design

The Process: Designing the blueprint for scalability and security across cloud providers like AWS, Azure, or GCP.
AI's Role: Suggesting common design patterns (e.g., Event-Driven vs. Microservices) and generating Infrastructure as Code (IaC).
The Human Factor: AI lacks "institutional memory." It doesn't know why a specific database was chosen three years ago to satisfy a unique compliance requirement that still exists.

Development and Implementation

The Process: Writing and committing the actual code.
AI's Role (Claude Code): This is where agentic tools live. They can read your files, run terminal commands, and fix bugs autonomously.
The Human Factor: Large codebases (50k+ lines) often exceed an AI's effective context window. As the context fills, the AI can introduce conflicting logic or "hallucinate" dependencies.

CI/CD: Testing and Security

The Process: Automating the path to production through integration and deployment pipelines.
AI's Role (Claude Code Security): It identifies high-severity vulnerabilities (e.g., broken access control) and suggests a verified patch.
The Human Factor: Anthropic emphasizes a "Human-in-the-Loop" model. AI cannot take the legal or professional blame for a botched security patch that causes a global outage.

Observability and Maintenance

The Process: Monitoring live systems and fixing production bugs at scale.
AI's Role: Analyzing logs to detect anomalies and suggesting fixes for "infrastructure drift."
The Human Factor: Being on-call at 3:00 AM requires high-stakes decision-making and cross-team coordination that AI agents cannot yet replicate.

Why GenAI Cannot Replace Experienced Engineers

Even with the reasoning capabilities shown in the 2026 Claude Code Security update, three "hard barriers" prevent AI from replacing the individual contributor:

The Responsibility Gap: Software isn't just code; it's a liability. No AI subscription comes with an insurance policy. Accountability is a human-only function. If a system fails, a human must explain why to a board or a regulator.
Reasoning vs. Intent: AI understands the structure of your code, but humans understand the intent. An AI might see a missing role-check as a bug, while a human knows it was bypassed for a specific, documented emergency migration path.
Technical Debt Acceleration: Recent 2026 studies show that when developers over-rely on AI, "code churn" (code that is rewritten or deleted within two weeks) doubles. AI writes code faster than it can be reviewed, potentially creating a "spaghetti" codebase if not guided by a senior architect.

Why AI Cannot Replace Mature SaaS Products

Many feared that AI's ability to "generate a clone" of an app would kill the SaaS industry. This hasn't happened for several concrete reasons:

SaaS is "Running," not "Building": Building a clone of Jira or Salesforce is the easy part. Operating it at 99.99\% availability, managing global data centers, and providing 24/7 support is what customers actually pay for.
Compliance and Trust: A mature SaaS product provides pre-built SOC2, GDPR, and HIPAA guardrails. An AI-generated app is a "black box" that hasn't been audited, making it a non-starter for enterprise or legal use.
The Integration Ecosystem: SaaS platforms thrive on their ecosystems (APIs, plugins, and third-party integrations). AI can write a script to connect two tools, but it cannot manage the long-term versioning and stability of a multi-vendor tech stack.

Summary

AI tools like Claude Code Security are the new "High-Level Languages" of 2026.

Just as C++ didn't kill programmers but made them more powerful, AI is shifting the engineer's role from "Coder" to "Orchestrator and Verifier."

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

About the Author

Inside the Amazon Nova Forge

Eyal Estrin — Mon, 09 Feb 2026 13:41:45 +0000

Amazon Nova Forge is a development environment within Amazon SageMaker AI dedicated to building "Novellas" - private, custom versions of Amazon’s Nova frontier models.

Unlike typical AI services that only allow you to use a model or fine-tune its final layer, Nova Forge introduces a concept called Open Training. This gives you access to the model at various "life stages" (checkpoints), allowing you to bake your company’s proprietary knowledge directly into the model’s core reasoning capabilities.

This blog post is an introduction to Amazon Nova Forge and what makes it unique in the training process.

What Makes it Different?

Prompt engineering and RAG provide external context but fail to change a model's core intelligence. Standard fine-tuning also falls short because it happens too late in the lifecycle, attempting to steer a "finished" model that is already set in its ways. Nova Forge solves this by moving customization earlier into the training process, embedding specialized knowledge where it actually sticks.

Nova Forge occupies a unique middle ground between Managed APIs (Bedrock) and building from scratch.

Amazon Bedrock: Bedrock is for consuming models. You can fine-tune them, but you are working on a "black box" model. Nova Forge is for building the model itself using deeper training techniques.
Azure AI / Google Vertex AI: While Azure and GCP offer fine-tuning, they generally don't provide access to intermediate training checkpoints of their frontier models. Nova Forge allows for Data Blending, where you mix your data with Amazon’s original training data to prevent the model from "forgetting" how to speak or reason.

Terminology

Novella: The resulting custom model you create. It’s a "private edition" of Nova.
Checkpoints: Saved "states" of the model during its initial training (pre-training, mid-training, post-training).
Data Blending: The process of mixing your proprietary data with Nova-curated datasets so the model stays smart while learning your specific business.
Reinforcement Fine-Tuning (RFT): Using "reward functions" (logic-based feedback) to teach the model how to perform complex, multi-step tasks correctly.
Catastrophic Forgetting: A common AI failure where a model learns new information but loses its original abilities. Nova Forge is designed specifically to prevent this.

The Workflow: From Training to Production

The process bridges the gap between the "lab" (SageMaker) and the "app" (Bedrock).

Selection: You choose a Nova base model and a specific checkpoint (e.g., a "Mid-training" checkpoint) in Amazon SageMaker Studio.
Training (SageMaker AI): You use SageMaker Recipes—pre-configured training scripts—to blend your data from S3 with Nova’s datasets. The heavy lifting (compute) happens on SageMaker's managed infrastructure.
Refinement: Optionally, you run RFT in SageMaker to align the model with specific business outcomes or safety guardrails.
Deployment (Bedrock): Once the "Novella" is ready, you import it into Amazon Bedrock as a private model.
Production: Your applications call the custom model via the standard Bedrock API, benefitting from Bedrock’s serverless scaling and security.

Below is a sample training workflow:

Data Privacy and Protection

The security model is the most critical part:

Sovereignty: Your data stays in your S3 buckets and within your VPC boundaries.
No Leakage: AWS explicitly states that customer data is not used to train the base Amazon Nova models. Your "Novella" is a private resource visible only to your AWS account.
Encryption: Data is encrypted at rest via KMS (AWS-managed or Customer-managed keys) and in transit via TLS 1.2+.
Governance: Access is controlled via standard IAM policies, and all training activity is logged in CloudTrail.

Pricing Model

Nova Forge carries a distinct cost structure that reflects its "frontier" status:

Subscription Fee: Access to the Forge environment starts at approximately --$100,000 per year.
Usage Costs: On top of the subscription, you pay for the SageMaker compute (GPUs) used during the training phase.
Comparison: Cheaper than Training from Scratch: Building a frontier model from zero costs millions in compute and months of R&D. Nova Forge provides the "shortcuts" to get the same result for a fraction of that.
- More Expensive than Basic Fine-Tuning: Standard fine-tuning on Bedrock is much cheaper (often just a few dollars per hour), but it cannot achieve the deep "domain-native" intelligence that Nova Forge provides.

Summary

Amazon Nova Forge marks a shift from generic AI to native intelligence, where models don't just reference your data—they are built from it. By using "Open Training," you can bake specialized knowledge into the model’s core at the pre-training or mid-training stages. This results in a private Novella that understands your specific industry as naturally as its base language.

Organizations managing high-value proprietary data should consider moving beyond treating that information as an external reference. If your workflows involve specialized terminology or regulated processes that standard LLMs struggle to master, shifting customization earlier in the training lifecycle is often more effective than basic fine-tuning.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

Additional references

About the Author

ClawdBot Security Guide

Eyal Estrin — Mon, 02 Feb 2026 14:02:19 +0000

Clawdbot (now renamed Moltbot) is an open-source, self-hosted AI assistant that runs on your own hardware or server and can-do things, not just chat.

It was created by developer Peter Steinberger in late 2025.

It connects your AI model (OpenAI, Claude, local models via Ollama) to real capabilities: automate workflows, read/write files, execute tools and scripts, manage emails/calendars, and respond through messaging apps like WhatsApp, Telegram, Discord and Slack.

You interact with it like a smart assistant that actually takes action based on your input.

What is it used for?

Clawdbot functions as a "digital employee" or a "Jarvis-like" assistant that operates 24/7. Because it has direct access to your local filesystem and system tools, it can perform proactive tasks that standard AI cannot:

Communication Hub: It lives inside messaging apps like Telegram, WhatsApp, or Slack. You text it commands, and it can reply, summarize threads, or manage your inbox.
Proactive Automation: It can monitor your email, calendar, and GitHub repositories to fix bugs while you sleep, draft replies, or alert you to flight check-ins.
System Execution: It can run shell commands, execute scripts, manage files, and even control web browsers to perform actions like making purchases or reservations.
Persistent Memory: It maintains long-term context across conversations, remembering your preferences and past tasks for weeks or months.

Below is a sample deployment architecture of Clawdbot:

Security risks associated with Clawdbot

Clawdbot is a high-privilege automation control plane. Since it manages agents, tools, and multiple communication channels, it presents serious security risks.

Control plane exposure & misconfiguration

Exposure: Misconfigured dashboards and reverse proxies have left hundreds of control interfaces open to the internet.
Authentication Failures: Some setups treat remote connections as local, letting attackers bypass authentication.
Data Theft: Unsecured instances can expose API keys, conversation logs, and configuration data.
System Takeover: In certain cases, attackers can run commands on the host with elevated privileges.

Prompt injection & tool blast radius

Manipulation: Malicious or untrusted content can trick the AI into using tools in unintended ways.
Blast Radius: Access to high-privilege tools like shell commands or admin APIs means a prompt injection could lead to data theft or lateral movement across the network.
Model Weakness: Older or poorly aligned AI models are more likely to ignore safety instructions, increasing risk.

Social engineering and user level abuse

Deception: Attackers can manipulate the bot to extract personal or environment-specific information.
Account Misuse: Connected commerce tools could be used for unauthorized purchases.
Phishing: A compromised bot can send malicious links or scripts to contacts.
Upstream Data Exposure: Prompts and tool outputs sent to AI providers can create privacy or compliance issues if not carefully managed.

Data privacy, logs, and long term memory

Sensitive Data Exposure: The gateway stores conversation histories and memory, which may include personal or business information depending on usage.
Dashboard and Host Vulnerabilities: Exposed dashboards or weak host protections can allow attackers to access past chats, file transfers, and stored credentials (API keys, tokens, OAuth secrets), turning the instance into a data exfiltration point.
Upstream Data Risk: Prompts and tool outputs are sent to AI providers. Without proper scoping and data classification, this can create privacy and compliance issues.

Ecosystem risks: hijacked branding, fake installers, and scams

Hijacked Accounts: After a rebrand, original social media and GitHub handles were exploited by scammers promoting fake crypto tokens.
Malware Risk: Users searching for the tool may encounter backdoored versions or fake installers designed to compromise their systems.

Network and Remote Access Risks

Browser Control: Tools that let the bot control a browser can expose local or internal network resources if not secured.
Tunneling Errors: Misconfigured reverse proxies or tools like Tailscale may grant attackers unintended access to private networks.

Recommendations for securing Clawdbot

Based on the official GitHub repository, documentation, and expert audits from January 2026, here are the recommendations for securing your instance.

Lock Down the Gateway

Bind the Clawdbot gateway to loopback (127.0.0.1) and never expose it directly to the internet. If remote access is required, use private mesh solutions such as Tailscale or Cloudflare Tunnel. Always enable gateway authentication using tokens or passwords.

References:

Enforce Strict Access Controls

Restrict who can interact with Clawdbot by enforcing DM pairing or allowlists. Avoid wildcard policies in production. In group chats, require explicit mentions before the bot processes messages.

Reference:

Official GitHub SECURITY.md

Isolate the Runtime Environment

Run Clawdbot on dedicated hardware or a dedicated VM/container. Avoid running it on your primary workstation. Use Docker sandboxing with minimal mounts and dropped capabilities.

References:

Sandbox and Restrict Tools

Enable sandboxing for all high-risk tools such as exec, write, browser automation, and web access. Use tool allow/deny lists and restrict elevated tools to trusted users only.

Reference:

Official GitHub Security Overview

Apply Least Privilege to Agent Capabilities

Disable interactive shells unless strictly necessary. Limit filesystem visibility to read-only mounts where possible. Avoid granting elevated privileges to agents handling untrusted input.

Reference:

Official Clawdbot Documentation

Secure Credentials and Secrets

Store secrets in environment variables, not configuration files or source control. Apply strict filesystem permissions to Clawdbot directories and rotate credentials after any suspected incident.

Reference:

Official Clawdbot Security Documentation

Continuous Auditing and Monitoring

Regularly run built-in security audit and doctor commands to detect unsafe configurations. Monitor logs and session transcripts for anomalous behavior or unexpected access.

Reference:

Official GitHub Security CLI Documentation

Harden Browser Automation

Treat browser automation as operator-level access. Use dedicated browser profiles without password managers or sync enabled. Never expose browser control ports publicly.

Prompt-Level Safety Rules

Define explicit system rules that prevent disclosure of credentials, filesystem structure, or infrastructure details. Require confirmation for destructive actions.

Reference:

Official Clawdbot Security Documentation

Incident Response Preparedness

Maintain a documented response plan. If compromise is suspected: stop the gateway, revoke access, rotate all secrets, review logs, and re-run security audits.

Reference:

Official Clawdbot Security Documentation

Summary

ClawdBot is a high-privilege AI agent that can act on your system, not just chat. Its main risks come from exposed gateways, weak access controls, and powerful tools combined with prompt injection or social engineering, which can lead to system compromise and data loss. To use it safely, lock the gateway to localhost with authentication, restrict who can interact with it, isolate its runtime, minimize tool permissions, and monitor it continuously.

Disclaimer: AI tools were used to research and edit this article. Graphics are created using AI.

References:

About the Author

Food Flavors 2026: Viral Recipes & Zero-Waste Cooking

sattakingai01 — Fri, 30 Jan 2026 10:17:07 +0000

Food Trends and Flavors to Explore: Insights from Usatrendingtodays

Food is much more than a daily necessity—it is culture, comfort, creativity, and connection. Across the world, food brings people together, tells stories, and reflects traditions passed down through generations. In recent years, the way people think about food has changed significantly. From healthy eating to global flavors and sustainable choices, food trends continue to evolve. Platforms like usatrendingtodays help readers stay informed about what’s popular, nutritious, and exciting in the world of food.

The Importance of Food in Everyday Life

Food plays a central role in our daily routines. It fuels our bodies, supports our health, and influences our mood and energy levels. Beyond nutrition, food also creates emotional connections—family meals, celebrations, and shared experiences often revolve around food.

According to insights shared on usatrendingtodays, people today are more conscious about what they eat. They want food that is not only tasty but also healthy, ethically sourced, and prepared with care. This growing awareness has reshaped food choices around the world.

Global Food Culture and Diversity

One of the most beautiful aspects of food is its diversity. Every culture has its own unique flavors, ingredients, and cooking methods. Italian pasta, Indian curries, Japanese sushi, Mexican tacos, and Middle Eastern kebabs all represent the traditions and lifestyles of their regions.

Usatrendingtodays often highlights global food trends, encouraging people to explore international cuisines. Thanks to globalization and social media, trying foods from different cultures has become easier than ever, even without traveling far from home.

Healthy Eating and Nutrition Trends

Healthy eating has become a major focus in modern lifestyles. People are paying closer attention to ingredients, portion sizes, and nutritional value. Diets rich in fruits, vegetables, whole grains, and lean proteins are widely encouraged.

Platforms like usatrendingtodays discuss popular nutrition trends such as plant-based diets, gluten-free options, low-sugar meals, and balanced eating habits. These trends help people make informed decisions that support long-term health without sacrificing flavor.

The Rise of Home Cooking

Home cooking has seen a strong comeback in recent years. Many people now prefer preparing meals at home to control ingredients, save money, and enjoy fresh food. Cooking at home also allows creativity and experimentation with recipes.

Usatrendingtodays shares easy and practical cooking ideas that suit busy lifestyles. From quick weekday meals to special weekend recipes, home cooking encourages healthier habits and strengthens family bonds through shared meals.

Street Food and Casual Dining

Street food is loved worldwide for its bold flavors, affordability, and cultural authenticity. From food trucks to local markets, street food offers a taste of tradition in every bite. It reflects local ingredients and cooking styles while being accessible to everyone.

According to usatrendingtodays, street food trends are gaining global popularity. Many street food dishes have inspired restaurant menus, blending casual dining with gourmet creativity.

Sustainable and Ethical Food Choices

Sustainability is becoming an important part of food culture. People are more aware of how food production affects the environment. Reducing food waste, choosing locally sourced ingredients, and supporting ethical farming practices are now key concerns.

Usatrendingtodays highlights sustainable food movements that promote eco-friendly choices. Conscious eating not only benefits personal health but also contributes to a healthier planet.

Food and Technology

Technology has transformed the food industry in many ways. Online food delivery apps, digital recipes, smart kitchen appliances, and virtual cooking classes have made food more accessible and convenient.

Platforms like usatrendingtodays explore how technology is shaping food habits. From discovering new restaurants to learning cooking techniques online, technology continues to enhance the way people experience food.

Comfort Food and Emotional Connection

Comfort food holds a special place in people’s hearts. These are the meals that bring warmth, nostalgia, and a sense of happiness. Comfort food varies across cultures but often includes simple, familiar dishes that remind people of home.

Usatrendingtodays notes that comfort food remains popular, especially during stressful times. While modern trends come and go, classic comfort meals continue to provide emotional satisfaction and balance.

Food for Special Occasions

Food is an essential part of celebrations and traditions. Festivals, weddings, holidays, and family gatherings are often centered around special dishes. These meals carry cultural meaning and create lasting memories.

Through usatrendingtodays, readers can explore how different cultures celebrate with food. Understanding these traditions helps people appreciate the deeper significance behind recipes and culinary customs.

Food Blogging and Social Media Influence

Social media has changed the way people discover and share food. Food blogs, recipe videos, and restaurant reviews influence what people eat and where they dine. Visual platforms have made food presentation just as important as taste.

Usatrendingtodays keeps track of trending food content and viral recipes, helping readers stay updated with what’s popular online. Social media continues to shape food culture in creative and exciting ways.

The Future of Food

The future of food is focused on innovation, health, and sustainability. Plant-based alternatives, lab-grown meat, organic farming, and personalized nutrition are gaining attention. People are looking for food that aligns with both their health goals and ethical values.

As highlighted on usatrendingtodays, the food industry will continue to evolve to meet changing consumer demands. Technology, awareness, and creativity will play a major role in shaping what we eat in the years to come.

Conclusion

Food is a powerful part of human life, connecting people across cultures, generations, and experiences. From everyday meals to global cuisines and emerging trends, food continues to evolve with society. Staying informed helps people make better choices and enjoy food more mindfully.

Platforms like usatrendingtodays offer valuable insights into food trends, healthy habits, and cultural influences. By exploring new flavors, supporting sustainable practices, and appreciating the role of food in daily life, individuals can turn every meal into a meaningful experience. Food is not just about eating—it’s about enjoyment, connection, and celebrating life itself.

Securing AI Skills

Eyal Estrin — Mon, 26 Jan 2026 15:11:51 +0000

If you give an AI system the ability to act, you give it risk.

In earlier posts, I covered how to secure MCP servers and agentic AI systems. This post focuses on a narrower but more dangerous layer: AI skills. These are the tools that let models touch the real world.

Once a model can call an API, run code, or move data, it stops being just a reasoning engine. It becomes an operator.

That is where most security failures happen.

Terminology

In generative AI, "skills" describe the interfaces that allow a model to perform actions outside its own context.

Different vendors use different names:

Tools: Function calling and MCP-based interactions
Plugins: Web-based extensions used by chatbots
Actions: OpenAI GPT Actions and AWS Bedrock Action Groups
Agents: Systems that reason and execute across multiple steps

A base LLM predicts text; A skill gives it hands.

Skills are pre-defined interfaces that expose code, APIs, or workflows. When a model decides that text alone is not enough, it triggers a skill.

Anthropic treats skills as instruction-and-script bundles loaded at runtime.

OpenAI uses modular functions inside Custom GPTs and agents.

AWS implements the same idea through Action Groups.

Microsoft applies the term across Copilot and Semantic Kernel.

NVIDIA uses skills in its digital human platforms.

In the reference high-level architecture below, we can see the relations between the components:

Why Skills Are Dangerous

Every skill expands the attack surface. The model sits in the middle, deciding what to call and when. If it is tricked, the skill executes anyway.

The most common failure modes:

Excessive agency: Skills often have broader permissions than they need. A file-management skill with system-level access is a breach waiting to happen.
The consent gap: Users approve skills as a bundle. They rarely inspect the exact permissions. Attackers hide destructive capability inside tools that appear harmless.
Procedural and memory poisoning: Skills that retain instructions or memory can be slowly corrupted. This does not cause an immediate failure. It changes behavior over time.
Privilege escalation through tool chaining: Multiple tools can be combined to bypass intended boundaries. A harmless read operation becomes a write. A write becomes execution.
Indirect prompt injection: Malicious instructions are placed in content that the model reads: emails, web pages, documents. The model follows them using its own skills.
Data exfiltration: Skills often require access to sensitive systems. Once compromised, they can leak source code, credentials, or internal records.
Supply chain risk: Skills rely on third-party APIs and libraries. A poisoned update propagates instantly.
Agent-to-agent spread: In multi-agent systems, one compromised skill can affect others. Failures cascade.
Unsafe execution and RCE: Any skill that runs code without isolation is exposed to remote code execution.
Insecure output handling: Raw outputs passed directly to users can cause data leaks or client-side exploits.
SSRF: Fetch-style skills can be abused to probe internal networks.

How to Secure Skills (What Actually Works)

Treat skills like production services. Because they are.

Identity and Access Management

Each skill must have its own identity. No shared credentials. No broad roles.

Permissions should be minimal and continuously evaluated. This directly addresses OWASP LLM06: Excessive Agency.

Reference: OWASP LLM06:2025 Excessive Agency

AWS Bedrock

Assign granular IAM roles per agent. Restrict regions and models with SCPs. Limit Action Groups to specific Lambda functions.

References:

Microsoft Foundry

Disable key-based auth. Use Entra ID and Managed Identities. Restrict connectors at the agent level.

References:

Google Vertex AI

Use Workload Identity Federation. Scope permissions explicitly in agent configs.

Reference: Secure your Agentic and Generative AI with Google Cloud

OpenAI

Never expose API keys client-side. Use project-scoped keys and backend proxies.

Reference: Best Practices for API Key Safety

Input and Output Guardrails

Prompt injection is not theoretical. It is the default attack.

Map OWASP LLM risks directly to controls.

Reference: OWASP Top 10 for Large Language Model Applications

AWS Bedrock

Use Guardrails with prompt-attack detection and PII redaction.

Reference: Amazon Bedrock Guardrails

Microsoft Foundry

Enable Prompt Shields and groundedness detection.

Reference: Azure AI Content Safety

Google Vertex AI

Use Model Armor and safety filters at the API layer.

Reference: Model Armor overview

OpenAI

Use zero-retention mode for sensitive workflows.

Reference: Data controls in the OpenAI platform

Anthropic

Use constitutional prompts, but still enforce external moderation.

Reference: Building safeguards for Claude

Adversarial Testing

Red-team your agents.

Test prompt injection, RAG abuse, tool chaining, and data poisoning during development. Not after launch.

Threat modeling frameworks from OWASP, NIST, and Google apply here with minimal adaptation.

References:

DevSecOps Integration

Every endpoint a skill calls is part of your attack surface.

Run SAST and DAST on the skill code. Scan dependencies. Fail builds when violations appear.

References:

Isolation and Network Controls

Code-executing skills must run in ephemeral, sandboxed environments.

No host access. No unrestricted outbound traffic.

Use private networking wherever possible:

Logging, Monitoring, and Privacy

If you cannot audit skill usage, you cannot secure it.

Enable full invocation logging and integrate with existing SIEM tools.

Ensure provider data-handling terms match your risk profile. Not all plans are equal.

References:

Incident Response and Human Oversight

Update incident response plans to include AI-specific failures.

For high-risk actions, require human approval. This is the simplest and most reliable control against runaway agents.

References:

Summary

AI skills are the execution layer of generative systems. They turn models from advisors into actors.

That shift introduces real security risk: excessive permissions, prompt injection, data leakage, and cascading agent failures.

Secure skills the same way you secure production services. Strong identity. Least privilege. Isolation. Guardrails. Monitoring. Human oversight.

There is no final state. Platforms change. Attacks evolve. Continuous testing is the job.