The Ops Community ⚙️

Argonaut

Originally published at argonaut.dev

The big 3 observability tools: Datadog vs New Relic vs Splunk

When we consider adding observability to our stack, we may default to popular options like Datadog. These platforms have a ton of features covering most use cases, but they also come with premium pricing. This blog covers a tool-wise comparison of Datadog, New Relic, and Splunk, comments from existing users, and our recommendations for your observability stack.

Observability tools are costly, accounting for about 30% of a company’s outside vendor spending. Their pricing also lacks transparency, which makes it nearly impossible to work out how a bill was calculated. The various services have different pricing models, each with limits (number of hosts/devices/requests/ingest) and multiple tiers with monthly and annual billing options.

Data from Statista shows that the observability tools market, worth $12.9 billion in 2020, is set to reach $19.3 billion by 2024. These tools are used across industries, including media, communications, financial services, technology, health, the public sector, and manufacturing, in all regions of the world. Organizations are also increasing the number of observability tools they use while consolidating the number of vendors (Splunk, State of Observability 2022).

Here is what users have to say

Before we dive into the feature-specific comparison, let’s look at what the users of these tools have to say. These are some common likes, dislikes, and things to keep in mind that we found across more than half a dozen review sites, including G2, Gartner, and Software Advice.

New Relic

Liked:

  • Straightforward setup
  • Share dashboards externally
  • Transparent pricing with great ROI
  • AI-based anomaly detection and alerting
  • Built on OpenTelemetry standards
  • Go backward in time and analyze historical bottlenecks and consumption trends

Disliked:

  • Complex query builder, limited regex, and other limitations of NRQL
  • The learning curve to make use of all customization options can be overwhelming
  • Incorrect query results, bugs, sluggish experience

Keep in mind:

  • It comes with a generous free tier; use it yourself to see if it works
  • Their network monitoring doesn’t yet support all infra
  • Use their data ingest cost estimator

Splunk

Liked:

  • Works well for high-volume data ingestion
  • Indexing and ML applied to the data make it valuable
  • Built-in reports and dashboards that can be customized
  • Built on OpenTelemetry standards
  • They offer workload, ingest, and entity pricing options

Disliked:

  • Monthly pricing is expensive and opaque
  • Lack of visualization options, room for improvement with their interface

Keep in mind:

  • There is a 10k session/month per host limit for RUM
  • If you don’t want the bundle offerings in Standard and Pro plans, you can also choose individual offerings as per your requirements
  • Consider Splunk’s flexible pricing options to find the one that best suits you
  • If you’re also considering using Splunk for IT Ops and Security, check their cloud platform

Datadog

Liked:

  • Stability and the constant addition of new features
  • Out-of-the-box integrations
  • AI and ML Capabilities
  • All tools in one place, with a ton of customization options

Disliked:

  • Opaque pricing, inflexible billing, and unexpected charges
  • Hard to find the right metric and look at historical data
  • Deploying the agents is still a very manual task

Keep in mind:

  • Leverage estimated usage and set alerts to avoid overspending
  • Monthly billing is 20-50% higher than annual billing
  • Their security offerings are relatively new and not up to speed with the rest of their monitoring
  • Their free trial is limited compared to their competitors
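
To make the billing notes above concrete, here is a minimal Python sketch of the arithmetic. The 20-50% premium is the range quoted above; the $15 rate, the host count, the budget, and the function names are illustrative assumptions, not Datadog's billing logic.

```python
def monthly_billed_price(annual_rate: float, premium: float) -> float:
    """Per-unit monthly price when billed month to month.

    annual_rate: per-unit monthly price under annual billing.
    premium: fractional markup for monthly billing (0.20 to 0.50 per the note above).
    """
    return annual_rate * (1 + premium)


def over_budget(projected_spend: float, budget: float, threshold: float = 0.8) -> bool:
    """Return True once projected spend reaches a fraction of the budget."""
    return projected_spend >= budget * threshold


# Hypothetical example: 40 hosts at an assumed $15/host/month annual rate.
hosts = 40
annual_rate = 15.0
low = hosts * monthly_billed_price(annual_rate, 0.20)
high = hosts * monthly_billed_price(annual_rate, 0.50)
print(f"Annual billing:  ${hosts * annual_rate:,.2f}/month")
print(f"Monthly billing: ${low:,.2f} to ${high:,.2f}/month")
print("Alert:", over_budget(projected_spend=low, budget=800.0))
```

A check like `over_budget` is only a stand-in for the usage alerts the vendor provides; the point is to decide on a threshold before the invoice arrives.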

Moving into the comparison, we compare the tools feature by feature, starting with APM and log management and ending with alerting and network monitoring.

Note: Here we only compare non-enterprise tiers: free tiers (where available), Splunk’s Standard tier, Datadog’s Pro tier, and New Relic’s Pro tier. If you’re interested in enterprise pricing, check out their pricing pages - Datadog, Splunk, New Relic.

APM - Application Performance Monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Tracing | Automatic trace_Id injection to logs, connect traces to infra metrics, network calls, and live processes | Collect all trace data, AI-powered methods to sift through trace data | Observes 100% of traces and provides actionable insights |
| Live visibility | All ingested traces and service dependencies over the last 15 minutes | NoSample™ full-fidelity tracing, collecting 100% of traces combined with AI-driven directed troubleshooting makes detection time fast | Real-time streaming sends data every 5 seconds, can view, visualize and query that data |
| Control | Set SLOs, track trends, and monitor KPIs by generating span-based metrics using any set of tags | Turns every span and trace into metrics, to create pre-built service monitoring dashboards | Manage SLOs with automated service level management |
| Deploy | Monitor and compare impacts of canary, blue-green, and shadow deploy | Can have multiple, distinct application environments that don’t interact directly with each other but that are all being monitored by Splunk APM | Tracking deployments create deployment markers that appear in APM charts |
| Supported languages | Java, .NET, PHP, Node.js, Ruby, Python, Go, or C++ applications | Java, Python, .Net (Core and Framework), Node.js, GoLang, Ruby, and PHP | Java, .NET, PHP, Node.js, Ruby, Python, Go, or C applications |
| Frameworks | Hundreds of frameworks | Hundreds of frameworks | Hundreds of frameworks |
| Performance monitors | Applications, hosts, containers, serverless functions, and PaaS | Applications, containers, serverless functions, microservices | Applications, hosts, containers, database services, or grouping of these |
| Related products | Continuous Profiler | AlwaysOn continuous code profiling | New Relic Edge with Infinite Traces |
| Support for OpenTelemetry | Yes | Yes | Yes |
| Starting price | $31/month/host (billed annually) | $55/month/host (billed annually); also offers usage-based pricing | Free tier, pay as you go |
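
The tracing row above mentions automatic trace ID injection into logs. As a rough illustration of the general idea (not the code any of these vendors' agents actually run), the sketch below stamps every log record with the ID of the request being handled, using only Python's standard library. The names `current_trace_id`, `TraceIdFilter`, `build_logger`, and `handle_request` are hypothetical.

```python
import contextvars
import io
import logging
import uuid

# Holds the trace ID of the request currently being handled (hypothetical name).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Copy the active trace ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop the record, only annotate it


def build_logger(stream) -> logging.Logger:
    """Create a logger whose output lines carry a trace_id field."""
    logger = logging.getLogger("demo.tracing")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s trace_id=%(trace_id)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, trace_id: str) -> None:
    """Simulate one request: set the trace ID, log, then restore the old value."""
    token = current_trace_id.set(trace_id)
    try:
        logger.info("checkout started")
    finally:
        current_trace_id.reset(token)


buffer = io.StringIO()
log = build_logger(buffer)
handle_request(log, uuid.uuid4().hex)
print(buffer.getvalue(), end="")
```

Once every log line carries the same ID as its trace, a log search can jump straight to the matching trace, which is the correlation the table describes.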

Log management

Features Datadog Splunk New Relic
Ingest data from Any source, at any scale Ingest from any source at any scale. Separate ingest and indexed logs to reduce costs. Complement existing agents with OpenTelemetry Any text-based data using the forwarder that works best in your environment
Tracking Track trends, metrics, and KPIs from all logs Blend logs with real-time metrics, in context troubleshooting
Dashboard Visualize summarized logs data on dashboards. create and save granular views. Log metrics into real-time dashboards Automatically clusters logs into patterns and detects outliers
Search and query Search, filter, and analyze logs on the fly—no complex query language required No-code search. Watch critical logs with unified filters and time controls. Easily transition between logs, traces, and metrics.
Related Features Live Tail & Logging Without Limits Infinite Logging using S3 buckets
Decouples log ingestion and indexing
Pricing Free tier, pay as you go
Ingest Starts at $ 0.10 per GB ingested or scanned GB/mo Starts at $ 0.10 / host / GB ingested / mo
Retention 3-day retention starts $1.06 / million log events/mo Standard retention is 30 days
Index Starts at $5 / indexed GB/ host /mo

Infra monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Environments supported | On-premise, hybrid, IoT, and multi-cloud environments | On-prem, hybrid, or multi-cloud | Cloud and on-prem infrastructure |
| Visibility | Tens of thousands of metrics out of the box. One-click correlation of related metrics | Correlation between your hybrid infrastructure and microservices, insights for faster troubleshooting | Spot all issues and monitor performance in one place |
| Integrations | Vendor-backed integrations for k8s, serverless, and 500+ popular technologies | 250+ cloud service integrations and pre-built dashboards out of the box | Close to 500 integrations are available |
| Historical records | Yes, even on resources that don't exist anymore | - | Time travel back to incident's origins and replay the historical state |
| Starting price | Limited free plan, Pro plan starts at $15/month/host | $15/month/host | Free tier, pay as you go |

Serverless monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Visibility | Get all your functions in one place | Pre-built visualizations | Visualize, trace, alert |
| Real-time | Ingest, search, and analyze 100% of traces live over the last 15 minutes. Real-time alerts on memory, timeout, and concurrency metrics | Monitoring and alerting on every function | - |
| Metrics monitored | 1. Cold starts 2. Errors 3. Memory 4. Timeout and latency 5. Concurrency and custom metrics for CX | 1. Cold starts 2. Errors 3. Invocations 4. Compute duration 5. Custom business & CX metrics | 1. Invocations 2. Errors 3. Spans 4. Custom metrics |
| Supports | Lambda, Google Cloud Functions, Azure Functions, AWS SAM, Serverless Framework, and AWS CDK integrations | Lambda, Google Cloud Functions, Azure Functions | Lambda, Google Cloud Functions, Azure Functions |
| Test in CI/CD pipeline | Yes, integrate to CI/CD pipelines | Yes, automatically pass/fail builds based on the performance budget in your CI/CD pipelines | Yes, integrate to CI/CD and build pipelines |
| Pricing | Workload monitoring starts at $5/active function/mo | Usage-based pricing | Free tier, pay as you go |

Real user monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Visibility | Web apps, native mobile apps, app backends | Web apps, native mobile apps, app backends | Browser monitoring and mobile monitoring |
| Core web vitals | Yes, page load, interactivity, and visual stability. Filter by location, device, etc. | Yes, page load, interactivity, and visual stability. Filter by location, device, etc. | Yes, page load, interactivity, and visual stability. Filter by location, device, etc. Set alerts when vitals drop |
| Full session analysis | Contextualize user session attributes like user ID, email, and name. Ingest custom metrics and track business-critical user actions | Including route change, API calls, impact of images and resources on user | See trends with sessions, filter by app and device versions |
| Native mobile apps | Troubleshoot app crashes, set up alerts, connect server-side and client-side metrics | Auto capture common client attributes - app crash report, full app lifecycle visibility, network requests and errors | Insights into crashes, handled exceptions, and network failures |
| Session replay | Yes, 30-day retention policy | - | Reproduce incidents using event trails and mobile breadcrumbs |
| Integrates with | Logs, APM, profiler | Splunk APM for backend visibility | |
| Pricing | Starting at $0.45 / 1,000 sessions / month* | Starting at $14 / 10,000 sessions / mo | Free tier, pay as you go |
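
The two session prices above are quoted per different batch sizes, which makes them awkward to compare at a glance. The sketch below normalizes them for a hypothetical traffic level. It assumes simple linear pricing from the list prices in the table and ignores tiers, minimums, per-host session limits, and discounts; `SessionPrice` and `monthly_sessions` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionPrice:
    vendor: str
    price_usd: float   # list price from the table above
    per_sessions: int  # number of sessions that price covers

    def cost_for(self, sessions: int) -> float:
        """Linear cost estimate; ignores tiers, minimums, and discounts."""
        return self.price_usd * sessions / self.per_sessions


PRICES = [
    SessionPrice("Datadog", 0.45, 1_000),
    SessionPrice("Splunk", 14.00, 10_000),
]

monthly_sessions = 250_000  # hypothetical traffic level
for p in PRICES:
    print(f"{p.vendor}: ${p.cost_for(monthly_sessions):,.2f}/month")
```

Running the same numbers for your own session volume, alongside New Relic's usage-based estimate, gives a like-for-like starting point before talking to sales.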

Synthetic monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Locations | Simulated requests and actions from around the globe, and synthetic private locations | Run simulated tests from nearly 50 global locations | Simulate traffic across thousands of public and private locations |
| Monitor types | Allows single and chained requests at these levels: HTTP, SSL, DNS, WebSocket, TCP, UDP, ICMP, and gRPC health check | Request-level, run-level, test-level, page-level, and transaction-level metrics | Broken links, certificate check, ping, step, simple and scripted browser monitors, and API tests |
| Troubleshooting | Get full context for troubleshooting failed test runs with correlated metrics, traces, and logs | Track and report SLOs and SLAs for uptime and performance. 300+ optimization recommendations to fix defects and improve UX | Identify issues from a third party, backend service, and infrastructure. Improve end-user experience with user-centric metrics |
| Record | Record browser tests and monitor customer experiences with end-to-end tests | Capture screenshots and simulated sessions. Configure test schedules and set up alerts | Scripted browser tests are driven by Selenium WebJS to emulate customer navigation, action, and more |
| Test in CI/CD pipeline | Yes, integrate to CI/CD pipelines | Yes, automatically pass/fail builds based on the performance budget in your CI/CD pipelines | Yes, integrate to CI/CD and build pipelines |
| Starting price | API tests $5/mo/10,000 test runs; browser tests $12/mo/1,000 test runs (billed annually) | Uptime tests $1/mo/10,000 requests; API tests $4/mo/10,000 test runs; browser tests $12/mo/1,000 test runs (billed annually) | Free tier, pay as you go |

Alerts and incident management

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Notifications | Datadog's web & mobile app, Slack app, Hangouts Chat, Microsoft Teams, and more | Metadata-rich alerts on any device, incl. iOS and Android apps | Smart detection distinguishes between critical and minor concerns. Scheduling and muting capabilities are also available |
| Automation | Automatically apply alerts to new hosts, and detect anomalies in apps, infra, and services. Automated incident management workflows | Automate scheduling, time-sensitive actions incl. escalations, war room, and post-incident resolution | Baseline conditions automatically adjust based on the system's behavior. Anomaly detection. Automatically sets permissions, no personal data collected |
| Context | Describe the incident and pass on assessment fields such as root cause, detection method, services, etc. | Identify similar incidents using historical insights and audit trails. Use resources like run books, articles, and dashboards to help responders triage and resolve incidents faster | You can include charts about the incident to provide context |
| Custom triggers | With an Anomaly monitor, set anomaly detection, trigger window, and recovery window. Advanced options with seasonality, algorithms available | Rules Engine is a full-stack service level feature that allows you to set certain conditions and trigger custom actions | Set alert conditions specific to data sources or data behavior thresholds |
| Integrations | 100+ integrations that work with your existing workflow | 100+ integrations out of the box | Yes, all major integrations |
| Reports | Create, track, and report on critical SLOs and visualize them on dashboards | Post-incident review, MTTA/MTTR performance report, on-call report, incident frequency report | |
| Starting price | $20/user/mo (billed annually) | Up to 10 users: $5/user/mo; 10+ users: $23/user/mo (billed annually) | Free tier, pay as you go |

Splunk On-Call was previously VictorOps, which was acquired by Splunk in 2018.
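
The alerting comparison above mentions anomaly monitors with a trigger window and baselines that adjust to the system's behavior. As a rough illustration of that general idea, and not any vendor's actual algorithm, the sketch below flags points that stray far from a rolling baseline and alerts only when several consecutive points are anomalous. The function names, the 3-sigma threshold, and the sample latencies are all assumptions.

```python
from statistics import mean, pstdev


def anomalous_points(values, baseline=5, k=3.0):
    """Flag points more than k standard deviations from a rolling baseline.

    The baseline is the last `baseline` points that were NOT flagged, so a
    spike does not inflate the baseline it is judged against.
    """
    normal = []  # recent points considered normal
    flags = []
    for value in values:
        history = normal[-baseline:]
        if len(history) < baseline:
            flags.append(False)  # not enough history to judge yet
            normal.append(value)
            continue
        mu, sigma = mean(history), pstdev(history)
        is_anomaly = abs(value - mu) > k * max(sigma, 1e-9)
        flags.append(is_anomaly)
        if not is_anomaly:
            normal.append(value)
    return flags


def should_alert(flags, trigger_window=2):
    """Alert only when the last `trigger_window` points are all anomalous."""
    return len(flags) >= trigger_window and all(flags[-trigger_window:])


latency_ms = [100, 102, 98, 101, 99, 100, 250, 260]  # hypothetical samples
flags = anomalous_points(latency_ms)
print(flags)
print("alert:", should_alert(flags))
```

This simple version never adapts to a permanent level shift and knows nothing about seasonality, which is part of what the hosted anomaly monitors add on top.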

Network monitoring

| Features | Datadog | Splunk | New Relic |
| --- | --- | --- | --- |
| Visibility | Monitor the performance of connections among your hosts, services, virtual private clouds (VPCs), and other elements of your on-prem, public, or private cloud | - | Analyze all of your network, app, infrastructure, and digital experiences on a single platform |
| Network metrics | Traffic between any two endpoints, TCP retransmits, latency, connection churn | - | Network syslogs, network flow logs, cloud flow logs |
| Device metrics | Automatically discover and collect metrics on your network from any device, drill down and create custom views to evaluate device performance | - | Device performance via SNMP |
| Forecasting | Use forecasting to determine when interfaces will exceed their available bandwidth | - | |
| DNS | Analyze system-wide DNS performance. Assess DNS server health with request-volume, response-time, and error-code metrics | - | - |
| Starting price | Network Performance Monitoring $5/mo/host; network device monitoring $7/mo/device (billed annually) | | Free tier, pay as you go |
  • New Relic’s solution is based on the ktranslate docker container. This single container image is hosted in your environment to collect and process your data to be exported to the Event, Metric, and Log APIs and displayed in New Relic.
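
The forecasting row above describes predicting when an interface will exceed its available bandwidth. The post does not say how the vendor does this, so the following is only a minimal sketch of one simple approach: fit a straight line to recent daily peaks and extrapolate to the capacity. The function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_saturation(daily_peak_mbps, capacity_mbps):
    """Estimate days from the last sample until the trend line reaches capacity.

    Returns None when usage is flat or falling.
    """
    days = list(range(len(daily_peak_mbps)))
    slope, intercept = fit_line(days, daily_peak_mbps)
    if slope <= 0:
        return None
    crossing_day = (capacity_mbps - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical daily peak throughput on a 1 Gbps interface.
samples = [600, 620, 640, 660, 680]
print(days_until_saturation(samples, capacity_mbps=1000))
```

A straight line is the crudest possible model; real traffic usually has weekly cycles and bursts, so treat the result as an early warning rather than a date.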

As we can see from the comparisons above, Splunk and New Relic have nearly caught up with Datadog’s offerings on most fronts through their recent push towards observability, recent acquisitions, contributions to open-source projects, and partnerships. They also offer aggressive pricing, which undercuts Datadog and other expensive competitors in most categories.

Choosing your observability stack

As the business impact of outages rises day by day, more and more businesses are likely to spend on Observability tools. While the above comparisons give you a basic overview of which tools to use and how much each might cost, here are a few considerations to keep in mind while choosing the observability tool.

  1. Analyze which parts of your stack need monitoring the most, then analyze and try out tools specific to that need. There are open-source and free-tier tools available for most of these features.
    1. Infrastructure
    2. Networks
    3. Application performance
    4. End-user experience
    5. Alerting
  2. Understand which pricing model works best for your current infra setup. Each vendor has different pricing models, so try out their free trials and see whether you are comfortable with their ease of use and pricing transparency.
    1. Workload-based pricing
    2. The volume of telemetry ingested
    3. Number of users
    4. Event-based pricing
  3. If you’re not quite ready for enterprise volume discounts, look to use multiple tools from different vendors. More than half of the customers surveyed use ten or more tools.
  4. Work on your existing strengths. Enterprise customers using these tools have dedicated IT teams set up for monitoring. As a smaller company looking to build on the cloud, leverage automation where possible so you can spend more time on your code and less time watching your systems.
  5. Try installing tools like Prometheus and Grafana to understand how much work it is before testing a tool like Datadog. This will help you understand if the time saved using these tools is worth the money you’ll spend on them.
  6. Have a way to train your team, and make it easy to onboard new employees into your stack. Once your observability stack is up and running, it is important to have good getting started guides (internal or external), which can help your team learn and work with your observability tools easily.
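
To illustrate the second consideration in the list above, here is a small sketch that prices one usage profile under each of the four pricing models. Every rate in it is a placeholder assumption, and `Usage`, `RATES`, `estimate`, and `cheapest` are illustrative names; swap in real quotes from each vendor before drawing conclusions.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    hosts: int
    ingest_gb: float
    users: int
    events_millions: float


# Hypothetical unit rates in USD per month; replace with quotes from each vendor.
RATES = {
    "workload (per host)": lambda u: 15.00 * u.hosts,
    "ingest (per GB)": lambda u: 0.10 * u.ingest_gb,
    "users (per seat)": lambda u: 20.00 * u.users,
    "events (per million)": lambda u: 1.06 * u.events_millions,
}


def estimate(usage: Usage) -> dict:
    """Monthly cost of the same usage profile under each pricing model."""
    return {model: round(rate(usage), 2) for model, rate in RATES.items()}


def cheapest(usage: Usage) -> str:
    """Name of the pricing model with the lowest estimated cost."""
    costs = estimate(usage)
    return min(costs, key=costs.get)


profile = Usage(hosts=20, ingest_gb=1_500, users=8, events_millions=400)
for model, cost in estimate(profile).items():
    print(f"{model:<22} ${cost:>8,.2f}")
print("cheapest:", cheapest(profile))
```

The useful part is less the answer than the exercise: writing down hosts, ingest volume, seats, and event counts for your own setup shows which pricing dimension will dominate your bill as you grow.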

Conclusion

Choosing between Datadog, Splunk, and New Relic can be difficult as they have similar offerings at first glance. But, there are a few things each platform does better than the others. Using the feature-level comparisons and considerations mentioned above, you can narrow down and make the best decision that saves cost and works well with your stack.

While there is no shortage of observability tools in the market today, choosing from prominent vendors has its advantages in terms of compatibility, cost, and easier monitoring. And that's why companies today are choosing to consolidate their vendors.

If you are interested in seeing the progress the industry has made in the past 12 months and where it is headed, check out this insightful blog by Hayden James.

Argonaut takes the complexity out of your app and infra deployments. You can also integrate third-party apps like your favorite observability tool. Get started with Argonaut today.
