7 Things to Consider in the Wake of Last Night’s Outage

Today many of our friends are probably getting an extra shot of coffee after last night’s outage. Our live map lit up with over 100,000 outages around the world.

AWS had a routing problem that caused many sites to go down. The incident lasted around 40 minutes and affected Slack, Netflix, Pinterest and many others. The root cause could have something to do with a route leak, the leap second, or something else. It will probably take some time to find out for sure.

Many of our customers received alerts, often in the middle of the night, and suspected that they were false alarms or a problem in our systems. That is why we have a second-opinion process to confirm when a site is down.
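The idea of confirming an outage with a second opinion can be sketched in a few lines. This is only an illustration of the general technique, not Pingdom's actual implementation: the function names `http_probe` and `confirm_outage`, the two-failure threshold, and the timeout are all assumptions made for the example.

```python
import http.client
import urllib.request
from typing import Callable, Sequence


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical probe: True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, timeouts, refused connections and HTTP errors.
        return False


def confirm_outage(probes: Sequence[Callable[[], bool]],
                   required_failures: int = 2) -> bool:
    """Report an outage only if enough independent probes fail.

    Each probe returns True when the site looks healthy. Requiring more
    than one failure keeps a single bad probe from triggering an alert.
    """
    failures = 0
    for probe in probes:
        if not probe():
            failures += 1
            if failures >= required_failures:
                return True
    return False
```

In practice each probe would run from a different network location, so that a local connectivity problem at one probe is not mistaken for a site outage.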

Sometimes a site appears to be up for some people while others have trouble reaching it. Intermittent problems are hard to spot and hard to trace to a root cause. This is especially true when the problem is in IP routing, as appears to be the case with this outage.
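One way to make a "works for some, fails for others" situation visible is to compare check results per location rather than collapsing them into a single up/down flag. The sketch below assumes a hypothetical mapping from probe location to check result; the location names and the three status labels are invented for illustration.

```python
from typing import Mapping


def classify_status(results: Mapping[str, bool]) -> str:
    """Summarize per-location check results.

    `results` maps a probe location name to True (check passed) or
    False (check failed). Returns "up", "down", or "partial".
    """
    if not results:
        raise ValueError("at least one probe result is required")
    failed = [location for location, ok in results.items() if not ok]
    if not failed:
        return "up"
    if len(failed) == len(results):
        return "down"
    # Some locations can reach the site and others cannot, which is the
    # pattern a routing problem tends to produce.
    return "partial"
```

A "partial" result is a hint to look at the network path between the failing locations and the site before suspecting the application itself.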

Here are the 7 things to consider in the wake of this outage:

  1. Evaluate the reliability of your cloud provider by the quality and detail of their communications during an outage.
  2. Understand all the points of failure in your system, from DNS and the network down to the databases powering your site.
  3. It's a good time to review your monitoring strategy: not only what needs to be monitored and from which locations, but also your alerts, escalations and response procedures.
  4. Review your user notification plans and consider setting up a public status page.
  5. It's also a good time to brush up on root cause identification in Pingdom and in your internal systems.
  6. All systems fail sometimes. You need to design for failure. Don't blame AWS. When they have an outage it is a very public event, but in reality most cloud providers have better uptime records than the majority of on-premises datacenters, and they are getting better and better.
  7. For business-critical sites on the cloud, follow multi-region or multi-cloud redundancy best practices. If you are the business owner, talk to your IT team to understand their high availability and redundancy strategy.
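As a rough illustration of designing for failure across regions (points 6 and 7), the sketch below picks the first healthy endpoint from an ordered list. It is a simplified, assumption-laden example: the endpoint URLs are placeholders, `is_healthy` is a hypothetical health check you would supply, and real deployments usually handle failover at the DNS or load-balancer layer instead of in application code.

```python
from typing import Callable, Optional, Sequence


def pick_healthy_endpoint(endpoints: Sequence[str],
                          is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint that passes the health check.

    `endpoints` is ordered by preference (primary region first).
    Returns None when every region fails, which should itself be
    treated as an incident.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Usage with placeholder endpoints and a stubbed health check in which
# the primary region is unavailable.
regions = ["https://us-east.example.com", "https://eu-west.example.com"]
unavailable = {"https://us-east.example.com"}
chosen = pick_healthy_endpoint(regions, lambda url: url not in unavailable)
```

Here `chosen` is the second region, because the stubbed check marks the primary as unavailable.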

 

If you don’t have testing and alerting in place yet, consider using Pingdom. Over 700K users trust Pingdom to tell them whether their sites are up or down, and to help with transaction monitoring, user experience, performance and incident management.

What else should web professionals think about in the aftermath of an incident like last night’s? What are your tips or best practices? Please share your opinion in the comments.
