- This status page actually identified the outage: https://hackernews.onlineornot.com/
- Pages by Hund and Statuspal did not show the outage.
- The last post before the outage was https://news.ycombinator.com/item?id=46301823 (1:39:59 PM GMT). The last comment was https://news.ycombinator.com/item?id=46301848 (1:41:54 PM GMT).
- There was an average of ~4 seconds per comment just prior to the outage. Based on this, HN likely went down at 1:41:58 PM GMT.
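The estimate above is simply the last observed comment time plus one average gap. A minimal sketch of that arithmetic follows; the function name `estimate_outage_start` is made up for illustration, and the result is only an upper-bound guess, since the site could have gone down at any point after the last comment.

```python
from datetime import datetime, timedelta

def estimate_outage_start(last_comment: str, avg_gap_seconds: float) -> str:
    """Estimate the outage start as the last seen comment plus one average gap."""
    fmt = "%I:%M:%S %p"
    last_seen = datetime.strptime(last_comment, fmt)
    return (last_seen + timedelta(seconds=avg_gap_seconds)).strftime(fmt)

# Values taken from the bullets above: last comment at 1:41:54 PM GMT, ~4 s per comment.
print(estimate_outage_start("1:41:54 PM", 4))  # 01:41:58 PM
```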
(The reason I did that is that the anti-crawler protections also unfortunately hit some legit users, and we don't want to block legit users. However, it seems that I turned the knobs down too far.)
In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
I'll add more as we find out more, but it probably won't be till later this afternoon PST.
We all knew that but I haven't seen any confirmation before this.
I think you're confusing popularity with criticality. I'm sure everyone in here can withstand a few hours without browsing the page.
It's dang's baby at this point, and this is a good thing, as long as HN doesn't affect his life in ways he doesn't want.
Get a grip and go touch some grass. Even FANGs understand the concept of business hours SEV.
However, when something I care about crashes and burns once in a blue moon, I make sure to put the fire out, at least to make it survive till regular hours. Things I care about can be both business and personal, and nobody bugs me for them.
Maybe we shouldn't make any assumptions about people we don't personally know, while we are at it.
I hope it doesn't change (much).
So don’t beat yourself up please.
When I worked for “SaaS unicorn” we typically had multiple levels of escalation, and acknowledging would have done nothing because the alarm would continue firing until fixed. Not sure what’s changed in 15 years of ops; I had assumed it would be better now. I can’t imagine silencing an alert totally by acknowledging it if it’s still occurring.
I’m totally fine with how you handled it, if anything I am thankful. But that seems to be a system I would improve if I had the time.
“mute” is different than “resolve” to me, and both should exist. (Where mute is an acknowledgement of an issue as ongoing.)
Not to say that I don't procrastinate or waste time doing other nonsense. I can definitely spend a lot of time reading HN comments, as I'm doing right now.
Anyway, anyone who finds themselves with a problem with HN should try that out :)
To be clear, I wasn’t complaining. Just pointing it out. Aside from any more speculative benefit to YC for running the site, the site does run outright ads.
(Might be wise though to have PagerDuty configured to re-alert if the outage persists.)
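A rough sketch of the re-alert idea, in generic Python rather than anything PagerDuty-specific: the function `should_realert` and the 15-minute quiet period are assumptions for illustration, not PagerDuty settings. The point is that the decision depends on whether the check is still failing, not on whether someone marked the earlier page as resolved.

```python
from datetime import datetime, timedelta

def should_realert(still_failing: bool, last_alert_at: datetime, now: datetime,
                   realert_after: timedelta = timedelta(minutes=15)) -> bool:
    """Page again if the check still fails and the quiet period has elapsed.

    A human acknowledging or resolving the earlier page is deliberately not
    an input here; only the health check and the clock matter.
    """
    return still_failing and (now - last_alert_at) >= realert_after

# Hypothetical timeline: first page at 5:24, check still failing 20 minutes later.
first_page = datetime(2000, 1, 1, 5, 24)  # the date is a placeholder
print(should_realert(True, first_page, first_page + timedelta(minutes=20)))  # True
```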
I'm pretty happy with how it's developing—the trendline is promising—but not ready to rely on it in prod yet.
Enjoy your deserved sleep and if for a couple of hours it's down, so be it.
Thanks for your continued service!
Though I will say, HN is a pretty great source of information about major outages like the recent AWS and Cloudflare issues. I had a moment this morning where I thought, oh, is there a larger issue and then, oh, HN is down, huh, the next option is so far down my list that it's going to take me a moment to think of it.
I hope that serves as a testament to how great this site and the community is. Thanks for all your hard work keeping it that way!
Option 4: take your grab bag with the tcp over IP shortwave radio, sextant and head for pre-cached month supply of food in the hills.
https://downforeveryoneorjustme.com/hacker-news
This website had many reports: the last count I saw was 52 within a short time frame, and the maximum seems to be 118.
> In this case, though, we had a secondary failure: PagerDuty woke me up at 5:24am, I checked HN and it seemed fine, so I told PagerDuty the problem was resolved. But the problem wasn't resolved - at that point I was just sleeping through it.
It's okay, I suppose. Have you figured out who is crawling Hacker News so much, though? Was it a DDoS attack or an AI company trying to get data? Hacker News supports an API, and I am sure there are datasets for it too, so it's interesting that they would still crawl, but we all know the reasons why, as they have been discussed here.
HN is important, but unlikely much harm could be done before morning.
(Source: Lost a lot of sleep at one place, enough to realize that sleep interruption and deficit has significant costs.)
If you browse HN while logged in, that should immunize you against this happening. Also, if it does happen again, you can unban your IP as described at https://news.ycombinator.com/newsfaq.html. But you have to do that from a different IP address, of course.
If those things don't work, email hn@ycombinator.com and we'll get it sorted.
it is a shame that it needs to be this way. as a lurker who doesn't stay logged in nor use incognito mode, i have seen "Sorry" page way too often, even when opening the "past" page from the homepage.
truly hope you find a solution that reduces friction for all. personally, it is back to "Sorry" situation for now.
PS: for others facing a similar situation, it all disappears after logging in, which has been the most reliable solution thus far.
https://news.ycombinator.com/item?id=5229522
Re: traffic, dang said (2022):
https://news.ycombinator.com/item?id=33454140
I took it as a good reminder that the hard part is the human part: that high-overhead features and UI fripperies are nice but not necessary (or sufficient) to keep a community healthy and vibrant over the decades.
(And on the subject of the human side, if you didn’t catch Anna Wiener’s 2019 profile, it’s here:
https://www.newyorker.com/news/letter-from-silicon-valley/th... )
The most interesting number is the 1300 submissions because that hasn't grown since 2011 - it just fluctuates. Everything else has been growing more or less linearly for a long time, which is how we like it.
I find that surprising, as 2011-2022 covers an exponential rise in SEO spam and "growth hackers" attempting to drive traffic and links.
Or was 1,300 the number of non-flagged submissions?
A lot of people out here designing their blogs like it's 1989.
There is an official dump which doesn't even require parsing HTML at all: https://console.cloud.google.com/marketplace/details/y-combi...
https://www.youtube.com/watch?v=Sbpl3ywNlpA#t=56s
1. Blame: The first thing to do is to point the finger. That doesn't mean analysing the technical issue, which can delay this step and limit your options, but figuring out who is politically easiest to blame. Often, that's the new guy, but outside contractors and vendors without good connections are also a common solution. Even if you are technically responsible for hiring them, you can always push them under the bus with a little skill. This small sacrifice helps unify, focus, and motivate the rest of the team.
2. Emotion: Inject your emotion into the situation and make that the implicit, but indisputable priority. Particularly, outrage and anger - This is completely _____. These people are utterly _____ (I'd use all caps, but that's not allowed on HN). Make sure everyone's attention is over their shoulder, on your emotion, and infect the team with it. Threats are an effective tool here - this is a crisis, and anyone who is calm is not emotionally engaged. Otherwise, they won't care enough about this problem - without you driving them, they probably wouldn't care much at all. Anyway, you don't have time for niceties like empathy or even basic respect.
3. Speed: Responsiveness to stakeholders is very important. People need answers now. Give them answers they want to hear, outcomes they will be comfortable with. Don't worry if different groups hear different things. Your team will find a way to make it all work - that's their job.
4. Communication: Good communication is essential. Make sure you clearly tell your team what they should be doing; repeat it several times to prevent misunderstanding. Especially people with experience can have minds of their own; keep them on track. The situation is a crisis so you can't take any risks; stay on top of them and everything they do, and give input if you're not certain they are doing exactly what you would be doing.
5. Victimhood: Find a way to turn the tables: Make it about you, and how you're the victim here, and feed the fire with more outrage. With this and outrage, nobody will undermine the team by challenging your ideas or authority, which is the most essential component of a successful outcome. Remember, without you this all falls apart.
Have I missed anything?
Comprehensiveness: propose extreme, sweeping solutions, such as a lights-out restart of all services, shutting down all incoming requests, and restoring everything to yesterday's backup. This demonstrates that you are ready to address the problem in a maximally comprehensive way. If someone suggests a config change rollback, or a roll-forward patch, ask them why they are gambling company time with localized changes, and why they are willing to gamble company time on technical analysis.
Root Cause Analysis Meeting: spend the entire meeting time rehashing the events, pointing fingers and assigning blame. Be sure to mention how the incident could've been over sooner if you just restarted and rolled back every single thing. Be sure to demonstrate out-of-the-box thinking by discussing unrealistic grandiose solutions. When the time is up, run the meeting over by 30 minutes and force all to stay while realistic solution ideas are finally discussed in overtime. This makes it clear to the team that nothing is more important than this incident's RCA--their time surely is not. If someone asks to tap out to pick their kids up after school, remind them that they are making enough money to call them an Uber.
Alerting: be sure to identify anything remotely resembling leading indicators, and add Critical-level wake-you-up alerts with sensitive thresholds for those indicators. Database exceeding 50% CPU? Critical! Filesystem queue length exceeding 5? Critical! Heap usage over 50%? Critical! 100 errors in one minute on a 100000 requests per minute service? Critical! Single log line indicating DNS resolution failure anywhere in the system? Critical! (What if AWS's DNS is down again?) Service requests rate 10% higher than typical peak? Critical! If anyone objects to such critical alerts, ask them why do they want to be responsible for not preventing the next incident?
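For scale, the "100 errors in one minute on a 100000 requests per minute service" threshold works out to a 0.1% error rate. A tiny sketch of the arithmetic, with `error_rate` as an illustrative helper name:

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of requests that failed in the measurement window."""
    return errors / requests

# Numbers from the comment above: 100 errors against 100000 requests in one minute.
print(f"{error_rate(100, 100_000):.2%}")  # 0.10%
```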
what type of protections are used on HN? rate-limiting? ip range blacklist?
[1] - rel="nofollow"
Sometimes I could not open the comment section, receiving a blank page with "... We're sorry" or something along these lines when opening from a new private window. It works when opening normally.
Logging in on the private window seems to resolve the issue. Can you take a look at this if possible?
Of course, they'd better restore service after they wake up naturally, because I need my HN dose. But it's not worth losing sleep over it.
How does this happen?
Not the person you are asking. Bot operators have an incentive to make crawlers look as much like a human as possible so they do not get blocked. Some of them fail miserably and some nearly succeed. That makes it trivial to accidentally block a real person. I am personally fine with that given I do not pay for this site and have no SLA or contract with it.
Was the blocking returning “Sorry.” instead of any page content? A couple of days ago there were a few hours where, when I’d go to HN, I could load the main page as a non-logged-in user. But if I tried to log in I would get “Sorry.” instead. I also got the sorry message if I tried to click on user profiles of other people and a few other pages.
I am assuming that the reason I could see the front page itself and discussions on posts on the front page is that they were in a shared cache for non-logged in users, but that when I clicked on some pages like some random user pages those were not in cache and hit the origin server and it blocked those with “Sorry.” like it did for log-in attempts.
I also tried to go to the unblock IP page, but that one also returned “Sorry.”
For a while I was scratching my head wondering if I had gotten some malware on one of my computers that was aggressively making requests to HN, and that I had become IP banned because of that. Since I think my actual request rate from browsing and commenting should be pretty average. I read HN a lot, but not that much :p
Later in the day, or the next day, things were back to normal and I could log in again. Presumably after those anti-crawler protections had been relaxed again.
My pager noise: https://www.soundjay.com/transportation/sounds/train-crossin...
That will not only wake the dead, it'll wake me no matter how asleep I am.
I used to work on Motorola Minitor 5 pagers. Looks like they recently released their newest model, the Minitor 7
I wonder if pagers are still used in hospitals? I imagine so
I look after several thousand of these across several hundred paging sites.
They're relatively inexpensive (70 quid or so in quantity) and they last about six weeks on a commonly-available AA battery. The batteries go flat enough to trigger the "low battery" beep at about 3am, for some reason. I don't know why.
There's no messaging involved, although the encoders are capable of sending a text string. The message is "get up and get down to the fire station right now", which generally needs no further explanation. POCSAG is unencrypted, so there would be privacy concerns with sending actual incident information in the clear with it.
While we're on the subject of old tech, until BT finally cut the last of them off, we used dialup modems to control the encoders (not dialup internet, just a hundreds-of-miles serial cable) as a backup, and dot-matrix printers to print out a hardcopy message for the crews to pick up.
All very low-tech. All very fixable. All stays working if you don't mess with it.
https://cascode.co.uk/products/2ar2-and-2ar3/
You wouldn't even need particularly good encryption, you'd just need something adequate to stop casual eavesdropping really - "keep them busy for half an hour" would stop people from sniffing the POCSAG traffic and tweeting it, so that people show up at incidents and hang around filming it on their phones.
This incidentally is what a guy in England got arrested for a few years ago, exactly that. It's perfectly legal to listen to and decode pager messages (or any other radio messages), you're just not allowed to pass them on to people or act upon them, and posting them on twitter and then going round to rubberneck at the ongoing incident very much ticks those boxes. As with so many things in the UK, to paraphrase Aleister Crowley, "Don't Be A Dick shall be the whole of the law".
Try opening HN -> it's down, better check HN to see everyone talking about a major website being down -> Try opening HN -> loop
That was a few hours ago. I'm glad this loop is broken.
"Shit, HN is down! Hm, I wonder if there's anything about it on HN?"
until stack overflow occurs.
/s
In all fairness though, mine is the same as in the original comment, where just pressing n autocompletes it to https://news.ycombinator.com/
https://www.proginosko.com/leechblock/
You'll still open new tabs and go to HN, but you'll be reminded quickly, and every day can be downtime day \o/ (for you, personally)
It's like they say: "Your demons will comfort you when no one else will. That's why it's so hard to get rid of them"
you'd go through that effort when you could have just stopped though.
If I could snap my fingers and break toxic habits and patterns, I would have done so decades ago :)
That's so refreshing in terms of being a user-focused feature, and yet it stands in sharp contrast against today's engagement-hyperfocused climate. I never would have thought to look on a website's own settings page to limit my access to that same website.
I love it, thank you for pointing me to this!
You mean it's not your homepage?
I know dang basically works tirelessly to not change the format in order to not induce those addictive patterns
but yet here we all are
It's understandable to be addicted. Lol.
I visit this place multiple times a day.
i can't find the link, but there was a post about how to "be nice" and it was a revelation to a worrying amount of "geniuses" on here. bear in mind the sum total of the advice was "be nice, don't be rude"
2. your characterization of the article sounds uncharitable
3. my point isn't exactly that this is necessarily the smartest place
Almost every (non-troll) online community that is relatively peaceful and has some semblance of moderation to remove flamewars thinks of itself as "the best community". Usually as compared to reddit, though if it's on reddit they will compare themselves to some other (hated) sub.
It's a fact of the internet. Every online community thinks of itself as the smartest, most thoughtful, most civilized. HN is no exception.
It goes without saying HN is not the smartest or most thoughtful online community. It's just... ok. Not the worst, not the best. Certainly NOT the place with the smartest people, though some smart people frequent it. As a regular, you can soon figure out HN's unspoken rules, blindspots, and areas where the group opinion is more likely to be accurate.
How does that go without saying? Name some others then, compare and contrast. As-is your argument is just posturing.
No need, because whether an online community is more thoughtful or smarter than another is very subjective. Almost by definition, HN is not it. Extraordinary claims require extraordinary evidence, and all that. Of course, by internet law, HN (or a subset of its members) considers itself to be the smartest, most thoughtful online community.
There are communities I like better, which are smarter and more thoughtful, but I've no desire to argue with you.
> As-is your argument is just posturing
Nah. Hard pass. Nice try though!
The unsubstantiated claim that "HN is the smartest place on the internet" is an extraordinary claim requiring extraordinary evidence, which wasn't provided.
The downvotes only prove my point.
But also, people like me. Be careful what you choose to believe on this website
This was especially obvious during Covid, I even stopped visiting because the comment section was so crazy.
Nice joke!
At least, I hope it was a joke...
... but I still cannot tell if the original commenter was sarcastic or not! ;)
Did it like 5 times during that 1h-ish outage. :(
https://x.com/HNStatus
Is there a better place to check, beyond a basic down detector that may provide more insight or signal that the outage is acknowledged?
(Basically whenever you see an x.com link just change it to xcancel.com and avoid the nonsense.)
Seems to reset it on the web view, too.
I didn't read the post text, it's identified there haha, my bad! I wish the post text wasn't grey, I gloss over it too easily.
We commonly run into finance issues about halfway through the year. We get to the point where 10x HDMI cables get declined by Finance and we get reprimanded for not tracking where each HDMI or Ethernet cable goes. Near the end of the year, the budget refreshes and Finance (without consulting us) ends up buying a bunch of random stuff.
"Guys, we bought 11 iPad Minis that need to be set up"
Oh so we can also get the HDMI cables now?
"No sorry, we just spent all the remaining money, have you audited the cables recently?"
https://hackernews.onlineornot.com/incidents/yaz-eOJeARBL
https://downforeveryoneorjustme.com/hacker-news
Strangely, nothing from the statuspal, which is the first google result
https://hacker-news.statuspal.io/
on edit: ok others pointed out it was cached pages I saw. explains it.
I suppose you could also just clear your HN cookies in regular browsing window, but then when they fix it you'd have to log in again.
https://x.com/paulg/status/1953289830982664236?s=46
I believe it's because they accept user reports.
It's not that much different from HN, come to think of it.
(ha, ha)
It's down about 8.4 minutes per week. On 26% of days it doesn't work at least once, and on 12% of days it has more than one consecutive failed check. The longest uptime streak was 24 days.
I've been keeping track for exactly 2 years (to the day!) because I was surprised that it seemed briefly down for me on a daily basis. Was I getting unlucky and hitting it every time, or was it just down very often? Nobody posted anything so I started answering the question for myself :p
I've been meaning to post the tracker to HN but there's a pesky bug I want to fix: the "is it currently down" stat. I don't know how this is beyond me but something in the code bugs out. So this is my first time posting about it
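For reference, the "8.4 minutes per week" figure above can be turned into an availability percentage. This is a back-of-the-envelope sketch; `availability` is an illustrative helper, and it assumes the downtime figure is an average over a full 7-day week.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10080

def availability(down_minutes_per_week: float) -> float:
    """Fraction of the week the site was reachable, given average weekly downtime."""
    return 1 - down_minutes_per_week / MINUTES_PER_WEEK

print(f"{availability(8.4):.3%}")  # 99.917%
```

That is a little over "three nines" on average, which is compatible with brief failures showing up on roughly a quarter of days.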
That's not so useful when news.ycombinator.com is having problems.
Maybe ycombinator does have an official status page somewhere, but it is not easy to find if that is the case.
It did work without being logged on. The auth service appeared to be down as the log in attempt (just showing the page) failed.
I'm sure it's a coincidence but it started working again shortly after emailing hn@ycombinator.com
I'm still impressed nonetheless.
I'd like to know what caused the outage and how it could have been prevented, for learning purposes.
You can just look at them, turn on showdead in your profile and you'll see a bunch of flag-killed comments in this discussion by whatevermrfukz. No need for a plugin or scraper.
Anyway, glad to see you back.
Paris 1812.
Cheers from France.
HN was down about an hour ago.
Glad to see it back!
Cheers.
Working with full dates in the HTML and doing a tiny JavaScript that calculates the "minutes ago" would actually be a neat improvement.
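The calculation itself is small. Here is a sketch of the logic in Python (the comment suggests JavaScript in the page; this only illustrates the arithmetic), assuming the page carries a full timestamp per item. `minutes_ago` and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone

def minutes_ago(posted_at: datetime, now: datetime) -> str:
    """Render the gap between two timezone-aware timestamps as relative text."""
    minutes = int((now - posted_at).total_seconds() // 60)
    if minutes < 1:
        return "just now"
    if minutes == 1:
        return "1 minute ago"
    return f"{minutes} minutes ago"

# Placeholder timestamps; a real page would take posted_at from the HTML.
posted = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
viewed = datetime(2000, 1, 1, 12, 42, tzinfo=timezone.utc)
print(minutes_ago(posted, viewed))  # 42 minutes ago
```

Computing the label at view time means a cached page would not show a stale "minutes ago" value.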
After more than an hour I thought, "wow this is pretty harsh" and "so much of my exposure to learning things is directly tied to HN posts". I was lost lol.
We've banned this account for repeatedly breaking the site guidelines and ignoring our requests to stop.
If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.
Being "voted to -2" doesn't necessarily mean you were wrong (it often correlates though). People might just think it wasn't relevant in whatever context you posted it in
I often find it hard to tell what makes people think something I write is not helpful (or sometimes also a comment someone else made) and thus appreciate comments that clarify constructively. It can also help to ask for clarification if you're particularly surprised about the votes on a given post
That’s not what happens in practice.