social.coop down (June 6, 2018)
This is probably apparent to others, but wanted to start a thread about social.coop being down. This can be a space for admins to post updates, if that's helpful.
[Edit]
First point of rendezvous for those on the tech team investigating these issues is our matrix chat room. You may want to visit there first, as this thread is less likely to be updated.
Matt Noyes Sat 11 Aug 2018 2:36AM
Still down.
Clayton Sat 11 Aug 2018 1:24PM
Any update on this most recent outage?
Nick S Sat 11 Aug 2018 2:49PM
Looks like it's back up? Did anyone do something to make that happen?
As a general policy, I suggest reporters and fixers rendezvous here to coordinate:
https://riot.im/app/#/room/#SocialCoop:matrix.org
I've created an issue ticket on our GitLab instance to remind me to document this, and I'll add something at the top of this and the issues thread to point people at it.
Leo Sammallahti Fri 31 Aug 2018 6:03PM
Is social coop down for others?
Deleted account Fri 31 Aug 2018 6:04PM
Yep.
Matt Noyes Fri 31 Aug 2018 6:26PM
Yep. Any ideas?
Nick S Fri 31 Aug 2018 6:56PM
Database/mastodon had twisted knickers, I supplied a fresh pair. Looks like it's back to work.
Josef Davies-Coates Thu 6 Sep 2018 1:31PM
social.coop is down for me right now... is that intentional? if so, why? thanks.
Nick S Thu 6 Sep 2018 2:41PM
It's down. Post-upgrade teething troubles. Victor was looking at it, but the clock ran out and he will have to revisit it later.
I've updated the thread title to point everyone at our channel on riot.im, where you're likely to get more up-to-date information. At some point in the future we should have a status page.
Noah Thu 6 Sep 2018 2:50PM
we have one but it's not regularly updated, and when i spent a few minutes looking into it, it wasn't immediately clear how one might go about updating it. anyway it's at https://status.social.coop - slightly more info in the git.coop infrastructure doc in Section B, part 3.
Nick S Thu 6 Sep 2018 2:57PM
Yes... I didn't count that as it has to be updated manually (how I don't know), and I don't think it runs on our servers, so it is subject to termination.
Josef Davies-Coates Thu 6 Sep 2018 3:00PM
OK thanks @wulee
Noah Thu 6 Sep 2018 2:17PM
Seems like lately (maybe post-upgrade?) there's pretty regularly downtime in the morning here (EDT).
Nick S Mon 8 Oct 2018 12:23PM
Hi all. We (@nicksellen and I) are now about to switch social.coop's media object storage provider over. (Ticket on git.coop, for those with an account there, is https://git.coop/social.coop/tech/operations/issues/21)
So the site will be offline for a very short time. We hope you won't notice any missing images when we come back, but if you do, this is why. Note: we've made sure all the social.coop images are safe, it should only be cached remote media which may require restoring.
If you have problems please contact us on our chat channel
https://riot.im/app/#/room/#SocialCoop:matrix.org.
Thanks!
Nick S Mon 8 Oct 2018 3:43PM
Ok, this is essentially done. Some media files from remote sites will appear to be missing because Mastodon still thinks they're cached in our content storage, but they're not. However, I think we can clear the cache with some magic Masto incantations, or failing that, database hacking.
Meanwhile, we're also thinking about the next step, which is to migrate our instance to our new server. If we get the chops to do that today, we may defer monkeying with the cache until after that, because clearing the cache could take a while to run based on previous experiments.
Nick Sellen Mon 8 Oct 2018 6:45PM
Yup, we will attempt the server migration later tonight too, in about an hour. It's good to do it whilst we have our newly found mastodon-fu loaded into our brains.
After we're happy with the stability of the new deployment we can spend a bit more time on documentation and communication of what has been happening on the technical infrastructure front.
Nick Sellen Tue 9 Oct 2018 1:48AM
The migration is complete! Let us know if you see anything a bit wonky still.
There's a bunch more work to do: tidying up, sorting out proper backups, etc. Tasks are listed at https://git.coop/social.coop/tech/operations/issues in some kind of order. We'll probably head to sleep first before putting all that in order.
Bob Haugen Thu 11 Oct 2018 11:22AM
Had to clear cache to get it to work again. But now it seems back up and running. Thanks again for all your hard work!
Matt Noyes Mon 8 Oct 2018 2:41PM
Thanks for your work!
mike_hales Tue 9 Oct 2018 8:31AM
Kudos :astonished: Please . . identify all the ops volunteer workers here, so we non-gitters can see some of the hidden reality that makes the lights come on when we flick the switch? @wulee @nicksellen AN Others? Thank u :clap:
Just out of anthropologist-interest . . where literally is the new server (s? backup?). And the object storage? Ultimately, I believe it's important to understand the materiality and geography and indebtedness of this magic infrastructure stuff. Like massive Amazon S3 server farms, transatlantic cable, etc.
Nick Sellen Tue 9 Oct 2018 10:08AM
The server is in Helsinki. It's a Hetzner server.
You can also find this information for any website:
- from social.coop domain --> command ping social.coop
--> ip address is 95.216.13.24
- geoip (e.g. with https://www.maxmind.com/en/geoip-demo) --> says Finland, plus some approx co-ordinates --> google maps search: 60.1708, 24.9375 --> Helsinki!
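If you'd rather script the first step than run ping by hand, here is a minimal sketch in Python. The function name lookup_ipv4 and the injectable resolver argument are just my own illustration (the resolver is there so it can be tried without a network connection); it only does the hostname-to-IP part, and the geoip lookup still has to be done separately.

```python
import ipaddress
import socket


def lookup_ipv4(hostname, resolver=socket.gethostbyname):
    """Return the IPv4 address for a hostname as a string.

    `resolver` defaults to the standard library DNS lookup; it is a
    parameter only so the function can be exercised offline.
    """
    address = resolver(hostname)
    # Raises ValueError if the resolver returned something that is
    # not a well-formed IPv4 address.
    ipaddress.IPv4Address(address)
    return address
```

Calling lookup_ipv4("social.coop") should give the same kind of answer as ping does, though the address may no longer be the one quoted above if the server has moved since.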
The object storage is where all the user uploaded files get saved:
- user avatars and header images
- media attachments (mostly uploaded images)
- link preview images
- temporary storage of imports (if you do a bulk import)
By default mastodon would put them on the local filesystem on the server, but it's handy to put them on a remote storage service, so you can move the server without moving the files. We use DigitalOcean Spaces now (was previously Dreamhost DreamObjects). We chose the Amsterdam location.
It was just coincidence that we had to move the object storage at the same time as the server - Dreamhost were shutting down the east coast USA service and required all the users of the service to move it themselves to their west coast USA service, or entirely migrate to another platform (which we opted for, as they were stored under Victor's company's account).
People involved were: mostly @wulee, he really took responsibility for the task. I have been taking a supporting role: boosting morale, being someone to bounce ideas off, helping to investigate issues that come up, and sharing some of the tasks. @victormatekole helped with the object storage migration, and @mayel supported us by extending the migration deadline and sharing access to the things we needed.
Bob Haugen Tue 9 Oct 2018 10:35AM
My quote of the day:
it's important to understand the materiality and geography and indebtedness of this magic infrastructure stuff!
Dave V. ND9JR Mon 15 Oct 2018 11:00AM
I'm still getting a "502 - bad gateway" message when I try to use social.coop via the web and have been for over a week now. Using it via Mastalab on my Android tablet seems to work.
Bob Haugen Mon 15 Oct 2018 11:20AM
Working fine for me in Firefox but I had to clear my cache
Dave V. ND9JR Mon 15 Oct 2018 12:28PM
It works fine in Firefox Mobile on my tablet. However in SeaMonkey on my desktop I still get the 502 error even after cleaning my cache (and turning off proxies). I'm using SeaMonkey version 2.49.3.
Gil Scott Fitzgerald · Sat 11 Aug 2018 2:03AM
Seeing a 502 Bad Gateway error at 2103 central time