Wiki/Articles/The_Complicated_Cloud.md

I will preface this article by affirming my commitment to the [AniNIX Mission Statement](/AniNIX/Wiki/src/branch/main/Policies/User_Ethics.md#our-mission-statement). We are and will always be, predominantly, a self-hosted ecosystem. The cloud is popular now, but so was outsourcing IT once upon a time. Neither is going away, but neither is self-hosting. It will always be a debate between cost, expertise, control, adoption, and a host of other factors whether organizations will source their personnel and resources internally or subcontract -- technology isn't much different from the rest of economics in that respect. However, the AniNIX is a firm believer in keeping both our resources and personnel in-house, and that remains the most secure, convenient, and cost-effective approach for our model.
# Why Self-hosted
It seems to be becoming increasingly necessary to justify why we are self-hosting our solutions. The cloud claims that it reduces time spent on maintaining infrastructure, freeing developers to work on more interesting things. It also promises safety and reliability through SLAs, licenses, and disclaimers. Fundamentally, though, the cloud is actually an acronym.
```
Computers
Living
On
Unknown
Datacenter
```
There's a reason why people like Edward Snowden use strong personally-encrypted devices to hold their data -- once someone gets physical access to hardware, hardware attacks like cable taps, drive cloning, [microphone-based PGP decryption](https://www.welivesecurity.com/2015/06/23/pgp-encryption-keys-pita/), etc. can invalidate a lot of technical protections. FISA warrants and National Security Letters may allow government bodies to carry out these exploits on your hardware without your knowledge, even when you may not be the intended target, and organizations the size of the AniNIX are unlikely to draw any protest from the hosting companies. Without blockchain-style validation & encryption of data (which is presently infeasible for services like AniNIX/Yggdrasil or other media platforms), this brings a lot of insecurity.
We prefer to protect our users and services directly, from the physical layer up, so that we can ensure the best experience and best security possible. We know all actions at all layers and have the ability to inspect those layers at all times.
# The Cost of the Cloud
To compare the pricing of the cloud against self-hosting, we priced the following model: two instances, each with 16 cores, 16 GB of RAM, and 4 TB of storage.
* [Monthly cost for AWS](https://calculator.aws/#/createCalculator/EC2DedicatedHosts?nc2=pr) with Cold HDD & Instance Savings options: $4,625 annually becomes $385 monthly for a one-year commitment.
* [Monthly cost for Azure](https://azure.microsoft.com/en-us/pricing/calculator/) with 2 F16s v2 & Standard HDD S50 storage: $774.45 monthly on a three-year commitment.
For reference, we paid about $500 for our SuperMicro X8s, and these have lasted us two years with no signs of slowing down. Storage costs $60 per 4TB drive right now, and since we use eight of those, that's another $480. Even generously assuming a $70 a month power cost and $500 for additional hardware (switches, HDD cages, Raspberry Pis), our first-year cost is only about $2,300, and $840 annually after that, a fraction of the cost of cloud services. Don't buy into the hype that the cloud is cheaper.
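The arithmetic behind those totals can be sketched in a few lines. This is only a rough tally using the figures quoted in this section; the variable names are ours, and it treats the $500 server price as the total paid. Itemized, the first year comes to $2,320, which the text rounds to $2,300.

```python
# Figures quoted in this article; the names are illustrative only.
SERVERS = 500            # SuperMicro X8s (assumed to be the total paid)
DRIVE_PRICE = 60         # per 4TB drive
DRIVE_COUNT = 8
EXTRA_HARDWARE = 500     # switches, HDD cages, Raspberry Pis
POWER_PER_MONTH = 70

AWS_ANNUAL = 4625        # one-year commitment quote
AZURE_MONTHLY = 774.45   # three-year commitment quote


def self_hosted_costs():
    """Return (first_year, each_later_year) in dollars."""
    hardware = SERVERS + DRIVE_PRICE * DRIVE_COUNT + EXTRA_HARDWARE
    power = POWER_PER_MONTH * 12
    return hardware + power, power


first_year, recurring = self_hosted_costs()
azure_annual = round(AZURE_MONTHLY * 12, 2)
print(f"Self-hosted: ${first_year} first year, ${recurring} each year after")
print(f"AWS: ${AWS_ANNUAL} per year; Azure: ${azure_annual} per year")
```

Even in the first year, when all the hardware is bought at once, the self-hosted total is about half the AWS quote.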
# Necessary Cloud Services
We have a number of necessary cloud services, but all of these are accepted because we choose to interact with the outside world. We are operating as a business at least in some capacity -- while our services and code are open-source and free to use, we do have staff that sell their time and expertise to the world, to help offset the cost of operating and for resume-building.
## Freshping & Discord: An Example of the Necessity of the Cloud
As a small self-hosted site with only a single replica, we don't have much of an option for monitoring to make sure that the outside world can see us. If we weren't providing services to folks living off-site, we wouldn't need this service. But, we do try to be available to the world.
As such, we need to validate that the world can route to us. We use a service called [Freshping][freshping] to monitor us. Freshping emails our admin reports, and it also can trigger webhooks. We haven't had much success linking Freshping to [Discord][discord] directly, so we use [Zapier][zapier] as an intermediary. Discord is the notification service of choice because it lives outside our network: if notifications were self-hosted and we were off-site while our site was offline, there would be no notification and we could not escalate to our ISP.
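For readers unfamiliar with webhooks, here is a minimal sketch of the kind of call an intermediary ends up making against a Discord webhook. It is not our actual integration, which runs through Zapier. The URL, the message format, the check name, and the `User-Agent` string are all placeholders we made up for illustration; the only thing taken from Discord's webhook interface is that it accepts a JSON body with a `content` field.

```python
import json
import urllib.request

# Hypothetical placeholder; a real URL comes from the Discord channel's
# integration settings and should be kept secret.
WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"


def build_alert(check_name, status):
    """Build the JSON body for a Discord webhook message."""
    return {"content": f"[monitor] {check_name} is {status}"}


def send_alert(payload, url=WEBHOOK_URL):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Made-up agent string; some services reject the urllib default.
            "User-Agent": "monitor-sketch/0.1",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Nothing is sent here; this only shows the message that would be posted.
print(build_alert("example-site", "down"))
```

Calling `send_alert(build_alert("example-site", "down"))` with a real webhook URL would post the message to the channel that owns the webhook.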
Discord serves an ancillary function for us, as well. Discord is our primary telephony solution for making video and voice calls within the network. While there are services we could host to do this, voice prints are necessarily identifying and can be falsified -- IRC identity validation by authentication and GPG-encrypting messages is more secure and anonymous. Moreover, having our telephony solution remote also allows our admins to fall back to it when the network is offline to discuss root-cause and remediation. We maintain a [FreeConferenceCall.com][fcf] account and a [Google Voice][voice] number for more traditional phone systems, but we rarely use them because of poor spam filtering and authentication.
## Google: Communicating with the outside World
We use Google for a few things.
* Initial site analytics & external validation of our web development practices via [Google Analytics][analytics].
* Domain DNS via [Google Domains][domain].
* Direct phone number via Google Voice.
* We are considering using Google Workspace in [AniNIX/Wiki#8](/AniNIX/Wiki/issues/8) for our SMTP presence, even though we don't like email. As we look toward being a contract organization, being able to interact with business people on their level is an unfortunate necessity.
Effectively, Google services here are handling all the legacy cruft for us in dealing with the external world. These services are typically more difficult to secure, though they are more familiar to average users.
## Venmo: Payment and PCI compliance
[PCI compliance](https://www.pcisecuritystandards.org/pci_security/completing_self_assessment) is a necessary part of doing business within the US. This is presently more impactful for our [martial arts](/martialarts) division than the tech one, but it's still necessary to support. We host links to PCI sites, so we have to annually review a self-assessment, but our obligations are limited. It would be possible for us to develop a complete payment portal against a banking institution ourselves, but because we are not a bank, we'd still be dependent on that bank's cloud services and APIs. Such development would also make us liable for more expenses in needing to hire a PCI auditor and other overhead we simply cannot afford. As such, we offload our payment system by linking out to [Venmo][venmo], which directs payment into our bank.
We are investigating using a USDCoin wallet to offer operating on the blockchain, but that is still a weird middle ground of self-hosting and cloud all at the same time, being a peer-to-peer protocol. One could argue that running a miner for that protocol would make it somewhat self-hosted and that we are simply participating in the protocol with a much wider audience in the same way that providing an RSS feed puts us in the conglomeration of information provided by RSS. However, adoption for this is still low and more traditional banking will likely dominate any business ventures in the near future.
# Replicating from Self-hosted to the Cloud
Business needs aside, we have to recognize that we are not an island. Getting new people involved in the network isn't easy, and the more setup is required to use our services, the slower that involvement becomes. As such, we are looking at using some mirroring services to provide visibility from our self-hosted services to major platforms. Not everything will be available on these platforms, but we will start building brand recognition that may draw folks to the network.
## Source Code: Foundation to GitHub via SSH deploy keys
Source code is wonderfully naturally distributable with Git, and we can use native repo mirroring to push select repos from AniNIX/Foundation to [GitHub][github]. Since GitHub is used as a gold standard for searching for developers, this lets us put some samples we are otherwise freely distributing to the world onto this platform and show other developers what we can do.
First, we create an [SSH key](https://man.archlinux.org/man/ssh-keygen.1.en) that will be added to each individual project as a deploy key. We'll need a configuration that will pair this identity to the project, and each project will need a unique identity. This mapping gets stored in the Gitea user's [ssh_config](https://man.archlinux.org/man/ssh_config.5.en).
```
# /var/lib/gitea/.ssh/config
Host cryptoworkbench
    HostName github.com
    User git
    IdentityFile /var/lib/gitea/.ssh/cryptoworkbench
```
Then, we add a post-receive hook through the Gitea UI to mirror the repo to GitHub anytime the local repo receives a commit.
```
git -C /var/lib/gitea/repos/aninix/cryptoworkbench.git/ push --mirror --repo=cryptoworkbench:AniNIX/CryptoWorkbench.git
```
That's it! Already publicly-accessible repos are now publicly accessible on GitHub as well with no additional maintenance needed. To force interaction to happen in AniNIX/Foundation, one could turn off issues and pull requests in GitHub as well, but we haven't seen a need yet.
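Since every mirrored project needs its own alias, key, and hook, the two snippets above can be generated rather than typed. The sketch below assumes the convention visible in the example, where the SSH host alias, the key filename, and the local repo name are all the same string; the helper names are ours and are not part of Gitea or Git.

```python
from pathlib import PurePosixPath

SSH_DIR = PurePosixPath("/var/lib/gitea/.ssh")            # from the config above
REPO_ROOT = PurePosixPath("/var/lib/gitea/repos/aninix")  # from the hook above


def ssh_stanza(alias):
    """ssh_config block pairing one host alias with one deploy key."""
    return "\n".join([
        f"Host {alias}",
        "    HostName github.com",
        "    User git",
        f"    IdentityFile {SSH_DIR / alias}",
    ])


def mirror_hook(alias, github_path):
    """post-receive command that mirrors the local repo to GitHub."""
    return (
        f"git -C {REPO_ROOT / (alias + '.git')}/ "
        f"push --mirror --repo={alias}:{github_path}"
    )


print(ssh_stanza("cryptoworkbench"))
print(mirror_hook("cryptoworkbench", "AniNIX/CryptoWorkbench.git"))
```

Run for `cryptoworkbench`, this reproduces the stanza and the hook shown earlier; another project would only need a different alias and GitHub path.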
## Chatting: IRC to Discord via discord-irc
Keeping communication private is a much more complicated issue than source code. As a server, AniNIX/IRC doesn't log user traffic, to protect user privacy. We do use [discord-irc](https://github.com/reactiflux/discord-irc.git), a project from reactiflux, to proxy some of our public channels to Discord. These are curated very specifically to protect our userbase, but channels like #lobby are accessible (and even enforced) for every user that connects to the network. Doing this lets users who are familiar with phone apps but not IRC start interacting with our network so that they can be taught.
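To give a feel for how that curation is expressed, here is a rough sketch that writes out a bridge configuration. The key names (`nickname`, `server`, `discordToken`, `channelMapping`) follow the discord-irc README as we understand it and should be checked against the project's current documentation. Every value is a placeholder except `#lobby`, and the Discord-side channel name is assumed to match the IRC one.

```python
import json

# Placeholder values throughout; only "#lobby" is taken from the article.
bridge_config = [
    {
        "nickname": "bridge-bot",              # hypothetical IRC nick
        "server": "irc.example.org",           # hypothetical IRC host
        "discordToken": "REPLACE_WITH_TOKEN",  # never commit a real token
        # Discord channel -> IRC channel; unlisted channels are not bridged.
        "channelMapping": {"#lobby": "#lobby"},
    }
]

config_text = json.dumps(bridge_config, indent=2)
print(config_text)
```

Because only mapped channels are relayed, the mapping itself is the allow-list: a channel stays private simply by being left out.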
## Social Media Posts: RSS to Platforms
We use three of our free Zapier "zaps" to talk to [Facebook][facebook] and [LinkedIn][linkedin]. We know that these platforms have a more significant following than our individual site, so to make ourselves searchable and prevent imposters we hold onto some social media accounts. We have definitely seen better advertising for our martial arts program through [their Facebook][facebook-ama] and [their YouTube][youtube-ama] accounts than through our site. However, managing each account would be tedious and time-consuming for our admins -- no one wants to deal with all that busy work, and all of our content is going up on our RSS feeds to feed into AniNIX/Singularity anyway. By pushing RSS updates into these platforms through Zapier, we avoid managing each account by hand.
Because we're limited on the number of free zaps, we use [MonitoRSS][monitorss] to push RSS updates from ourselves and from others into our Discord servers. This retroactively pulls our recorded [YouTube][youtube] broadcasts back into IRC via the Discord bridge, and it makes our RSS posts available on Discord for those that aren't as familiar with the technology.
### Making RSS into just Git
We didn't want RSS to become another thing we had to manage, so we moved it into Git. The snippet below for OpenResty or Nginx exposes AniNIX/Foundation (Gitea) raw files from the main branch directly through the webserver, rather than having these files be flat files in the Gitea `custom` path.
```
location /aninix.xml {
    proxy_hide_header Content-Type;
    add_header content-type "application/atom+xml";
    rewrite /aninix.xml /AniNIX/Wiki/raw/branch/main/rss/aninix.xml;
}
location /martialarts/maqotw.xml {
    proxy_hide_header Content-Type;
    add_header content-type "application/atom+xml";
    rewrite /martialarts/maqotw.xml /AniNIX/Wiki/raw/branch/main/rss/maqotw.xml;
}
```
This makes it really simple for admins to push content to the RSS feeds without needing to modify webserver files separately from other contribution workflows. It also allows us to depend on Gitea and git branching to test feeds before they are published to the world without needing complicated extra setup. The one downside here is that we have not seen OpenResty pick up the new content-type header yet. This may be because `rewrite` re-runs location matching on the new URI, so headers set in these blocks may not carry over to the location that finally serves the file. We're hoping a future release will improve those directives.
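Testing a feed on a branch can be as simple as confirming that it still parses before merging. The check below is a sketch of that idea using only the Python standard library; it is not part of our pipeline, and the function name and sample feed are made up for illustration. It accepts either an RSS `rss` root or an Atom `feed` root.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def check_feed(xml_text):
    """Return the feed flavor ("rss" or "atom"), raising ValueError if invalid."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        raise ValueError(f"feed is not well-formed XML: {error}") from error
    if root.tag == "rss":
        return "rss"
    if root.tag == ATOM_FEED:
        return "atom"
    raise ValueError(f"unexpected root element: {root.tag}")


sample = '<rss version="2.0"><channel><title>Example</title></channel></rss>'
print(check_feed(sample))
```

A check like this only catches malformed XML and the wrong root element; it does not validate the feed against the full RSS or Atom specifications.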
## The Potential Disaster: Fitness tracker solutions to Strava to Zapier to Discord to AniNIX/IRC...
In an even more convoluted mess of interlaced cloud solutions, our martial-arts group also uses cloud solutions to solve some problems. Here's what that pipeline looks like:
1. Users buy a variety of fitness trackers, which report to their own cloud instances.
1. These cloud instances can talk to [Strava][strava] to file their results.
1. We use our last Zapier "zap" to repost from Strava to Discord.
1. The Discord channel in which this all is posted, #martialarts-workouts, gets proxied back to IRC's #maworkouts channel.
This means that, to get live fitness data from users to our IRC, we are using a minimum of 4 cloud services and 2 in-house services. A lot of points of failure exist in this pipeline, but since it's convenience only, we accept the impact.
# Imperial Intelligence
We run a [SWTOR][swtor] guild as well, called [Imperial Intelligence](/impintel). While we are mostly able to self-host the services we need for that, we end up publishing the GitHub & Discord addresses primarily to our guildmates. This is convenient, because though we have the tools to drive both solutions from on-site, the cloud offering abstracts our identity away to avoid gaming harassment. While our GPG key and other factors offer some breadcrumbs back home, our hardened perimeter and security posture make this unconcerning. Since the guild is themed on intelligence agencies, it makes for an interesting exercise to see if any of our "agents" will find their way home. So far, none have. Maybe you will be the first to know the secret?
# Conclusion
Self-hosting is still the best route, we believe, for your organization to control its data. Integrating publicly-accessible information from self-hosted solutions to the cloud can increase exposure and adoption without impinging on security or work effort. Sometimes, cloud solutions can even solve specific problems pure self-hosting can't. What you will use will be up to you -- hopefully, we have shown you the benefits of self-hosting: namely, that even when you interact with cloud services, you can end up managing almost all of your cloud platforms via Git and IRC, with the right pipeline of integrations. This gets you the best of both worlds -- the visibility of cloud platforms with the security & control of self-hosting.
<!-- All our cloud services, so that we can keep track. -->
[fcf]: https://www.freeconferencecall.com/login
[analytics]: https://analytics.google.com
[domain]: https://domains.google.com
[voice]: https://voice.google.com
[venmo]: https://venmo.com
[freshping]: https://aninix.freshping.io/
[zapier]: https://zapier.com
[discord]: https://discord.gg/2bmggfR
[strava]: https://www.strava.com/clubs/aninixmartialarts
[github]: https://github.com/AniNIX
[facebook]: https://facebook.com/aninixnetwork
[facebook-ama]: https://www.facebook.com/groups/aninixmartialarts/
[linkedin]: https://www.linkedin.com/company/aninix/?viewAsMember=true
[youtube]: https://www.youtube.com/channel/UCe-WNM2mbI51xoVZp3K_wFQ/about?view_as=subscriber
[youtube-ama]: https://www.youtube.com/channel/UCVAkee-WaInnZbPn16bqzrw/about?view_as=subscriber
[monitorss]: https://monitorss.xyz/
[swtor]: https://swtor.com