@ pleblee
2024-01-28 20:53:17

There's no "right" way to run a relay; any relay operator is free to make their own decisions. I like nostr's simplicity and its tendency to favor user independence and choice. I want to see nostr grow and remain resilient, so I have been sharing tools to keep small relays running in tip-top shape.
This is the first part in a series. The follow-up should explain the workings of our friend-of-friend trust model.
strfry is fast
bitcoiner.social is one of the oldest nostr relays. In the early days, it ran a reference relay implementation called Relayer. It crashed a lot, but a simple systemd restart policy kept it alive and mostly functional for the first year. Somewhere around November I switched it over to nostr-rs-relay, which was stable and used a simple sqlite backend. That lasted a matter of weeks before usage rose enough to warrant changing back to a postgres backend, this time with nostream. Nostr was growing fast!
By the outset of 2023, the relay was still completely overwhelmed. Every incoming request sent postgresql chewing through compute cycles, and no amount of parameter tuning made a difference. The demand was unsustainable, and the best way to remedy the situation wasn't immediately clear.
Then, I think in early February, I saw a post from jb55 saying he was trying a new relay written in C++ called strfry. A glance at github revealed that strfry uses a different kind of storage backend: LMDB, an embedded memory-mapped key-value store. The difference in compute efficiency was night and day: where SQL queries had been maxing out every server core, our CPU charts showed strfry sipping compute cycles. Even a small virtual machine could handle thousands of requests. It reminded me of switching from an old IIS or Apache webserver over to nginx. We continue to run strfry to this day, and most of my relay tooling tends to be focused around it.
ansible is simple
Last year I started an Ansible role for installing strfry. It let me commit the entire process of setting up the relay into a repository, à la Infrastructure as Code. This ensured that I'd be able to quickly restore the relay in the event of a catastrophic loss, and that I wouldn't forget how to set the thing up. Besides building strfry from source, it automates the basic setup steps needed to use an excellent plugin by Alex Gleason called strfry-policies. I also started an Ansible role for snort, a nostr web client.
Smart people will tell you that Ansible is far from the best way to manage relay infrastructure, but it's plenty good enough to get the job done. I'd like to do a nix implementation at some point, when time permits, but it's a low priority, so I'm hoping someone else beats me to it.
One of my first projects when I was learning Ansible was setting up an observability stack. Monitoring the availability of a nostr relay is entirely necessary, and running my own monitoring stack has been one of my most useful preparations for being a relay admin. I haven't taken the time to put together a proper example, but all of my infra code is FOSS, so you can see it in my stack playbook and host vars. I even proposed to do a whole talk on this at Tabconf 2023 (although I'm glad they turned me down).
If you want to set up your first observability stack, it might be easiest to start with a few containers (e.g. awesome-compose/prometheus-grafana).
rules are necessary
Spam and abuse are prevalent on nostr, so event ingestion and retention are two primary concerns for a relay operator.
ingest and digest
To address the firehose of notes coming in from the internet (ingestion), we use the strfry-policies plugin. The plugin lets the admin define policy rules that prevent unwanted events from being ingested by the relay. So far I've added two policy types: one filters by event kind, and the other filters by event size.
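For a sense of the mechanics, here is a minimal standalone sketch (not the strfry-policies code itself) that combines those two ideas, assuming strfry's write-policy plugin protocol of one JSON request per line on stdin and one accept/reject decision per line on stdout. The allowed kinds and the size cap here are made-up values:

```typescript
// Minimal standalone sketch of a strfry write-policy plugin (illustrative only).
// strfry sends one JSON request per line on stdin; each request carries the
// candidate event. We answer with one JSON decision per line on stdout.
import { readLines } from 'https://deno.land/std@0.201.0/io/mod.ts';

const allowedKinds = new Set([0, 1, 3, 5, 6, 7, 1984, 9735, 10002]); // example allowlist
const maxContentBytes = 64 * 1024; // example size cap

for await (const line of readLines(Deno.stdin)) {
  if (line.length === 0) continue;
  const { event } = JSON.parse(line);

  let action = 'accept';
  let msg = '';
  if (!allowedKinds.has(event.kind)) {
    action = 'reject';
    msg = 'blocked: event kind not accepted here';
  } else if (new TextEncoder().encode(event.content).length > maxContentBytes) {
    action = 'reject';
    msg = 'blocked: event content too large';
  }

  console.log(JSON.stringify({ id: event.id, action, msg }));
}
```

In our setup the real policies live in strfry-policies; strfry simply runs whatever script the writePolicy plugin setting in strfry.conf points at.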
I'll document it more in a follow-up article, but I implemented policy exclusions for friends and friends-of-friends (foaf). I recently published a simple cli tool for reading relay reports. Reviewing reports is necessary to deal with whatever slips through the cracks of our policies. The admin can use the reports to ban abusive accounts and delete their content from the relay.
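The report reader deserves its own write-up, but the underlying idea is simple: reports are ordinary nostr events of kind 1984 (NIP-56), so you can pull them off your own relay with a plain subscription. A rough sketch of that idea (not the published tool, and the relay URL is just a placeholder):

```typescript
// Rough sketch: pull recent NIP-56 reports (kind 1984) off a relay over a
// standard NIP-01 subscription. Illustrative only; the relay URL is a placeholder.
const relayUrl = 'wss://bitcoiner.social';
const ws = new WebSocket(relayUrl);

ws.onopen = () => {
  ws.send(JSON.stringify(['REQ', 'reports', { kinds: [1984], limit: 50 }]));
};

ws.onmessage = (msg) => {
  const [type, , event] = JSON.parse(msg.data);
  if (type === 'EVENT') {
    // The 'p' tag names the reported pubkey; an 'e' tag (if present) names the reported event.
    const reportedPubkey = event.tags.find((t: string[]) => t[0] === 'p')?.[1];
    console.log(event.created_at, reportedPubkey, event.content);
  } else if (type === 'EOSE') {
    ws.close(); // end of stored events, nothing left to review
  }
};
```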
to retain, or not retain
To address data retention, I wrote a simple Deno TypeScript tool to prune events. strfry itself has a brute delete function, but every relay will want to keep certain kinds of events longer than others (such as profiles or contact lists). I haven't published it anywhere yet, but it basically looks like this:
```typescript
import { readLines } from 'https://deno.land/std@0.201.0/io/mod.ts';

const kindAgeLimitDefault = 90;
const kindAgeLimits: Record<number, number> = {
  0: 365,     // Profiles
  3: 365,     // Contacts
  9735: 180,  // Zap Receipts
  24133: 365, // NWC
  10002: 730, // NIP-65
};

const exportEventsStdin = async (): Promise<void> => {
  if (Deno.isatty(Deno.stdin.rid)) {
    Deno.exit(1);
  }
  for await (const line of readLines(Deno.stdin)) {
    if (line.length === 0) {
      return;
    }
    exportLine(line);
  }
};

const exportLine = (line: string): void => {
  const eventJson = JSON.parse(line);
  const created_at = new Date(eventJson.created_at * 1000); // convert seconds to ms
  const kindAgeLimit = (kindAgeLimits[eventJson.kind] || kindAgeLimitDefault) * 24 * 60 * 60 * 1000; // convert days to ms
  const ageLimitDate = new Date(Date.now() - kindAgeLimit);
  if (created_at > ageLimitDate) {
    console.log(line);
  }
};

await exportEventsStdin();
```
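The script just filters a stream of line-delimited events, so something like `strfry export | deno run prune.ts > keep.jsonl` (the filename is just for illustration) will dump the database and keep only the events that survive the age limits; what you do with the filtered set from there, such as importing it into a fresh database, is up to you.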
The bitcoiner.social relay uses a modified version that takes a friend-of-a-friend model into account: we retain events from friends of our friends much longer than we keep the notes of total strangers. One side effect of pruning events is that it reduces the compute load on the server, and a policy like that can generally keep the relay at a sustainable level for long periods of time. I have a systemd timer (effectively a cronjob) that runs the prune on a regular schedule.
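That foaf variant isn't published yet either, but the shape of the change is small: alongside the kind-based limits, keep a set of trusted pubkeys and stretch the age limit for events they authored. A hypothetical sketch, reusing the names from the script above (the multiplier and the set's contents are illustrative):

```typescript
// Illustrative only: layer a friend-of-a-friend bonus on top of the kind-based
// limits from the script above (kindAgeLimits / kindAgeLimitDefault).
// foafPubkeys would be built elsewhere, e.g. by walking our friends' contact lists (kind 3).
const foafPubkeys = new Set<string>(); // hex pubkeys of friends and friends-of-friends
const foafMultiplier = 4;              // made-up value: keep trusted authors' events 4x longer

const ageLimitDays = (event: { kind: number; pubkey: string }): number => {
  const base = kindAgeLimits[event.kind] || kindAgeLimitDefault;
  return foafPubkeys.has(event.pubkey) ? base * foafMultiplier : base;
};
```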
We run `strfry compact` every week to keep the database index orderly and efficient. These are the kinds of optimizations that keep a small relay in a healthy state for the long term.