Anubis - Weighs the soul of incoming HTTP requests using proof-of-work to stop AI crawlers
(github.com)
from zutto@lemmy.fedi.zutto.fi to selfhosted@lemmy.world on 21 Mar 16:53
https://lemmy.fedi.zutto.fi/post/269017
I just started using this myself, seems pretty great so far!
Clearly it doesn’t stop all AI crawlers, but it does stop a significantly large chunk of them.
Meaning it wastes time and power such that it gets expensive on a large scale? Or does it mine crypto?
Yes, Anubis uses proof of work, like some cryptocurrencies do as well, to slow down/mitigate mass scale crawling by making them do expensive computation.
lemmy.world/post/27101209 has a great article attached to it about this.
–
Edit: Just to be clear, this doesn’t mine any cryptos; it just uses the same idea to slow down the requests.
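For anyone curious how the challenge works in principle, here is a minimal sketch of the general hashcash-style scheme (illustrative only, not Anubis’s actual code; the challenge string, difficulty, and encoding are made up): the browser has to find a nonce whose SHA-256 hash starts with enough zeroes, which takes many attempts to find but only one hash for the server to check.

```go
// Hashcash-style proof-of-work sketch (illustrative, not Anubis's real
// implementation). The client increments a nonce until
// SHA-256(challenge + nonce) starts with `difficulty` zero hex digits.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve searches for a nonce whose hash meets the difficulty target.
func solve(challenge string, difficulty int) (nonce uint64, hash string) {
	prefix := strings.Repeat("0", difficulty)
	for {
		sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
		hash = hex.EncodeToString(sum[:])
		if strings.HasPrefix(hash, prefix) {
			return nonce, hash
		}
		nonce++
	}
}

func main() {
	// Difficulty 4 needs roughly 65k hashes on average: negligible for one
	// human visit, but it adds up fast for a crawler hitting millions of pages.
	nonce, hash := solve("example-challenge", 4)
	fmt.Println("nonce:", nonce, "hash:", hash)
}
```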
And, yet, the same people here lauding this for intentionally burning energy will turn around and spew vitriol at cryptocurrencies which are reviled for doing exactly the same thing.
Proof of work contributes to global warming. The only functional, IRL, difference between this and crypto mining is that this doesn’t generate digital currency.
There are a very few POW systems that do good, like BOINC, which awards points for work done; the work is science: protein analysis, SETI searches, that sort of thing. The work itself is valuable and needs doing; they found a way to make the POW constructive. But just causing a visitor to use more electricity to “stick it” to crawlers is not ethically better than crypto mining.
Just be aware of the hypocrisy.
the functional difference is that this does it once. you could just as well accuse git of being a major contributor to global warming.
hash algorithms are useful. running billions of them to make monopoly money is not.
Which part of git performs proof-of-work? Specifically, intentionally inefficient algorithms whose output is thrown away?
the hashing part? it’s the same algo as here.
That’s not proof of work, though.
git is performing hashes to generate identifiers for versions of files so it can tell when they changed. It’s like moving rocks to build a house.
Proof of work is moving rocks from one pile to another and back again, for the only purpose of taking up your time all day.
okay, git using the same algorithm may have been a bad example. let’s go with video games then. the energy usage for the fraction of a second it takes for the anubis challenge-response dance to complete, even on phones, is literally nothing compared to playing minecraft for a minute.
if you’re mining, you do billions of cycles of sha256 calculations a second for hours every day. anubis does maybe 1000, once, if you’re unlucky. the method of “verification” is the wrong thing to be upset at, especially since it can be changed
Oh, god, yes. Video games waste vast amounts of energy while producing nothing of value. For sufficient definitions of “value,” of course. Is entertainment valuable? Is art? Does fiction really provide any true value?
POW’s only product is proving that you did some task. The fact that it’s energy expensive and produces nothing of value except the verifiable fact that the work was done, is the difference.
Using the video game example: the difference is the energy burned by the GPU while you were playing and enjoying yourself; cycles were burned, but in addition to doing the rendering there was additional value - for you - in entertainment. POW is like leaving your game running in demo mode with the monitor off. It’s doing the same work, only there’s no product.
This point is important to me. Cryptocurrencies aren’t inherently bad, IMO; there are cryptocurrencies based on Proof of Stake, which have less environmental impact than your video game. And there’s BOINC, where work is being done, but the results of the work are valuable scientific calculations - it’s not just moving rocks from one pile to another and back again.
in the case of anubis one could argue that the goal is to save energy. if too much energy is being spent by crawlers they might be configured to auto-skip anubis-protected sites to save money.
also, i’d say the tech behind crypto is interesting but that it should never have been used in a monetary context. proof of stake doesn’t help there, since it also facilitates consolidation of capital.
I think decentralized currency is the best part of crypto. Much of US strong-arm policy has been exercised by leveraging control over the dollar. Remember a few years ago when OPEC was making noises about maybe tying oil prices to something other than the dollar? The US government had a collective shit fit, and although I never heard it reported how the issue was resolved, it stopped being news and oil is still tied to the dollar. It’s probably one of the reasons why the Saudis were able to get away with the kidnapping, torture, and murder of Jamal Khashoggi, a US resident.
I am 100% in support of a currency that is not solely controlled by one group or State. For all of its terrible contribution to global warming, Bitcoin has proven resistant to an influential minority (e.g. Segwit2x) forcing changes over the wishes of the community. I especially like anything that scares bankers, and usury scabs.
Satoshi made two unfortunate design choices with Bitcoin: he based it on proof of work, which in hindsight was an ecological disaster; and he didn’t seize the opportunity to build in depreciation, a-la Freigeld, which addresses many problems in capitalism.
We’re all on Lemmy because we’re advocates of decentralization. Most of Lemmy opposes authoritarianism. How does that square with being opposed to a decentralized monetary system? Why are “dollars” any more real than cryptocoins? Why does gold have such an absurdly high value?
this reads like a mad rant.
first of all, bitcoin in its original form was meant to be used as a transaction log between banks. it was never meant to be a currency on its own, which can be seen in the fact that efforts in scaling up to more than a few million users consistently fail.
in practice, all cryptocurrencies result in a centralisation of power by default, whether they use proof of work or proof of stake, because they are built so that people with more resources outside the network can more easily get sway over the system. by either simply buying more hardware than anyone else (for pow) or pooling more of the limited resource (for pos) they can control the entire thing.
cryptocurrencies are a libertarian solution to the problem of capitalism, which is to say, a non-solution. the actual solution is to limit the use of financial incentives. i’d wager most people on lemmy would rather abolish currency altogether than go to crypto.
It’s a rant, for sure
Satoshi Nakamoto, the guy who invented Bitcoin, was motivated by a desire to circumvent banks. Bitcoin is the exact opposite of what you claim:
www.bitcoin.com/satoshi-archive/whitepaper/
My comment is a rant, because I constantly see these strongly held opinions about systems by people who not only know nothing about the topic, but who believe utterly false things.
Ok, now I have to wonder if you’re just trolling.
Bitcoin, in particular, has proven to be resilient against such takeovers. They’ve been attempted in the past several times, and successfully resisted.
you are framing a fundamental issue with the system as a positive, which is confusingly common for crypto advocates.
i’m not interested in this conversation.
Proof of work is just that, proof that it did work. What work it’s doing isn’t defined by that definition. Git doesn’t ask for proof, but it does do work. Presumably the proof part isn’t the thing you have an issue with. I agree it sucks that this isn’t being used to do something constructive, but as long as it’s kept to a minimum in user time scales, it shouldn’t be a big deal.
Crypto currencies are an issue because they do the work continuously, 24/7. This is a one-time operation per view (I assume per view and not once ever), which with human input times isn’t going to be much. AI garbage does consume massive amounts of power though, so damaging those is beneficial.
I’m not sure where you’re going with the git simile. Git isn’t performing any proof of work, at all. By definition, Proof of Work is that “one party (the prover) proves to others (the verifiers) that a certain amount of a specific computational effort has been expended.” The amount of computational power used to generate hashes for git is utterly irrelevant to its function. It doesn’t care how many cycles are used to generate a hash; therefore it’s in no way proof of work.
This solution is designed to cost scrapers money; it does this by causing them to burn extra electricity. Unless it’s at scale, unless it costs them, unless it has an impact, it’s not going to deter them. And if it does impact them, then it’s also impacting the environment. It’s like having a door-to-door salesman come to your door and intentionally making them wait while their car is running, and then cackling because you made them burn some extra gas, which cost them some pennies and also dumped extra carbon monoxide into the atmosphere.
Compare this to endlessh. It also wastes hackers’ time, but only because it just responds very slowly with an endless stream of header characters. It’s making them wait, only they’re not running their car while they’re waiting. It doesn’t require the caller to perform an expensive computation which, in the end, is harmful to more than just the scraper.
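For reference, the endlessh trick is roughly this (a minimal sketch of the idea, not the actual endlessh code; the port and delay are made up): accept the connection and drip out junk banner lines so slowly that the only thing being spent is the bot’s patience.

```go
// Minimal sketch of the endlessh idea (not the real endlessh; the port
// and delay are arbitrary): keep the client waiting by dripping out
// junk banner lines, at almost no cost to the server.
package main

import (
	"fmt"
	"math/rand"
	"net"
	"time"
)

func tarpit(conn net.Conn) {
	defer conn.Close()
	for {
		// One short junk line every 10 seconds.
		line := fmt.Sprintf("%x\r\n", rand.Int63())
		if _, err := conn.Write([]byte(line)); err != nil {
			return // the client finally gave up
		}
		time.Sleep(10 * time.Second)
	}
}

func main() {
	ln, err := net.Listen("tcp", ":2222")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		go tarpit(conn)
	}
}
```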
Let me make sure I understand you: AI is bad because it uses energy, so the solution is to make them use even more energy? And this benefits the environment how?
I’m not the person who brought git up. I was just stating that work is work. Sure, git is doing something useful with it. This is arguably useful without the work itself being important. Work is the thing you’re complaining about, not the proof.
Yeah, but the effect it has on legitimate usage is trivial. It’s a cost to illegitimate scrapers. Them not paying this cost has an impact on the environment too; in fact, this theoretically adds nothing, because they’ll spend the same time scraping either way. This way they get delayed and don’t gather anything useful for more of that time.
To use your salesman analogy, it’s similar to that, except their car is going to be running regardless. It just prevents them from reaching as many houses. They’re going to go to as many as possible. If you can stall them then they use the same amount of gas, they just reach fewer houses.
This is probably wrong, because you’re using the salesman idea. Computers have threads. If they’re waiting for something then they can switch tasks to something else. It protects a site, but it doesn’t slow them down. It doesn’t actually really waste their time because they’re performing other tasks while they wait.
If they’re going to use the energy anyway, we might as well make them get less value. Eventually the cost may be more than the benefit. If it isn’t, they spend all the energy they have access to anyway. That part isn’t going to change.
Then I apologize. All I can offer is that it’s a weakness of my client that it’s difficult and outside the inbox workflow to see any history other than the comment to which you’re replying. Not an excuse; just an explanation.
If given the option, I’d prefer all computing to have zero cost; sure. But no, I’m not complaining about the work. I’ll complain about inefficient work, but the real issue is work for work’s sake; in particular, systems designed specifically so that the only important fact is proving that someone burned X pounds of coal to get a result. Because, while exaggerated and hyperbolically stated, that’s exactly what Proof-of-Work systems are. All PoW systems care about is that the client provably consumed a certain amount of CPU power. The result of the work is irrelevant for anything but proving that someone did work.
With exceptions like BOINC, the work itself from PoW systems provides no other value.
It’s not. Computer networks can open only so many sockets at a time; threading on a single computer is finite, and programmers normally limit the amount of concurrency because high concurrency itself can cause performance issues.
They’re going to get their value anyway, right? This doesn’t stop them; it just makes each call to this more expensive. In the end, they do the work and get the data; it just cost them - and the environment - more.
Do you think this will stop scrapers? Or is it more of a “fuck you”, but with a cost to the planet?
Honey pots are a better solution; they’re far more energy efficient, and have the opportunity to poison the data. Poisoned data is more like what you suggest: they’re burning the energy anyway, but are instead getting results that harm their models. Projects like Nepenthes go in the right direction. PoW systems are harmful - straight up harmful. They’re harmful by preventing access to people who don’t use JavaScript, and they’re harmful in exactly the same way crypto mining is.
This is a stopgap while we try to find a new way to stop the DDOS happening right now. It might even be adapted to do useful work, if need be.
Hook into BOINC, or something? That’s an idea.
Sucks for people who have scripts disabled, or are using browsers without JS support, though.
It does, and I'm sure everyone will welcome a solution that lets them open things back up for those users without the abusers crippling them. It's a matter of finding one.
This isn’t hypocrisy. The git repo said this was “a bit like a nuclear response”, and like any nuclear response, I believe they expect everyone to suffer.
Not hypocrisy by the author, but by every reader who cheers this while hating on cryptocurrency.
IME most of these people can’t tell the difference between a cryptocurrency, a blockchain, and a public ledger, but have very strong opinions about them anyway.
No.
It’s a rather brilliant idea really, but when you consider the environmental implications of forcing web requests to perform proof of work in order to function, this effectively burns more coal for every site that implements it.
You have a point here.
But when you consider the current world’s web traffic, this isn’t actually the case today. For example, the GNOME project, which was forced to start using this on their GitLab, found that 97% of their traffic could not complete this PoW calculation.
I.e., they now need only a fraction of the computational cost to serve their GitLab, which saves a lot of resources, coal, and most importantly, the time of hundreds of real humans.
(Source for numbers)
Hopefully in the future we can move back to proper netiquette and just a plain old robots.txt file!
I don’t think AI companies care, and I wholeheartedly support any and all FOSS projects using PoW when serving their websites. I’d rather have that than have them go down
Upvote for the name and tag line alone!
It is not great on many levels.
It only runs against the Firefox user agent. This is not great, as the user agent can easily be changed. It may work now, but tomorrow that could all change.
It doesn’t measure load, so even if your website has only a few people accessing it, they will still have to do the proof of work.
The POW algorithm is not well designed and requires a lot of compute on the server, which means it could be used as a denial-of-service attack vector. It also uses SHA-256, which isn’t optimized for a proof-of-work type calculation and can be brute-forced pretty easily with hardware.
I don’t really care for the animé cat girl thing. This is more of a personal thing but I don’t think it is appropriate.
In summary the Tor implementation is a lot better. I would love to see someone port it to the clearnet. I think this project was created by someone lacking experience which I find a bit concerning.
It doesn’t run against Firefox only; it runs against whatever you configure it to. And also, from personal experience, I can tell you that the majority of AI crawlers have the keyword “Mozilla” in their user agent.
Yes, this isn’t Cloudflare, but I’m pretty sure that’s on the to-do list. If not, please file an issue with the project.
The computational requirements on the server side are a tiny fraction of the cost of what the bots have to spend, literally. A non-issue. This tool is to combat the denial of service that these bots cause by accessing high-cost services, such as git blame on GitLab. My phone can do 100k SHA-256 sums per second (with a single thread), and you can safely assume any server to outperform this ARM chip, so you’d need so many resources to cause denial of service that you might as well overload the server with plain traffic instead of one SHA-256 calculation.
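To illustrate the asymmetry, here is a sketch of the general idea (not Anubis’s actual verification code; the challenge and difficulty are made up): checking a submitted nonce is a single SHA-256, while finding it cost the client many thousands.

```go
// Sketch of why verification is cheap for the server (illustrative,
// not Anubis's actual code): one SHA-256 per check, versus the many
// thousands the client burned to find the nonce.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verify recomputes a single hash and checks the difficulty prefix.
func verify(challenge, nonce string, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + nonce))
	return strings.HasPrefix(hex.EncodeToString(sum[:]), strings.Repeat("0", difficulty))
}

func main() {
	fmt.Println(verify("example-challenge", "12345", 4)) // almost certainly false
}
```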
And this isn’t really comparable to Tor. This is a self-hostable service that sits between your web server/CDN and the service that is being attacked by mass crawling.
Edit: If you don’t like the project’s stickers, fork it and remove them. This is an open source project.
And Xe, who made this project, is a quite talented programmer. More than likely you have used some of Xe’s services/sites/projects before as well.
Xe is insanely talented. If she is who I think she is, then I’ve watched her speak and her depth of knowledge across computer science topics is insane.
…you do realize that brute forcing it is the work you use to prove yourself, right? That’s the whole point of PoW
True, I should have phrased that better.
The issue is that SHA-256 is fairly easy to do at scale. Modern high-performance hardware is well optimized for it, so you could still perform the attack with a bunch of GPUs. AI scrapers tend to have a lot of those.
I look forward to TOR’s PoW coming out for FOSS WAFs
I use sx.catgirl.cloud so I’m already primed to have anime catgirls protecting my webs.
Catgirls, jackalgirls, all embarrassing. Go full-on furry.
Giant middle finger from me – and probably everyone else who uses NoScript – for trying to enshittify what’s left of the good parts of the web.
Seriously, FUCK THAT.
They’re working on no-JS support too, but this just had to be put out without it due to the number of AI crawler bots causing denial of service to normal users.
You should fuck capitalism and corporations instead because they are the reason we can’t have nice things. They took the web from us
You should blame the big tech giants and their callous disregard for everyone else for the Enshittification, not the folks just trying to keep their servers up.
It’s a clever solution, but I did see one recently that IMO was more elegant for noscript users. I can’t remember the name, but it would create a dummy link that human users won’t touch but web crawlers will naturally navigate into, and then generate an infinitely deep tree of super basic HTML to force bots into endlessly trawling a cheap-to-serve portion of your web server instead of something heavier. It might have even integrated with fail2ban to pick out obvious bots and keep them off your network for good.
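Whatever the project was, the core of the trick is cheap to reproduce. Here is a minimal sketch of the link-maze idea (illustrative, not any particular project; the paths and page size are made up): every request under the maze returns a trivial page of links that lead deeper into the maze.

```go
// Minimal sketch of a link maze (illustrative, not any specific
// project): every request under /maze/ returns a cheap page of links
// to further random /maze/ paths, so a crawler that follows them just
// wanders through worthless HTML instead of hitting expensive pages.
package main

import (
	"fmt"
	"math/rand"
	"net/http"
)

func maze(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "<html><body>")
	for i := 0; i < 10; i++ {
		// Random child paths; each one leads straight back into the maze.
		fmt.Fprintf(w, `<a href="/maze/%x">page %d</a><br>`, rand.Int63(), i)
	}
	fmt.Fprint(w, "</body></html>")
}

func main() {
	http.HandleFunc("/maze/", maze)
	http.ListenAndServe(":8080", nil)
}
```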
If you remember the project I would be interested to see it!
But I’ve seen some AI-poisoning sinkholes before too, a novel concept as well. I have not heard of real-world experiences with them yet.
Maybe was thinking of arstechnica.com/…/ai-haters-build-tarpits-to-trap… ?
I’m assuming they’re thinking about this
Which was posted here a while back
Maybe this is it -
lemmy.world/comment/15898939
Wouldn’t the bot simply limit the depth of its crawl?
It could be infinitely wide too if they desired; it shouldn’t be that hard to do, I wouldn’t think. I would suspect the bots limit the time a chain can use, though, so they eventually escape out, but this still protects the data because it obfuscates the legitimate data the crawler wants. The goal isn’t to trap them forever; it’s to keep them from getting anything useful.
That would be reasonable. The people running these things aren’t reasonable. They ignore every established mechanism to communicate a lack of consent to their activity because they don’t respect others’ agency and want everything.
That’s a tarpit that you’re describing, like Iocaine or Nepenthes. Those are meant to feed the crawler junk data to try and make its eventual output bad.
Anubis tries to not let the AI crawlers in at all.
This is icky to me. Cool idea, but this is weird.
…Why? It’s just telling companies they can get support + white-labeling for a fee, and asking that you keep their silly little character in a tongue-in-cheek manner.
Just like they say, you can modify the code and remove for free if you really want, they’re not forbidding you from doing so or anything
Yeah, it seems entirely optional. It’s not like manually removing the Anubis character will revoke your access to the code. However, I still do find it a bit weird that they’re asking for that.
I just can’t imagine most companies implementing Anubis and keeping the character or paying for the service, given that it’s open source. It’s just unprofessional for the first impression of a company’s website to be the Anubis devs’ manga OC…
It is very different from the usual flat corporate style yes, but this is just their branding. Their blog is full of anime characters like that.
And it’s not like you’re looking at a literal ad for their company or with their name on it. In that sense it is subtle, though a bit unusual.
I don’t think it’s necessarily a bad thing. Subtle but unusual is a good way to describe it.
However, I would like to point out that if it is their branding, then the character appearing is an advertisement for the service. It’s just not very conventional or effective advertising, but they’re not making money from a vast majority of implementations, so it’s not very egregious anyway.
True, but I think you are discounting the risk that the actual god Anubis will take displeasure at such an act, potentially dooming one’s real life soul.
I did not find any instruction on the source page on how to actually deploy this. That would be a nice touch imho.
There are some detailed instructions on the docs site, tho I agree it’d be nice to have in the readme, too.
Sounds like the dev was not expecting this much interest for the project out of nowhere so there will def be gaps.
Or even a quick link to the relevant portion of the docs at least would be cool
I think the maze approach is better; this seems like it hurts valid users of the web more than it would hurt a company.
For those not aware, Nepenthes is an example of the above-mentioned approach!
This looks like it can actually fuck up some models, but the unnecessary CPU load it will generate means most websites won’t use it, unfortunately.
Nice. Crypto miners disguised as anti-AI.
what about this is crypto mining?
Found the FF14 fan lol
The release names are hilarious
What’s the ffxiv reference here?
Anubis is from Egyptian mythology.
The names of release versions are famous FFXIV Garleans
Why Sha256? Literally every processor has a crypto accelerator and will easily pass. And datacenter servers have beefy server CPUs. This is only effective against no-JS scrapers.
It requires a bunch of browser features that non-user browsers don’t have, and the proof-of-work part is about the least relevant piece of this; it only gets invoked once a week or so to generate a unique cookie.
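The “solve once, then carry a cookie for a week” part looks roughly like this (a sketch of the general pattern; Anubis’s actual token format and checks will differ, and the cookie name and key here are made up):

```go
// Sketch of the "solve once, carry a cookie" pattern (illustrative;
// Anubis's actual token format differs): after a passed challenge the
// server hands out an HMAC-signed cookie with a one-week expiry, and
// requests bearing a valid cookie skip the challenge entirely.
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

var secret = []byte("server-side-secret") // hypothetical key, kept on the server

// sign HMACs the expiry timestamp so the cookie can't be forged.
func sign(expiry int64) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%d", expiry)
	return hex.EncodeToString(mac.Sum(nil))
}

// issue is called once, right after a proof-of-work check passes.
func issue(w http.ResponseWriter) {
	exp := time.Now().Add(7 * 24 * time.Hour).Unix()
	http.SetCookie(w, &http.Cookie{
		Name:    "challenge-pass",
		Value:   strconv.FormatInt(exp, 10) + "." + sign(exp),
		Expires: time.Unix(exp, 0),
	})
}

// valid reports whether a presented cookie is unexpired and correctly signed.
func valid(c *http.Cookie) bool {
	parts := strings.SplitN(c.Value, ".", 2)
	if len(parts) != 2 {
		return false
	}
	exp, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil || time.Now().Unix() > exp {
		return false
	}
	return hmac.Equal([]byte(sign(exp)), []byte(parts[1]))
}

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if c, err := r.Cookie("challenge-pass"); err == nil && valid(c) {
			fmt.Fprintln(w, "welcome back, no challenge needed")
			return
		}
		// ...serve the JS challenge here; call issue(w) once it is solved...
		fmt.Fprintln(w, "please solve the challenge first")
	})
	http.ListenAndServe(":8080", nil)
}
```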
I sometimes have the feeling that as soon as some crypto-currency related features are mentioned people shut off part of their brain. Either because they hate crypto-currencies or because crypto-currency scammers have trained them to only look at some technical implementation details and fail to see the larger picture that they are being scammed.
So if you try to access a website using this technology via terminal, what happens? The connection fails?
If your browser doesn’t have a Mozilla user agent (i.e., like Chrome or Firefox), it will pass through directly. Most AI crawlers use these user agents to pretend to be human users.
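As a rough illustration of that kind of user-agent gating (Anubis’s actual matching rules are configurable and more involved than this; the paths and responses are placeholders):

```go
// Minimal sketch of user-agent-based gating (illustrative only): user
// agents containing "Mozilla" get sent to the challenge, everything
// else passes straight through to the protected backend.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func gate(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.UserAgent(), "Mozilla") {
			// A browser-like UA: this is where the JS challenge would go.
			fmt.Fprintln(w, "challenge page would be served here")
			return
		}
		next.ServeHTTP(w, r) // curl, feed readers, etc. pass directly
	})
}

func main() {
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello from the protected site")
	})
	http.ListenAndServe(":8080", gate(backend))
}
```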
What I’m thinking about is more that in Linux, it’s common to access URLs directly from the terminal for various purposes, instead of using a browser.
If you’re talking about something like `curl`, that also uses its own user agent unless asked to impersonate some other UA. If not, then maybe I can’t help.