Why are anime catgirls blocking my access to the Linux kernel?
(lock.cmpxchg8b.com)
from tofu@lemmy.nocturnal.garden to selfhosted@lemmy.world on 21 Aug 09:02
https://lemmy.nocturnal.garden/post/194665
Some thoughts on how useful Anubis really is. Combined with comments I read elsewhere about scrapers starting to solve the challenges, I’m afraid Anubis will be outdated soon and we need something else.
Yeah, it has seemed like a bit of a waste of time; once that difficulty gets scaled up and the expiration scaled down, it’s gonna get annoying to use the web on phones.
I had to get my glasses to re-read this comment.
You know why anubis is in place on so many sites, right? You are literally blaming the victims for the absolute bullshit AI is foisting on us all.
I don’t think so. I think he’s blaming the “solution” as being a stop gap at best and painful for end-users at worst. Yes the AI crawlers have caused the issue but I’m not sure this is a great final solution.
As the article discussed, this is essentially “an expensive” math problem meant to deter AI crawlers, but in the end it ain’t really that expensive. It’s more like they put two door handles on a door hoping the bots are too lazy to turn both of them, but also severely slowing down all one-handed people. I’m not sure it will ever be feasible to essentially figure out how to have one bot determine if the other end is also a bot without human interaction.
It works because it’s a bit of obscurity, not because it’s expensive. Once it’s a big enough problem to the scrapers, the scrapers will adapt and then the only option is to make it more obscure/different or crank up the difficulty which will slow down genuine users much more
Yes, I manage cloudflare for a massive site that at times gets hit with millions of unique bot visits per hour
So you know that this is the lesser of the two evils? Seems like you’re viewing it from client’s perspective only.
No one wants to burden clients with Anubis, and Anubis shouldn’t exist. We are all (server operators and users) stuck with this solution for now because there is nothing else at the moment that keeps these scrapers at bay.
Even the author of Anubis doesn’t like the way it works. We all know it’s just more wasted computing for no reason except big tech doesn’t give a care about anyone.
My point is, and the author’s point is, it’s not computation that’s keeping the bots away right now. It’s the obscurity and challenge itself getting in the way.
The current version of Anubis was made as a quick “good enough” solution to an emergency. The article is very enthusiastic about explaining why it shouldn’t work, but completely glosses over the fact that it has worked, at least to an extent where deploying it and maybe inconveniencing some users is preferable to having the entire web server choked out by a flood of indiscriminate scraper requests.
The purpose is to reduce the flood to a manageable level, not to block every single scraper request.
And it was/is for sure the lesser evil compared to what most others did: put the site behind Cloudflare.
I feel people that complain about Anubis have never had their server overheat and shut down on an almost daily basis because of AI scrapers 🤦
Yeah, I’m just wondering what’s going to follow. I just hope everything isn’t going to need to go behind an authwall.
The developer is working on upgrades and better tools. xeiaso.net/…/avoiding-becoming-peg-dependency/
Cool, thanks for posting! Also the reasoning for the image is cool.
I’ll say the developer is also very responsive. They’re (ambiguous ‘they’, not sure of pronouns) active in a libraries-fighting-bots slack channel I’m on. Libraries have been hit hard by the bots: we have hoards of tasty archives and we don’t have money to throw resources at the problem.
The Anubis repo has an enbyware emblem fun fact :D
Yay! I won’t edit my comment (so your comment will make sense) but I checked and they also list they/them on their github profile
Unless you have a dirty heatsink, no amount of hammering would make the server overheat
Are you explaining my own server to me? 🙄
What CPU do you have made after 2004 that doesn’t have automatic temperature control?
I don’t think there is any, unless you somehow managed to disable it?
Even a raspberry pi without a heatsink won’t overheat to shutdown
You are right, it is actually worse, it usually just overloads the CPU so badly that it starts to throttle and then I can’t even access the server via SSH anymore. But sometimes it also crashes the server so that it reboots, and yes that can happen on modern CPUs as well.
You need to set your HTTP-serving process to a priority below the administrative processes (in the place where you are starting it, so assuming a Linux server that would be your init script or systemd service unit).
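A minimal sketch of what that could look like with systemd resource-control directives (the unit name and numbers here are made up for illustration, not the commenter’s actual setup):

```ini
# /etc/systemd/system/mywebapp.service  (hypothetical unit name)
[Service]
# Lower scheduling priority so sshd and other admin tooling stays responsive
Nice=10
# cgroup v2 knobs: give the web app a smaller CPU share and a hard ceiling
CPUWeight=50
CPUQuota=150%
```

With something like that in place a flood of requests can still saturate the web service itself, but SSH and other admin processes keep getting scheduled.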
An actual crash causing a reboot? Do you have faulty RAM maybe? That’s really not ever supposed to happen from anything happening in userland. That’s not AI, your stuff might be straight up broken.
Only thing that isn’t broken that could reboot a server is a watchdog timer.
Your server shouldn’t crash, reboot, or become unreachable from the admin interface even at 100% load, and it shouldn’t overheat either; temperatures should never exceed 80°C no matter what you do. That’s supposed to be impossible with thermal management, which all processors have had for decades.
Great that this is all theoretical 🤷 My server hardware might not be the newest but it is definitely not broken.
And besides, what good is it that you can still barely access the server through SSH, when the CPU is constantly maxed out and site visitors only get a timeout when trying to access the services?
I don’t even get what you are trying to argue here. That the AI scraper DDOS isn’t so bad because in theory it shouldn’t crash the server? Are you even reading what you are writing yourself? 🤡
Even if your server is a cell phone from 2015, if it’s operating correctly and the CPU is maxed out, that means it’s fully utilized and serving hundreds of megabits of information.
You’ve decided to let the entire world read from your server; that indiscriminate policy lets people you don’t want to have your data get it and use your resources.
You want to correct that by making everyone that comes in solve a puzzle, therefore in some way degrading their access, so it’s not surprising that they’re going to complain. The other day I had to wait over 30 seconds at an Anubis puzzle page, when I know that the AI scrapers have no problem getting through. Something on my computer, probably some anti-crypto-mining protection, is getting triggered by it, and now I can’t no-script the web either because of that thing, and it can’t even stop scrapers anyway!
So Anubis is going to be left behind; all the real users are, for years, going to be annoyed and have their entire internet degraded by it, while the scrapers got that institutionally figured out in days.
If it’s freely available public data then the solution isn’t restricting access, playing a futile arms race with the scrapers and throwing the real users to the dogs; it’s to have standardized, incremental, efficient database dumps so the scrapers stop assuming every website is interoperability-hostile and stop scraping them. Let Facebook and Xitter fight the scrapers; let anyone trying to leverage public (and especially user-contributed) data fight the scrapers.
Aha, an apologist for AI scraper DDOS, why didn’t you say so directly instead of wasting my time?
The DDoS is caused by the gatekeeping; there was no such issue before the 2023 API wars. Fork over the goods and nobody gets hurt, it’s not complicated: if you want to publish information to the public, don’t scrunch it up behind diseased trackers and ad-infested pages which burn your CPU cycles. Or just put it in a big tarball torrent. The web is turning into a cesspool; how long until our browsers don’t even query websites at all, just a self-hosted crawler and search like SearXNG? At least then I won’t be catching cooties from your JavaScript cryptomining bots embedded in the pages!
“fork over the goods and nobody gets hurt” mate you are not sounding like the good person here
Even if one would want to give them everything, they don’t care. They just burn through their resources and recursively scrape every single link on your page. Providing standardized database dumps is absolutely not helping against your server being overloaded by scrapers of various companies with deep pockets.
Like Anubis, that’s not going to last; the point isn’t to hammer the web servers off the net, it’s to get the precious data. The more standardized and streamlined that access is made, and as long as there’s no preferential treatment for certain players (OpenAI / Google / Facebook), the dumb scrapers will burn themselves out.
One nice thing about Anubis and Nepenthes is that they’re going to burn out those dumb scrapers faster and force them to become more sophisticated and stealthy. That should resolve the DDoS problem on its own.
For the truly public data sources, I think coordinated database dumps are the way to go; for hostile carriers like Reddit and Facebook, it’s going to be scraper arms-race warfare like Cory Doctorow predicted.
Why the hell don’t you limit the CPU usage of that service?
For any service that could hog resources so bad that they can block the entire system the normal thing to do is to limit their max resource usage. This is trivial to do using containers. I do it constantly for leaky software.
Obviously I did that, but that just means the site becomes inaccessible even sooner.
Is there a reason other than avoiding infrastructure centralization not to put a web server behind cloudflare?
Cloudflare would need https keys so they could read all the content you worked so hard to encrypt. If I wanted to do bad shit I would apply at Cloudflare.
Maybe I’m misunderstanding what “behind cloudflare” means in this context, but I have a couple of my sites proxied through cloudflare, and they definitely don’t have my keys.
I wouldn’t think using a cloudflare captcha would require such a thing either.
Hmm, I should look up how that works.
Edit: developers.cloudflare.com/ssl/…/ssl-modes/#custom…
They don’t need your keys because they have their own CA. No way I’d use them.
Edit 2: And with their own DNS they could easily route any address through their own servers if they wanted to, without anyone noticing. They are entirely too powerful. Is there some way to prevent this?
That’s because they just terminate TLS at their end. Your DNS record is “poisoned” by the orange cloud and their infrastructure answers for you. They happen to have a trusted root CA so they just present one of their own certificates with a SAN that matches your domain and your browser trusts it. Bingo, TLS termination at CF servers. They have it in cleartext then and just re-encrypt it with your origin server if you enforce TLS, but at that point it’s meaningless.
Oh, I didn’t think about the fact that they’re a CA. That’s a good point; thanks for the info.
Yes, because Cloudflare routinely blocks entire IP ranges and puts people into endless captcha loops. And it snoops on all traffic and collects a lot of metadata about all your site visitors. And if you let them terminate TLS, they will even analyse the passwords that people use to log into the services you run. It’s basically a huge surveillance dragnet and probably a front for the NSA.
I still think captchas are a better solution.
In order to get past them they have to run AI inference, which also comes with compute costs. But for legitimate users you don’t run unauthorized intensive tasks on their hardware.
They are much worse for accessibility, and they also take longer to solve and are more disruptive for the majority of users.
Anubis is worse for privacy, as you have to have JavaScript enabled, and worse for the environment, as the PoW cryptographic challenges are just a waste.
Also, reCaptcha-style challenges are not really that disruptive most of the time.
As I said, the polite thing would just be giving users the option. Anubis PoW running directly just for entering a website is one of the rudest pieces of software I’ve seen lately. They should be more polite and just give the user an option: maybe the user could choose to solve a captcha or run the Anubis PoW, or even just have Anubis run only after a button the user clicks.
I don’t think it’s good practice to run that type of software just for entering a website. If that tendency were to grow, browsers would need to adapt and straight up block that behavior, like only allowing access to some client resources after a user action.
Are you seriously complaining about an (entirely false) negative privacy aspect of Anubis and then suggest reCaptcha from Google is better?
Look, no one thinks Anubis is great, but often it is that or the website becoming entirely inaccessible because it is DDOSed to death by the AI scrapers.
First, I said reCaptcha types, meaning captchas of the style of reCaptcha. That could be implemented outside a google environment. Secondly, I never said that types were better for privacy. I just said Anubis is bad for privacy. Traditional captchas that work without JavaScript would be the privacy friendly way.
Third, it’s not a false proposition. Disabling JavaScript can protect your privacy a great deal. A lot of tracking is done through JavaScript.
Last, that’s just the Anubis PR slogan, not the truth. As I said, DDoS mitigation could be implemented in other ways, more polite and/or environmentally friendly.
Are you astroturfing for Anubis? Because I really cannot understand why something as simple as a landing page with a “run PoW challenge” button would be that bad.
Anubis is not bad for privacy, but rather the opposite. Server admins explicitly chose it over commonly available alternatives to preserve the privacy of their visitors.
If you don’t like random Javascript execution, just install an allow-list extension in your browser 🤷
And no, it is not a PR slogan, it is the lived experience of thousands of server admins (me included) that have been fighting with this for months now and are very grateful that Anubis has provided some (likely only temporary) relief from that.
And I don’t get what the point of an extra button would be when the result is exactly the same 🤷
Latest version of Anubis has a JavaScript-free verification system. It isn’t as accurate, so I allow js-free visits only if the site isn’t being hammered. Which, tbf, prior to Anubis no one was getting in, JS or no JS.
Out of curiosity, what’s the issue with Cloudflare? Aside from the constant worry they may strong-arm you into their enterprise pricing if your site is too popular lol. I understand supporting open source, but why not let companies handle the expensive bits as long as they’re willing?
I guess I can answer my own question. If the point of the Fediverse is to remove a single point of failure, then I suppose Cloudflare could become a single point to take down the network. Still, we could always pivot away from those types of services later, right?
Cloudflare has IP banned me before for no reason (no proxy, no VPN, residential ISP with no bot traffic). They’ve switched their captcha system a few times, and some years it’s easy, some years it’s impossible.
The problem is that the purpose of Anubis was to make crawling more computationally expensive and that crawlers are apparently increasingly prepared to accept that additional cost. One option would be to pile some required cycles on top of what’s currently asked, but it’s a balancing act before it starts to really be an annoyance for the meat popsicle users.
That’s why the developer is working on a better detection mechanism. xeiaso.net/…/avoiding-becoming-peg-dependency/
This post was originally written for ycombinator “Hacker” News which is vehemently against people hacking things together for greater good, and more importantly for free.
It’s more of a corporate PR release site and if you aren’t known by the “community”, calling out solutions they can’t profit off of brings all the tech-bros to the yard for engagement.
Exactly my thoughts too. Lots of theory about why it won’t work, but not looking at the fact that if people use it, maybe it does work, and when it won’t work, they will stop using it.
Yeah, well-written stuff. I think Anubis will come and go. This beautifully demonstrates and, best of all, quantifies the ~~negligence~~ negligible cost to scrapers of Anubis.
It’s very interesting to try to think of what would work, even conceptually. Some sort of purely client-side captcha type of thing perhaps. I keep thinking about it in half-assed ways for minutes at a time.
Maybe something that scrambles the characters of the site according to some random “offset” of some sort, e.g maybe randomly selecting a modulus size and an offset to cycle them, or even just a good ol’ cipher. And the “captcha” consists of a slider that adjusts the offset. You as the viewer know it’s solved when the text becomes something sensical - so there’s no need for the client code to store a readable key that could be used to auto-undo the scrambling. You could maybe even have some values of the slider randomly chosen to produce English text if the scrapers got smart enough to check for legibility (not sure how to hide which slider positions would be these red herring ones though) - which could maybe be enough to trick the scraper into picking up junk text sometimes.
That type of captcha already exists. I don’t know about their specific implementation, but 4chan has it, and it is trivially bypassed by userscripts.
@mfed1122 @tofu any client-side tech to avoid (some of the) bots is bound to, as its popularity grows, be either circumvented by the bot’s developers or the model behind the bot will have picked up enough to solve it
I don’t see how any of these are going to do better than a short term patch
That’s the great thing about Anubis: it’s not client-side. Not entirely anyways. Similar to public key encryption schemes, it exploits the computational complexity of certain functions to solve the challenge. It can’t just say “solved, let me through” because the client has to calculate a number, based on the parameters of the challenge, that fits certain mathematical criteria, and then present it to the server. That’s the “proof of work” component.
A challenge could be something like “find the two prime factors of the semiprime 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139”. This number is known as RSA-100; it was first factorized in 1991, which took several days of CPU time, but checking the result is trivial since it’s just integer multiplication. A similar semiprime of 260 decimal digits still hasn’t been factorized to this day. You can’t get around mathematics, no matter how advanced your AI model is.
@rtxn I don’t understand how that isn’t client side?
Anything that is client side can be, if not spoofed, then at least delegated to a sub process, and my argument stands
It’s not client-side because validation happens on the server side. The content won’t be displayed until and unless the server receives a valid response, and the challenge is formulated in such a way that calculating a valid answer will always take a long time. It can’t be spoofed because the server will know that the answer is bullshit. In my example, the server will know that the prime factors returned by the client are wrong because their product won’t be equal to the original semiprime. Delegating to a sub-process won’t work either, because what’s the parent process supposed to do? Move on to another piece of content that is also protected by Anubis?
The point is to waste the client’s time and thus reduce the number of requests the server has to handle, not to prevent scraping altogether.
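For illustration, here is a minimal sketch of that kind of challenge/verify round trip in Python. Anubis itself uses a hash-based proof of work (grind a nonce until the hash clears a difficulty target) rather than factoring; the function names and difficulty value below are made up for the example:

```python
import hashlib
import os

DIFFICULTY_BITS = 20  # illustrative; each extra bit roughly doubles the client's work


def issue_challenge() -> str:
    """Server side: hand the client a random challenge string."""
    return os.urandom(16).hex()


def solve(challenge: str) -> int:
    """Client side: grind nonces until the hash falls below the target (slow)."""
    target = 1 << (256 - DIFFICULTY_BITS)
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash to check; a made-up nonce simply fails."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - DIFFICULTY_BITS))


if __name__ == "__main__":
    c = issue_challenge()
    n = solve(c)          # costs the client real CPU time
    print(verify(c, n))   # costs the server one hash, prints True
```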
@rtxn validation of what?
This is a typical network thing: client asks for resource, server says here’s a challenge, client responds or doesn’t, has the correct response or not, but has the challenge regardless
THEN (and this is the part you don’t seem to understand) the client process has to waste time solving the challenge, which is, by the way, orders of magnitude lighter on the server than serving the actual meaningful content, or cancel the request. If a new request is sent during that time, it will still have to waste time solving the challenge. The scraper will get through eventually, but the challenge delays the response and reduces the load on the server, because while the scrapers are busy computing, it doesn’t have to serve meaningful content to them.
@rtxn all right, that’s all you had to say initially, rather than try convincing me that the network client was out of the loop: it isn’t, that’s the whole point of Anubis
With how much authority you wrote with before, I thought you’d be able to grasp the concept. I’m sorry I assumed better.
Please, explain to us how you expect to spoof a math problem whose answer you have to provide to the server before proceeding.
You can math all you want on the client, but the server isn’t going to give you shit until you provide the right answer.
@Passerby6497 I really don’t understand the issue here
If there is a challenge to solve, then the server has provided that to the client
There is no way around this, is there?
You’re given the challenge to solve by the server, yes. But just because the challenge is provided to you, that doesn’t mean you can fake your way through it.
You still have to calculate the answer before you can get any farther. You can’t bullshit/spoof your way through the math problem to bypass it, because your correct answer is required to proceed.
Unless the server gives you a well-known problem you have the answer to/is easily calculated, or you find a vulnerability in something like Anubis to make it accept a wrong answer, not really. You’re stuck at the interstitial page with a math prompt until you solve it.
Unless I’m misunderstanding your position, I’m not sure what the disconnect is. The original question was about spoofing the challenge client side, but you can’t really spoof the answer to a complicated math problem unless there’s an issue with the server side validation.
@Passerby6497 my stance is that the LLM might recognize that the best way to solve the problem is to run chromium and get the answer from there, then pass it on?
Anubis has worked if that's happening. The point is to make it computationally expensive to access a webpage, because that's a natural rate limiter. It kinda sounds like it needs to be made more computationally expensive, however.
LLMs can’t just run chromium unless they’re tool aware and have an agent running alongside them to facilitate tool use. I highly suspect that AI web crawlers aren’t that sophisticated.
Congrats on doing it the way the website owner wants! You’re now into the content, and you had to waste seconds of processing power to do so (effectively being throttled by the owner), so everyone is happy. You can’t overload the site, but you can still get there after a short wait.
@Passerby6497 yes I’ve been told as much 😅
https://lemmy.world/comment/18919678
Jokes aside, I understand this was the point. I just wanted to make the point that it is feasible, if not currently economically viable
That solution still introduces lots of friction. At the volume and rate that these bots want to be traversing the internet, they probably don’t want to be fully graphically rendering pages and spawning extra browser processes then doing text recognition to then pass on to the LLM training sets. Maybe I’m wrong there, but I don’t think it’s that simple and actually just shifts solving the math challenge horizontally (i.e., in both cases, the scraper or the network the scraper is running on still has to solve the challenge)
Yeah, you’re absolutely right and I agree. So then do we have to resign ourselves to the situation being an eternal back-and-forth of just developing random new challenges every time the scrapers adapt to them? Like antibiotics for viruses? Maybe that is the way it is. And honestly that’s what I suspect. But Anubis feels so clever and so close to something that would work. The concept of making it about a cost that adds up, so that it intrinsically only affects massive processes significantly, is really smart…since it’s not about coming up with a challenge a computer can’t complete, but just a challenge that makes it economically not worth it to complete. But it’s disappointing to see that, at least with the current wait times, it doesn’t seem like it will cost enough to dissuade scrapers. And worse, the cost is so low that it seems like making the cost significant to the scrapers will require really insufferable wait times for users.
That kind of captcha is trivial to bypass via frequency analysis. Text that looks like language, as opposed to random noise, is very statistically recognisable.
Not to mention it relies on security though obscurity
It wouldn’t be that hard to figure out and bypass
I’m sure you meant to sound more analytical than anything… but this really comes off as arrogant.
You make the claim that Anubis is negligent and will come and go, and then admit to only spending minutes at a time thinking of solutions yourself, which you then just sorta spout. It’s fun to think about solutions to this problem collectively, but can you honestly believe that Anubis is negligent when it’s so clearly working and when the author has been so extremely clear about their own perception of its pitfalls and hasty development (go read their blog, it’s a fun time).
By negligence, I meant that the cost is negligible to the companies running scrapers, not that the solution itself is negligent. I should have said “negligibility” of Anubis, sorry - that was poor clarity on my part.
But I do think that the cost of it is indeed negligible, as the article shows. It doesn’t really matter if the author is biased or not, their analysis of the costs seems reasonable. I would need a counter-argument against that to think they were wrong. Just because they’re biased isn’t enough to discount the quantification they attempted to bring to the debate.
Also, I don’t think there’s any hypocrisy in me saying I’ve only thought about other solutions here and there - I’m not maintaining an anti-scraping library. And there have already been indications that scrapers are just accepting the cost of Anubis on Codeberg, right? So I’m not trying to say I’m some sort of tech genius who has the right idea here, but from what Codeberg was saying, and from the numbers in this article, it sure looks like Anubis isn’t the right idea. I am indeed only having fun with my suggestions, not making whole libraries out of them and pronouncing them to be solutions. I personally haven’t seen evidence that Anubis is so clearly working? As the author points out, it seems like it’s only working right now because of how new it is, but if scrapers want to go through it, they easily can - which puts us in a sort of virus/antibiotic eternal war of attrition. And of course that is the case with many things in computing as well. So I guess my open wonderings are just about whether there’s ever any way to develop a countermeasure that the scrapers won’t find “worth it” to force through?
Edit for tone clarity: I don’t want to be antagonistic, rude, or hurtful in any way. Just trying to have a discussion and understand this situation. Perhaps I was arrogant; if so, I apologize. It was also not my intent, fwiw. Also, thanks for helping me understand why I was getting downvoted. I intended my post to just be constructive spitballing about what I see as the eventual inevitable weakness in Anubis. I think it’s a great project and it’s great that people are getting use out of it even temporarily, and of course the devs deserve lots of respect for making the thing. But as much as I wish I could like it and believe it will solve the problem, I still don’t think it will.
Well I can agree on the fact that the arms race situation we’re in sucks. It’s an old problem, seen in malware attacks and defenses. I’m just glad we have people fighting on our side in their spare time :’)
And it’s all good on the tone, thank you for your clarifications
Anubis is more of an economic solution. It doesn’t stop bots, but it does make companies pay more to access content instead of having server operators foot the bill.
Well it doesn’t fucking matter what “makes sense to you” because it is working…
It’s being deployed by people who had their sites DDoS’d to shit by crawlers and they are very happy with the results, so what even is the point of trying to argue here?
It’s working because it’s not widely used. It’s sort of a “pirate seagull” theory: as long as few people use it, it works, because scrapers don’t really count on Anubis, so they don’t implement systems to get past it.
If it were to become more common it would be really easy to implement systems that would defeat the purpose.
As of right now sites are OK because scrapers just send HTTPS requests and expect a full response. If someone wants to bypass Anubis protection they would need to take into account that they will receive a cryptographic challenge and have to solve it.
The thing is that cryptographic challenges can be heavily optimized. They are designed to run in a very inefficient environment, namely a browser. But if someone took the challenge and solved it in a better environment, using CUDA or something like that, it would take a fraction of the energy, defeating the purpose of “being so costly that it’s not worth scraping”.
At this point it’s only a matter of time before we start seeing scrapers like that, especially if more and more sites start using Anubis.
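A rough sketch of that point, reusing the hash-grinding style of challenge sketched earlier in the thread (the difficulty and the challenge string are illustrative, and timings vary a lot by hardware):

```python
import hashlib
import time


def native_solve_time(challenge: bytes, difficulty_bits: int) -> float:
    """Grind nonces with plain hashlib: native C hashing, no browser/JS overhead."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    start = time.perf_counter()
    while int.from_bytes(
        hashlib.sha256(challenge + str(nonce).encode()).digest(), "big"
    ) >= target:
        nonce += 1
    return time.perf_counter() - start


# The commenter's point: run the same nonce search in a tighter native (or CUDA)
# loop instead of a throttled browser tab, parallelize it across workers, and the
# per-page cost to a well-funded scraper stays small.
print(native_solve_time(b"example-challenge", 20))
```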
Have you tried accessing it by using Nyarch?
I’m constantly unable to access Anubis sites on my primary mobile browser and have to switch over to Fennec.
I love that domain name.
There are some sites where Anubis won’t let me through. Like, I just get immediately bounced.
So RIP dwarf fortress forums. I liked you.
I don’t get it, I thought it allows all browser with JavaScript enabled.
I, too get blocked by certain sites. I think it’s a configuration thing, where it does not like my combination of uBlock/NoScript, even when I explicitly allow their scripts…
New developments: just a few hours before I post this comment, The Register posted an article about AI crawler traffic. www.theregister.com/2025/…/ai_crawler_traffic/
Anubis’ developer was interviewed and they posted the responses on their website: xeiaso.net/notes/2025/el-reg-responses/
In particular:
So, yeah. If we believe Xe, OOP’s article is complete hogwash.
Cool article, thanks for linking! Not sure about that being a new development though, it’s just results, but we already knew it’s working. The question is, what’s going to work once the scrapers adapt?
Anubis sucks
However, the number of viable options is limited.
Yeah but at least Anubis is cute.
I’ll take sucks but cute over dead internet and endless swarmings of zergling crawlers.
What sucks about Anubis?
The implementation
It runs JavaScript and the actual algorithm could use improvement.
Sometimes I think: imagine if a company like Google or Facebook implemented something like Anubis, and suddenly most people’s browsers started constantly solving CPU-intensive cryptographic challenges. People would be outraged by the wasted energy. But somehow a “cool small company” does it and it’s fine.
I do not think the Anubis system is sustainable for everyone to use; it’s just too wasteful energy-wise.
What alternatives do you propose?
Captcha.
It does all Anubis does. If a scraper wants to solve it automatically it’s compute-intensive (they have to run AI inference), but for the user it’s just a little time-consuming.
With captchas you don’t run aggressive software unauthorized on anyone’s computer.
Solutions did exist. But Anubis is “trendy”, and they are masters of PR within some specific circles of people who always want the latest, trendiest thing.
But good old captcha would achieve the same result as Anubis, in a more sustainable way.
Or at least give the user the option of running the challenge or not and leaving the page. And make it clear to the user that their hardware is going to run an intensive task. It really feels very aggressive to have a webpage basically run a cryptominer unauthorized on your computer. And for me, having a catgirl as a mascot does not forgive the rudeness of it.
“Good old captcha” is the most annoying thing ever for people and basically universally hated. Talking about leaving the page, what do you think will cause more people to leave: a captcha that’s often broken, or something where people don’t have to do anything but wait a little?
Also universally useless. Image recognition solved Captcha ages ago and the new version from Google is literal spyware.
Chuppl has a great video essay on it. youtu.be/VTsBP21-XpI
They don’t have to do anything but let an unknown program max out their CPU, unauthorized.
Imagine if google would implement that. Billions of computers running PoW constantly, what could go wrong?
But they currently can’t and that’s the point.
but captcha is trash whose only purpose is to train ai for google
What?
You don’t need to use the Google or Cloudflare captcha to have a captcha.
There are open source implementations of reCaptcha. And you can always run a classical captcha based on image recognition.
google is like 95% of the captchas on the internet.
So? You have free will to use another captcha.
Anubis is not a challenge like a captcha. Anubis is a resource waster, forcing crawlers to solve a crypto challenge (basically like mining Bitcoin) before being allowed in. That’s how it defends so well against bots: they do not want to waste their resources on needless computing, so they just cancel the page load before it even happens and go crawl elsewhere.
No, it works because the scraper bots don’t have it implemented yet. Of course the companies would rather not spend additional compute resources, but their pockets are deep and some already adapted and solve the challenges.
Whether they solve it or not doesn’t change the fact that they have to use more resources for crawling, which is the objective here. And by contrast, the website sees a lot less load compared to before it used Anubis. In any case, I see it as a win.
But despite that, it has its detractors, like any solution that becomes popular.
But let’s be honest, what are the arguments against it?
It takes a bit longer to access for the first time? Sure, but it’s not like you have to click anything or write anything.
It executes foreign code on your machine? Literally 90% of the web does these days. Just disable JavaScript to see how many websites are still functional. I’d be surprised if even a handful are.
The only people who have any advantage in not having Anubis are web crawlers, be they AI bots, indexing bots, or script kiddies trying to find a vulnerable target.
Sure, I’m not arguing against Anubis! I just don’t think the added compute cost is sufficient to keep them out once they adjust.
Conceptually, you could just really twist the knobs up. A human can wait 15 seconds to read a page. But if you’re trying to scrape 100,000 pages and they each take 15 seconds… If you can make it expensive in both power and time, that’s a win.
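Rough arithmetic on that, assuming a 15-second challenge per page and a single sequential crawler worker:

$$100{,}000 \times 15\,\mathrm{s} = 1{,}500{,}000\,\mathrm{s} \approx 17\ \text{days}$$

So the scraper either eats that delay or pays for many parallel workers that each burn CPU for the full 15 seconds per page.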
I’m against it for several reasons. It runs unauthorized heavy-duty code on your end. It’s not JS needed to make your site functional, it’s heavy calculations, unprompted. If they added a simple “click to run challenge” button it would at least be more polite and less “malware-like”.
On some old devices the challenge lasts over 30 seconds; I can type a captcha in less time than that.
It puts several sites that people (like the article author) tend to browse directly from a terminal behind the necessity of using a full browser.
It’s a delusion. As shown by the article author, solving the PoW challenge is not that much of an added cost. The reduction in scraping would be the same with any other novel method; crawlers are just not prepared for it. Any prepared crawler would have no issues whatsoever. People are seeing results just because of obscurity, not because it really works as advertised. And in fact I believe some sites are starting to get crawled aggressively despite Anubis, as some crawlers are already catching up with this new Anubis trend.
Take into account that the challenge needs to be light enough that a good user can enter the website in a few seconds, running the challenge on a browser engine (very inefficient). A crawler interested in your site could easily put up a solution to mine the PoW using CUDA on a GPU, which would be hundreds if not thousands of times more efficient. So the balance of difficulty (still browsable for users but costly to crawl) is not feasible.
It’s not universally applicable. Imagine if all internet were behind PoW challenges. It would be like constant Bitcoin mining, a total waste of resources.
The company behind Anubis seems shadier to me each day. They feed on anti-AI paranoia, they didn’t even answer the article author’s valid criticisms when he emailed them, and they clearly use PR language aimed at convincing and pleasing certain demographics in order to place their product. They are full of slogans but lack substance. I just don’t trust them.
Fair point. I do agree with the “click to execute challenge” approach.
For the terminal browsers, it has more to do with them not respecting web standards than with Anubis not working on them.
As for old hardware, I do agree that a timed delay could be a good idea, if it wasn’t so easy to circumvent. In such a case bots would just wait in the background and resume once the timer expires, which would vastly decrease Anubis’ effectiveness since waiting doesn’t use much power. There isn’t really much that can be done here.
As for the CUDA solution, that will depend on the implemented hash algorithm. Some of them (like the one used by Monero) are made to be vastly more inefficient on a GPU than on a CPU. Moreover, GPU servers are far more expensive to run than CPU ones, so the result would be the same: crawling would be more expensive.
In any case, the best solution would be by far to make it a legal requirement to respect robots.txt, but for now the legislators prefer to look the other way.
I use uMatrix, which blocks JS by default, so it is a bit inconvenient to have to enable JS for some sites. Websites which didn’t need it before (which is often the reason I use them) now require JavaScript.
The point was never that Anubis challenges are something scrapers can’t get past. The point is it’s expensive to do so.
Some bots don’t use JavaScript and can’t solve the challenges, so they’d be blocked, but there was never any point in time where no scrapers could solve them.
Wait, so browsers that disable JavaScript won’t be able to access those websites? Then I hate it.
Not everyone wants unauthenticated RCE from thousands of servers around the world.
Ive got really bad news for you my friend
because anime catgirls are the best
Did the author only now discover cryptography? It's like a cryptocurrency, just without currency, what a concept!
It’s quite similar.
It’s a perfectly valid way to explain it, though
If you try to show up with “cryptography” as an explanation, people will think of encrypting messages, not proof of work
“Cryptocurrency without the currency” really is the perfect single-sentence explanation