These are the technical details behind HumanFocusedAI, a tool that scans the web and reports content that is dangerous for people with photosensitive epilepsy to view.
This might get a little technical. If you are more interested in the story behind this project, you can read about it here.
Ok.. first let's lay the foundation: "What is epilepsy anyway?"
Epilepsy is the fourth most common neurological disorder and is characterized by recurrent epileptic seizures.
It affects people of all ages, around 1% of the population.
For about 5% of people with epilepsy (millions of people worldwide) exposure to flashing lights at certain intensities or to certain visual patterns can trigger seizures.
This condition is known as photosensitive epilepsy.
The internet can be a dangerous place for people with photosensitive epilepsy.
There have been cases where flashing GIFs on computer or mobile phone screens have triggered epileptic seizures.
And even more extreme cases where flashing content has sent hundreds of people with photosensitive epilepsy to the hospital.
In general, there are three things that can make media content dangerous:
1) Large enough flashes with great enough contrast at specific frequencies
2) Frame transitions including saturated reds
3) Geometrical patterns, such as stripes
This specialized algorithm couldn't have been built without standing on the shoulders of giants.
The algorithm is based on the work of leading scientists in photosensitive epilepsy, like the late Dr. Harding and Professor Binnie from King's College London.
It also follows the well-established guidelines of Ofcom and WCAG 2.0.
Basically, each source has its own threshold, with the most common one being that an animated piece of content cannot flash more than three times per second.
Content is considered to be flashing when more than 25% of the screen is flashing.
And a general flash is defined as a pair of opposing changes in relative luminance of 10% or more of the maximum relative luminance: an increase followed by a decrease, or a decrease followed by an increase.
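To make the definition a bit more concrete, here is a minimal sketch of how relative luminance could be computed for one sRGB pixel, using the formula published in WCAG 2.0. The function names and the `is_significant_change` helper are made up for this illustration and are not taken from HumanFocusedAI's code.

```python
def linearize(channel):
    """Convert an 8-bit sRGB channel (0-255) to linear light (0.0-1.0)."""
    c = channel / 255
    if c <= 0.03928:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(r, g, b):
    """Relative luminance as defined by WCAG 2.0: 0.0 is black, 1.0 is white."""
    return (
        0.2126 * linearize(r)
        + 0.7152 * linearize(g)
        + 0.0722 * linearize(b)
    )


def is_significant_change(lum_a, lum_b, threshold=0.10):
    """True when luminance moves by at least 10% of the maximum (1.0).

    Hypothetical helper: one such change followed by one in the
    opposite direction would count as a single flash.
    """
    return abs(lum_b - lum_a) >= threshold
```

A jump from black to white is a significant change, while a small drift from 0.50 to 0.55 is not.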
Although it sounds complicated, it's not really. The problem lies elsewhere.
You just go through all the pixels one by one, checking if they blink more than 3 times in a given second. If yes, mark it.
If many neighboring pixels blink above that threshold at the same time, then flag the file as dangerous.
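The pixel-by-pixel idea can be sketched roughly like this. It is a simplified illustration, not the actual HumanFocusedAI code. It assumes `frames` is a list of frames covering one second, that each frame is a grid of relative luminance values between 0.0 and 1.0, and that "many neighboring pixels" can be approximated by the fraction of all pixels that flash (it does not check that they are next to each other). All names are hypothetical.

```python
def count_flashes(luminances, threshold=0.10):
    """Count flashes in one pixel's luminance values over time.

    A flash is counted as two opposing changes of at least `threshold`.
    Simplification: only frame-to-frame changes are considered.
    """
    transitions = 0
    last_direction = 0
    for prev, curr in zip(luminances, luminances[1:]):
        delta = curr - prev
        if abs(delta) >= threshold:
            direction = 1 if delta > 0 else -1
            if direction != last_direction:
                transitions += 1
                last_direction = direction
    return transitions // 2


def is_dangerous(frames, max_flashes=3, area_fraction=0.25):
    """Flag one second of frames if too many pixels flash too often."""
    height = len(frames[0])
    width = len(frames[0][0])
    flashing_pixels = 0
    # Go through all the pixels one by one, exactly as described above.
    for y in range(height):
        for x in range(width):
            series = [frame[y][x] for frame in frames]
            if count_flashes(series) > max_flashes:
                flashing_pixels += 1
    return flashing_pixels / (height * width) > area_fraction
```

Every pixel is visited once per frame, so the work grows with width x height x frames.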
However, there is one small element we didn't take into consideration... Time!
At this point I was ready to give up.
Analyzing videos and GIFs wasn't as easy as it sounds, after all.
An example. Let's say you want to analyze a simple GIF, 600x600 pixels with 200 frames. Then, you would have 600x600 pixels x 200 frames. That's 72 million pixels to analyze!
Write a for loop going from 0 to 72 million and just print hi. See how long it takes.
Actually, don't. I did it for you. This example is in Ruby.
That's 432 seconds, or 7.2 minutes, without even doing any calculations or reading/writing.
Yes, you read that right. I managed to go from 10 minutes to 1 second.
Using something boring but powerful called Linear Algebra.
In order to use Linear Algebra in programming, you have to start thinking differently about data.
No. Sadly I'm not talking about the awesome 1999 movie. I'm talking about the data structure.
Say you have a bunch of data. Pixels. Or a bunch of records with ages.
One way to think about that data is as arrays. That's the standard and most intuitive way to think about data.
"I have an array with 10 million integers."
ages = [23, 12, 6, ...]
Let's say that every minute you want to update these ages.
As we showed earlier, looping through large arrays isn't efficient. The time needed would be too much to keep them updated at minute precision.
But what if we think about the data as a matrix?
(Not the actual values for the ages but you get the point. This is just a random image I found on the interwebs)
With linear algebra, you treat the whole structure of data as one.
Performing an addition to every cell of a matrix is a single operation between two matrixes. That's it.
All you have to do is think about it deeply and come up with the other array that we need in order to get our desired result.
In our example, that would be adding it to a matrix of 1s.
(Again, not the actual values for our matrixes but you get the point. This is just a random image I found on the interwebs)
And in Python and NumPy, it's even simpler:
import numpy

ages = [12, 19, 23, 81, 27]  # ...and so on
matrix_ages = numpy.array(ages)
print(matrix_ages + 1)
(Ok, this time it's the actual values)
Instead of going through every single one of the 10 million elements in slow interpreted code and performing a (+1) operation, the computer just takes the two matrixes and collides them in a single operation that runs in optimized native code.
After that insane car crash of an operation, you have your final result straight away.
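If you want to see the difference on your own machine, a small comparison along these lines should do. This is only a sketch: the helper names are invented, the array size is arbitrary, and the timings will vary from computer to computer.

```python
import time

import numpy as np


def add_one_loop(values):
    """Add 1 to every element with a plain Python loop."""
    result = []
    for value in values:
        result.append(value + 1)
    return result


def add_one_vectorized(values):
    """Add 1 to every element of a NumPy array in one operation."""
    return values + 1


def time_call(func, values):
    """Return how many seconds a single call to func(values) takes."""
    start = time.perf_counter()
    func(values)
    return time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    ages = rng.integers(0, 100, size=1_000_000)
    print(f"loop:       {time_call(add_one_loop, ages.tolist()):.4f} s")
    print(f"vectorized: {time_call(add_one_vectorized, ages):.4f} s")
```

Both functions return the same numbers. Only the way the work gets done is different.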
Less flexible? Sure. Faster? Fuck yes.
Every technology has its place and time. And when high performance is important, linear algebra rules.
I wish they taught us real life use cases of technologies in school.
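Applied to the flashing problem, the same matrix thinking might look like the sketch below. It is an illustration under assumptions, not the real HumanFocusedAI algorithm. It assumes `frames` is a NumPy array of shape (frames, height, width) holding relative luminance values between 0.0 and 1.0 for one second of content, it only looks at frame-to-frame changes, and it uses the fraction of flashing pixels instead of checking that they are neighbors. The function names are hypothetical.

```python
import numpy as np


def count_flashes_per_pixel(frames, threshold=0.10):
    """Count flashes for every pixel at once, with no Python loop over pixels."""
    frames = np.asarray(frames, dtype=float)
    if frames.shape[0] < 2:
        return np.zeros(frames.shape[1:], dtype=int)

    # Change between consecutive frames, for all pixels in one operation.
    diffs = np.diff(frames, axis=0)
    # +1 for a big increase, -1 for a big decrease, 0 for anything smaller.
    direction = np.where(np.abs(diffs) >= threshold, np.sign(diffs), 0).astype(np.int8)

    # Carry the last significant direction forward over the quiet steps.
    steps = np.arange(direction.shape[0]).reshape(-1, 1, 1)
    last_idx = np.maximum.accumulate(np.where(direction != 0, steps, 0), axis=0)
    filled = np.take_along_axis(direction, last_idx, axis=0)

    # Count the first significant change plus every reversal after it.
    transitions = (filled[0] != 0).astype(int) + (filled[1:] != filled[:-1]).sum(axis=0)
    # Two opposing changes make one flash.
    return transitions // 2


def is_dangerous(frames, max_flashes=3, area_fraction=0.25):
    """Flag one second of frames if too many pixels flash too often."""
    counts = count_flashes_per_pixel(frames)
    return bool((counts > max_flashes).mean() > area_fraction)
```

The per-pixel loop is gone. Every step works on the whole stack of frames as a single array.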
Of course it had to be Python.
Anything to do with Big Data and AI usually is.
And even though "Python is slow", by using optimized Python packages written in C you can get over this problem.
These packages are basically C code that Python loads as a binary file and uses. You extend the Python language with C and get the best of both worlds.
The ease of use of Python with the high performance of C. It's pretty much genius.
Packages like NumPy, Pandas, etc.
It is hosted on Heroku because I hate managing servers and the API is a simple Flask app.
After 3 months of all day and all night coding, the algorithm was ready.
It was a Chrome extension that worked like an AdBlocker.
You would install it locally and it would scan and block dangerous content before you even see it.
However, it ultimately failed for several reasons:
1. People had to find out about the product
2. People had to understand what it does
3. People had to understand what a browser extension was
4. Because every client would ping the API multiple times per second, the servers would constantly crash and I had to add a $5/mo price tag to stop people with epilepsy trying out the product for fun
5. People had to install it on their browser
6. It couldn't work on mobile apps
All of the above created too much friction for it to work.
Plus, I am not funded or part of any organization, non profit or corporation. It's just me by myself.
So, let me introduce v2 now and why I think it's superior!
This little bad boy:
1. Scans the web by itself
2. Scans the few most popular websites (Instagram, TikTok, Twitter, Facebook, etc.) that account for almost all of the internet's web traffic
3. Instantly reports dangerous content
4. No one has to install anything
5. No one has to understand how it works
6. No one has to even know about it
7. It just works. And provides value for millions of people
As of now, our bot works perfectly for Twitter.
Here are some things I want to add and improve in the near future:
Support all the major platforms, like Instagram, TikTok, Facebook, etc.
Start scanning films and shows on Netflix and create a database with safe and dangerous movies.
Improve and bring back the browser extension (v1) as an extra layer of security.
And much more.
I would also like to build more tools for accessibility.
Tools that help people navigate the web. Equal access to the internet should be a right, not a privilege.
Tools for people who suffer from blindness, deafness, Parkinson's, etc. are some things I have in mind.
All this technology will eventually become open source.
If anyone wants to contribute and build these things with me, just shoot me a message!