By Jorge Guerra Pires

Meet the Toxicity Classifier


Google provides a few “ready-to-go” models of varying complexity. One of them is the Toxicity model, which is perhaps the most straightforward and useful model for beginners.



Like all programming, a model requires specific input and provides specific output. To kick things off, let’s take a look at what those are for this model. Toxicity detects toxic content such as threats, insults, cussing, and generalized hate. Since those categories aren’t necessarily mutually exclusive, each violation gets its own probability.


The Toxicity model estimates the probability that a given input is or is not a violation for each of the following characteristics:

• Identity attack

• Insult

• Obscene

• Severe toxicity

• Sexually explicit

• Threat

• Toxicity


When you give the model a string, it returns an array of seven objects, one for each violation, each holding the predicted probability for that specific violation. Probabilities are represented as two Float32 values between zero and one.
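

As a sketch of what that looks like in practice, the model can be loaded and queried through the @tensorflow-models/toxicity package (the threshold value, the sample sentence, and the console formatting below are just illustrative):

import '@tensorflow/tfjs';
import * as toxicity from '@tensorflow-models/toxicity';

// Minimum confidence: below this, a label's `match` field stays null.
const threshold = 0.9;

toxicity.load(threshold).then((model) => {
  // classify() accepts one or more strings.
  model.classify(['You are a total disgrace!']).then((predictions) => {
    // predictions holds one object per violation label
    // (e.g. "insult", "threat", "toxicity").
    predictions.forEach((p) => {
      // probabilities is a two-value Float32Array: [not a violation, violation]
      const [notViolation, violation] = p.results[0].probabilities;
      console.log(`${p.label}: ${(violation * 100).toFixed(1)}% likely a violation`);
    });
  });
});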


If a sentence is surely not a violation, most of the probability goes to index zero of the Float32 array.


For example, [0.7630404233932495, 0.2369595468044281] reads that the prediction for this particular violation is 76% not a violation and 24% likely a violation. We are going to use this information in an upcoming addition to the course: building an insult meter.
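

To turn that second value into a meter reading, a small hypothetical helper (the function name and the 'insult' label lookup are assumptions for illustration) could pick out the insult prediction and scale it to 0–100:

// Hypothetical helper: given the predictions array returned by
// model.classify(), take the "is a violation" probability (index 1)
// for the insult label and scale it to a 0-100 meter reading.
function insultMeter(predictions) {
  const insult = predictions.find((p) => p.label === 'insult');
  const violation = insult.results[0].probabilities[1];
  return Math.round(violation * 100); // e.g. 0.2369... -> 24
}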



 

Tell me more!

 

Reference

Laborde, Gant. Learning TensorFlow.js. O'Reilly Media, 2021.

