Domain categorization is the task of assigning one or more categories to a domain based on its content.
As an example, let us take the website www.apple.com and categorize it with our tool, www.websitecategorizationapi.com. You can try it for free here: https://www.websitecategorizationapi.com
Using the IAB taxonomy, www.apple.com is categorized as “Technology & Computing” by the Tier 1 classifier, which uses only the major categories.
The Tier 2 classifier, which uses 400+ categories and is therefore more detailed, assigns the website several relevant but more refined categories:
– Consumer Electronics
– Computing
– VR/AR
This is because Apple sells products from different verticals: the iPhone belongs to Consumer Electronics, for example, whereas MacBook laptops are part of Computing.
An important part of domain categorization is therefore the choice of categories by which we want to classify content. Although you can define your own custom set of categories, also known as a taxonomy, it is quicker to use an established one.
What is a taxonomy, how many are there and what are they useful for?
One established taxonomy is the IAB taxonomy used above. You can learn more about it here: https://iabtechlab.com/standards/content-taxonomy/
When deciding which taxonomy to adopt, ask yourself: why do I need classification, and for what field, domain or area?
For classifying general texts or ads, IAB is a good choice. However, if you are categorizing domains from the ecommerce sector, e.g. Shopify stores, IAB is not the best choice, as it has relatively few ecommerce categories.
In this case, it is better to opt for the Google Products Taxonomy:
https://www.google.com/basepages/producttype/taxonomy-with-ids.en-US.txt
Another alternative is the Facebook Products Taxonomy.
If, however, you have your own list of categories for categorizing domains, you can use the classifier development service at www.websitecategorizationapi.com. One of our specialties is training highly accurate categorizers for custom-defined lists of categories.
All we need is a list of categories. We then produce relevant texts for each category and train a machine learning classifier on this data set. Finally, we provide you with API access to the custom classifier.
What is domain categorization used for?
Domain categorization serves many purposes and is used for widely different tasks.
One example is web content filtering. Imagine a company that does not want its employees to spend work time shopping, browsing social media or watching TV. The solution is to implement content filtering for browsers, where each URL requested by a user passes through a web content filter.
The web content filter holds a list of millions of domains, each assigned to categories like Shopping, Social Media, TV, etc. This makes it easy for the filter to block access to shopping, social media and other domains that the employer does not want employees to use during work time.
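The lookup step of such a filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the domain table, the blocked-category set and the function names are hypothetical, and a real filter would hold millions of entries rather than three.

```python
from urllib.parse import urlsplit

# Hypothetical sample of a categorized-domain table (assumption for illustration).
DOMAIN_CATEGORIES = {
    "shop.example": {"Shopping"},
    "social.example": {"Social Media"},
    "docs.example": {"Technology & Computing"},
}

# Categories the employer does not want to be reachable during work time.
BLOCKED_CATEGORIES = {"Shopping", "Social Media", "TV"}


def lookup_categories(host):
    """Return the categories for host, falling back to its parent domains."""
    labels = host.lower().rstrip(".").split(".")
    # Try "www.shop.example", then "shop.example"; never a bare top-level label.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_CATEGORIES:
            return DOMAIN_CATEGORIES[candidate]
    return set()


def is_blocked(url):
    """True if the URL's host falls into at least one blocked category."""
    host = urlsplit(url).hostname or ""
    return bool(lookup_categories(host) & BLOCKED_CATEGORIES)


print(is_blocked("https://www.shop.example/cart"))  # subdomain of a Shopping domain
print(is_blocked("https://docs.example/guide"))     # allowed category
```

Domains that are missing from the table are allowed in this sketch. Whether unknown domains should be allowed or blocked is a policy decision for the operator of the filter.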
Another example is providing additional support and information for platforms, apps and other online services. An ecommerce analytics platform that tracks various metrics for millions of online stores may want to offer search and filtering by vertical.
The solution? Domain categorization of the millions of online stores that the platform covers. This is one of many use cases for which clients have used our platform, www.websitecategorizationapi.com.
How is pricing for domain categorization determined?
The most common approach is pricing per API request, where each request is the classification of one domain. Some providers also charge different prices for simpler classifier models than for more complex ones.
API requests can be bundled into monthly plans, which suit those who need domain categorization on a regular basis, e.g. for new data obtained through business and other processes.
The development of custom domain categorization models is usually quoted as a one-time, fixed-price payment.
How many domains are there?
According to VeriSign, there were 365 million domains registered in 2021. The most common top-level domain is .com, but there are also many country-level domains like .co.uk, .de, .fr, .es and others.
A good starting point for analyzing domains is to download the list of the top 1 million domains from Alexa. Another interesting source is the 1 million domain list from the open source project at https://tranco-list.eu/.
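As a first look at such a list, one can count how often each domain ending occurs. The sketch below assumes the list is a headerless CSV with one `rank,domain` pair per row. The sample rows and function names are made up for illustration, so check the layout of the file you actually download.

```python
import csv
import io
from collections import Counter


def read_ranked_domains(lines):
    """Yield (rank, domain) pairs from rows of the form 'rank,domain'."""
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        yield int(rank), domain.strip().lower()


def suffix_counts(domains):
    """Count the last label of each domain, a rough stand-in for its TLD."""
    return Counter(domain.rsplit(".", 1)[-1] for domain in domains)


# Made-up sample rows; with a downloaded list, pass an open file object instead.
sample = io.StringIO("1,example.com\n2,example.de\n3,example.co.uk\n4,another.com\n")
ranked = list(read_ranked_domains(sample))
counts = suffix_counts(domain for _, domain in ranked)
print(counts.most_common())
```

Taking only the last label counts `example.co.uk` under `uk`. Grouping by multi-label suffixes such as `.co.uk` would require a public suffix list, which this sketch leaves out.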
Domain categorization of a large number of domains
Our company has categorized millions of domains as part of its service. In this section, we present results for a sample of domains categorized according to IAB Tier 1 (IAB1) and IAB Tier 2 (IAB2).
In the first analysis, we only consider the IAB1 classification. These are the possible IAB1 categories:
- Automotive
- Hobbies & Interests
- Business and Finance
- Personal Finance
- Pop Culture
- Video Gaming
- Science
- Television
- Events and Attractions
- Fine Art
- Religion & Spirituality
- Education
- Books and Literature
- Medical Health
- Careers
- Sports
- Healthy Living
- Real Estate
- News and Politics
- Pets
- Shopping
- Food & Drink
- Movies
- Travel
- Music and Audio
- Style & Fashion
- Technology & Computing
- Home & Garden
- Family and Relationships
Here are the results of 822,178 domain categorizations:
We can see that the most frequent categories for domains are: Business and Finance, Sports, Personal Finance, Technology & Computing, Hobbies & Interests, and Events and Attractions.
These are the leading verticals in this sample.
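A frequency ranking like this can be computed from per-domain results in a few lines. The sketch below is illustrative only: the (domain, category) pairs are invented placeholders, not the data behind the analysis, and `category_shares` is a hypothetical helper name.

```python
from collections import Counter


def category_shares(results):
    """Return (category, count, share) tuples, most frequent category first.

    `results` is an iterable of (domain, category) pairs.
    """
    counts = Counter(category for _, category in results)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]


# Invented placeholder results (assumption for illustration only).
results = [
    ("a.example", "Business and Finance"),
    ("b.example", "Business and Finance"),
    ("c.example", "Business and Finance"),
    ("d.example", "Sports"),
    ("e.example", "Sports"),
    ("f.example", "Travel"),
]

for category, count, share in category_shares(results):
    print(f"{category}: {count} ({share:.1%})")
```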
Next, let us consider a more refined categorization using the Tier 2 classification, which has 400+ categories.
We will not list all of them here, but as an illustration, these are the “car” or “automotive” related categories:
- Auto Buying and Selling
- Auto Type
- Auto Safety
- Auto Shows
- Car Culture
- Model Toys
- Auto Rentals
- Auto Technology
- Auto Parts
- Radio Control
- Auto Insurance
You can see that there are many different subcategories related to cars and automotive topics.
In the Tier 2 results, the most frequent category is again Business, followed by Arts and Crafts, Travel Type, Computing, Diseases and Conditions and so on.
The analysis shows that domain content is widely distributed across categories and fills many niches, with a few expected categories being the most frequent.
Our service offers an offline database with categorizations of millions of domains, for both Tier 1 and Tier 2. This offline database is especially useful when you need to integrate categorizations into your own applications and lookup latency must be low, which is easier to achieve with local access than by fetching results over the internet via API or SQL queries.
We provide the offline database of categorized domains both as txt files and as SQL databases.
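To illustrate what a local lookup might look like, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, column names and helper function are assumptions made for this example and do not describe the actual format of the offline database. The sample rows reuse the www.apple.com categories from the beginning of the article.

```python
import sqlite3

# In-memory database for illustration; a real deployment would open a file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (domain, tier, category) assignment.
conn.execute(
    "CREATE TABLE domain_categories ("
    "domain TEXT NOT NULL, tier INTEGER NOT NULL, category TEXT NOT NULL)"
)
# An index on domain keeps lookups fast as the table grows.
conn.execute("CREATE INDEX idx_domain ON domain_categories (domain)")

conn.executemany(
    "INSERT INTO domain_categories (domain, tier, category) VALUES (?, ?, ?)",
    [
        ("apple.com", 1, "Technology & Computing"),
        ("apple.com", 2, "Consumer Electronics"),
        ("apple.com", 2, "Computing"),
        ("apple.com", 2, "VR/AR"),
    ],
)
conn.commit()


def categories_for(conn, domain, tier):
    """Return the sorted category names stored for a domain at the given tier."""
    cursor = conn.execute(
        "SELECT category FROM domain_categories "
        "WHERE domain = ? AND tier = ? ORDER BY category",
        (domain.lower(), tier),
    )
    return [row[0] for row in cursor]


print(categories_for(conn, "apple.com", 1))
print(categories_for(conn, "apple.com", 2))
```

Because the query runs against a local file or in-memory database, no network round trip is involved, which is the latency advantage described above.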
Regarding the language of domains, our categorization services support more than 20 languages in addition to English.
Conclusion
In this article, we have provided information on the task of domain categorization: what it is, where it is used, which categories we can use, and some general information about domains.
If you need free domain categorization, you can check out our service at: https://www.websitecategorizationapi.com/demo_dashboard_iab/index_url.php.