Artificial intelligence learns most from Reddit data

Large language models, which came to the fore with the emergence of ChatGPT, have since become an integral part of daily life with the introduction of models such as Gemini, DeepSeek, Llama and Grok.

While the capabilities of the large language models in use are causing a stir around the world, the sources these models draw on have long been at the center of debate.

According to the data compiled, many large language models, including ChatGPT, use public websites to generate their responses.

Reddit is at the top with 40 percent

The online statistics portal Statista has examined which sources artificial intelligence language models use and how often. According to the study, which Statista conducted in June, reddit.com was the website most cited by major language models in the first quarter of the year, at 40.11 percent.

According to experts, the fact that artificial intelligence cites Reddit, a platform where real people discuss specific topics, shows that the developers of artificial intelligence models prioritize the natural conversations of real people over official information.

After Reddit, the source major language models cite most often is Wikipedia, described as an "internet encyclopedia", at 26.3 percent.

Wikipedia, whose articles are researched and edited, trails well behind Reddit, whose content does not pass through any editorial filter.

In the list, which shows how often language models cite each source, YouTube comes in third place with 23.5 percent, followed by Google with 23.2 percent, Yelp.com with 21 percent, Facebook with 19.9 percent, Amazon with 18.7 percent, Tripadvisor with 12.4 percent, and Mapbox.com and openstreetmap.com with 11.2 percent each.

Agreement between Google and Reddit to train artificial intelligence

Meanwhile, social media giants and artificial intelligence developers are signing agreements for the training of artificial intelligence models.

Under the agreement reached between Google and Reddit in 2024, Google's artificial intelligence will also be fed with Reddit data.

According to Reuters, Google will pay Reddit $60 million annually under the agreement. According to another Reuters report, Reddit has also entered into a data-sharing agreement with OpenAI, under which its data will be used in ChatGPT.

Thus, the agreements reached to date have reinforced the Reddit effect in artificial intelligence responses.

