A “Turkish” version of artificial intelligence is being developed

The development and spread of artificial intelligence are advancing at an unprecedented pace, outstripping other technologies, particularly in areas such as generative artificial intelligence and large language models.

This shift is expected to upend existing paradigms, reduce the effectiveness of traditional artificial intelligence solutions, and thereby weaken the competitiveness of the technology providers that offer them.

For the National Technology Move, it is crucial that Turkey use generative artificial intelligence effectively and build an ecosystem that develops these technologies itself. Such an ecosystem would keep the country independent of foreign suppliers if the world's major technology companies come to monopolize the field with the solutions they develop and leave other countries dependent on them.


Turkish resources in the field of artificial intelligence are limited

The language of the data an artificial intelligence is trained on also shapes its cultural impact. Language models can carry biases, and the risk that foreign biases enter the culture through them makes research on this technology all the more important.

The inadequate representation of Turkish in the training of the large language models used around the world is considered one of the major risks. Turkish is not among the top 16 languages in Meta's model, and Turkish sources make up only 0.16 percent of the data used to train the OpenAI model.

It is noteworthy that the material ChatGPT is built on comes mostly from Anglo-Saxon languages, so the worldview of that culture reaches users through the model's answers and the information it provides.

Children who interact with these language models therefore risk being exposed to many elements that are foreign to Turkish culture, customs and traditions, which could contribute to cultural degeneration.


TÜBİTAK's model will improve the artificial intelligence repertoire

At this point, the “Turkish Large Language Model” being developed by TÜBİTAK BİLGEM is of strategic importance. The institution stands out as the first and only one to have developed a “base model” in this area.

The result is a model that not only handles Turkish well but also reflects Turkish culture and sensibilities.

A base model is defined in the field of artificial intelligence as a model that has been pre-trained on a large data set and has learned the general language structure and usage of words and sentences.

Such a model is trained on data that broadly covers one specific language or several languages. For example, a Turkish base model can be trained on Turkish texts, books, articles and other material from the Internet. During training, the model learns the basic rules and grammar of the language and expands its vocabulary.

Thanks to the “Turkish Large Language Model”, artificial intelligence enriched with Turkish data, including Turkish customs and traditions, will reflect Turkey's sensibilities and help prevent the cultural degeneration that new technologies and applications may cause among the younger generation.


A “tokenizer” designed specifically for Turkish was developed

To develop the Turkish Large Language Model further, work will continue on building a data pool of Turkish texts from the Internet and other digital sources.

The project builds on open-source large language models. To produce a high-quality Turkish language model, a pre-processing phase that accounts for the intricacies of Turkish was carried out and an appropriate deep learning architecture was selected.

In addition, a Turkish-specific “tokenizer” was developed so that these open-source large language models can be used effectively in Turkish. Model training began once the number of parameters in the architecture and the proportions of the data to be used had been determined.
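
To illustrate why a tokenizer trained on Turkish text matters, the sketch below learns a handful of byte-pair-encoding (BPE) merges from a tiny word list. The article does not say which algorithm TÜBİTAK BİLGEM uses, so BPE is only a common stand-in here, and the corpus, function names and merge count are all hypothetical. The idea it shows is that frequent Turkish suffixes such as "-lar" and "-ler" can become single tokens, so agglutinated words need fewer tokens than a character-by-character split.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Every distinct word starts as a sequence of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the corpus with the most frequent pair fused together.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(apply_merge(symbols, best))] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split `word` into characters, then apply the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


# Hypothetical toy corpus; a real tokenizer would be trained on a large text pool.
corpus = ["evler", "evlerde", "evlerden", "kitaplar", "kitaplarda", "okullar", "okullarda"]
merges = train_bpe(corpus, num_merges=10)

print(tokenize("evlerde", merges))      # fewer tokens than the 7 characters
print(tokenize("kitaplardan", merges))  # an unseen word still reuses learned pieces
```

A tokenizer trained mostly on English text would have learned few of these merges, so the same Turkish word would be split into many more pieces and consume more of the model's input budget.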

The training process is being closely monitored, and the focus is on optimizing the model by evaluating it against various success metrics in different areas of natural language processing, such as question answering, summarization, language generation and text classification.
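
As a rough illustration of what such success metrics can look like, the sketch below computes two widely used ones: accuracy for text classification and token-overlap F1 for question answering. The article does not name the metrics actually used in the project, so these are assumptions chosen for simplicity, and the function names and sample data are hypothetical.

```python
from collections import Counter


def accuracy(predictions, references):
    """Fraction of predicted labels that exactly match the reference labels."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    # Tokens are split on whitespace only. Case folding is left out on purpose:
    # Python's str.lower() is not Turkish-aware (it maps "I" to "i", not "ı").
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical classification labels: one of the two predictions is correct.
print(accuracy(["spor", "ekonomi"], ["spor", "siyaset"]))

# Hypothetical question-answering pair: the prediction contains one extra word.
print(token_f1("başkent Ankara", "Ankara"))
```

Summarization and language generation are usually scored with overlap-based or model-based metrics that are more involved than these two, so they are not sketched here.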

With these measures, an artificial intelligence that has a strong command of Turkish and is aware of Turkey's sensibilities is also intended to help prevent the cultural erosion that may occur among the younger generation.

