The world of modern information technology never stands still; it is constantly changing and evolving. Developments and programs that only recently seemed brand new are replaced by others that are more modern, more interesting, more convenient, and faster.
Not long ago, researchers at Stanford University, working in close collaboration with scientists at Canada’s MILA institute for artificial intelligence, unveiled an experimental and fundamentally new AI algorithm called Hyena.
Hyena is still undergoing testing to ensure that users get a genuinely high-quality product, but the first evaluations of its capabilities suggest it could be a breakthrough in the field.
The developers have already run the program through a large number of tests, one of which used The Pile. It is a collection of texts of many genres and subjects totaling roughly 825 GB, the equivalent of about 250,000 books. The collection was assembled three years ago by EleutherAI, a non-profit organization that develops and tests artificial intelligence, with all texts drawn from professional sources such as PubMed, GitHub, and the US Patent and Trademark Office.
How the new artificial intelligence works and what makes it innovative
Before the creation of Hyena
GPT-4 is built on a principle with a well-known technical limitation: the AI’s performance degrades as the amount of information it has to analyze grows.
This deep learning model is called the Transformer. It uses an “attention” mechanism that separately weighs the importance of each part of the data it receives as input.
“Attention” in the context of neural networks imitates human cognitive attention: the AI selects the most important parts of the input and allocates more of its computing power to them.
In this way, the Transformer uses “attention” to take information, such as the words in the input group of symbols, and relate it to the output group of symbols that forms the correct answer.
The Transformer has quadratic computational complexity: the program’s running time grows in proportion to the square of the length of the input, so doubling the amount of text quadruples the computation required.
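To make that quadratic cost concrete, here is a minimal sketch of the scaled dot-product attention at the heart of the Transformer (a simplified illustration rather than any production implementation; the sizes and weight matrices are arbitrary placeholders). The n-by-n score matrix it builds is exactly what ties the running time to the square of the input length:

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """Minimal scaled dot-product attention over n tokens.

    X is an (n, d) matrix of token embeddings. The (n, n) score
    matrix below is what makes the cost grow quadratically with n.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # project the input
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # weighted mix of values

n, d = 1024, 64                                       # arbitrary example sizes
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = attention(X, Wq, Wk, Wv)    # doubling n quadruples the score matrix
```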
After the creation of Hyena
After weighing the pros and cons of the attention principle that current AIs operate on, Michael Poli and his colleagues at Stanford University proposed replacing it. They reported this in the scientific article “Hyena Hierarchy: Towards Larger Convolutional Language Models”.
They took a new approach to processing data. Poli reduced his algorithm’s dependence on quadratic computation by replacing the “attention” operation with a “convolution” based on a filter that selects data elements regardless of their origin, be they the pixels of a digital photo or words assembled into sentences and coherent text.
In doing so, the developers combined two techniques: causal convolutional filters, which preserve the correct sequence of words, and filters whose size can adapt to the length of the input.
Accordingly, a convolution can be applied many times in a row over a virtually unlimited amount of text, with no need to copy additional data.
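As a rough illustration of why convolution scales better than attention, the sketch below computes a long convolution with the fast Fourier transform, the standard trick that subquadratic sequence models build on (a simplified stand-in, not Hyena’s actual operator; the signal and filter here are random placeholders). Its cost grows as n log n rather than n², which is why the filter can be as long as the text itself:

```python
import numpy as np

def long_convolution(x, h):
    """Causal convolution of a length-n sequence x with a length-n filter h.

    Computed via the FFT in O(n log n) time, versus the O(n^2) pairwise
    score matrix that attention builds for the same sequence length.
    """
    n = len(x)
    L = 2 * n                              # zero-pad so the circular FFT
    X = np.fft.rfft(x, L)                  # convolution does not wrap around
    H = np.fft.rfft(h, L)
    return np.fft.irfft(X * H, L)[:n]      # keep the first n (causal) outputs

n = 4096
rng = np.random.default_rng(0)
x = rng.standard_normal(n)   # one channel of a token sequence (placeholder data)
h = rng.standard_normal(n)   # a filter as long as the input itself
y = long_convolution(x, h)   # doubling n roughly doubles the work, not quadruples
```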
During testing, the experimental version of Hyena achieved results nearly matching GPT’s while using significantly less input data.
The developers also noticed that as the amount of input data grew, the new algorithm’s performance advantage over “attention” grew with it.
The researchers therefore believe that Hyena’s new ability to apply a filter that can efficiently “stretch” over thousands of words means that the “context” of a query to a language program is virtually unlimited.