Swiss data services company Unit8 highlights the key analytics trends that we will see accelerating in 2022 in its “Advanced Analytic Trends Report”.
The report compiles feedback from industry leaders at Merck, Credit Suisse and Swiss Re on using mega models in top-tier companies.
Mega models (e.g. GPT-3 and Wu Dao 2.0) show impressive performance yet are extremely costly to train.
Only a few companies are able to compete in this space. Nonetheless, the availability of these mega models opens up the possibility of new applications.
Quality control remains a major challenge before these models can be broadly adopted in business environments, but they already assist developers in writing snippets of code.
Are pre-trained machine learning models like GPT-3 ready to be used in your company?
Large-scale language models trained on extremely large text datasets have enabled new capabilities that could soon power a wide range of AI applications across businesses of all shapes and sizes.
The best-known pre-trained machine learning model of this kind is OpenAI’s ‘Generative Pre-trained Transformer 3’ (GPT-3), an AI model trained to generate text.
Unlike AI systems designed for a specific use-case, GPT-3 provides a general-purpose “text in, text out” interface – so users can try it on virtually any task involving natural language and even programming languages. GPT-3 created a huge buzz when beta testing of the new model was announced by OpenAI, back in 2020.
The hype was justified based on the impressive first demos of GPT-3 in action.
It was writing articles, creating poetry, answering questions, translating text, summarising documents, and even writing code. In the six months after OpenAI opened access to the GPT-3 API to third parties, over 300 apps using GPT-3 hit the market, collectively generating 4.5 billion words a day.
Deep learning requires vast amounts of training data and processing power, neither of which was easily available until recently.
Pre-trained models exist because, given the time and computing power required, it is simply not feasible for most companies to build such models from scratch.
That is why many industry leaders think that the use of pre-trained models (PTMs) like GPT-3 could be the next big thing in AI tech for the business landscape.
How does the technology behind GPT-3 work?
Pre-trained models (PTMs) are essentially saved neural networks whose parameters have already been trained on one or more self-supervised tasks, the most common being to predict the text that comes after a piece of input text.
Instead of creating an ML model from scratch to solve a similar problem, AI developers can use a PTM built by someone else as a starting point for training their own models.
Several types of pre-trained language models already exist, such as CodeBERT, OpenNMT and RoBERTa, each trained for different NLP tasks.
What is clear is that the AI community has largely converged on using PTMs as the backbone for the future development of deep learning applications.
A language model like GPT-3 works by taking a piece of input text and predicting the text that will come after. It makes use of the Transformer, a type of neural network whose architecture allows it to consider every word in a sequence simultaneously.
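To make the idea of "predicting the text that will come after" concrete, here is a minimal sketch in Python. It is not how GPT-3 works internally: it swaps the Transformer for a simple table of word-pair counts (a bigram model), and the function names and the tiny corpus are made up for illustration. What it shares with the real thing is the training signal: the "label" for each word is just the word that follows it in the raw text.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the raw text."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    # The training target for each word is simply the next word in the text,
    # so no human labelling is needed (this is the self-supervised part).
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(model, start_word, max_words=5):
    """Repeatedly append the predicted next word, up to `max_words` words in total."""
    output = [start_word.lower()]
    while len(output) < max_words:
        next_word = predict_next(model, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical toy corpus; a real model is trained on hundreds of billions of words.
    corpus = "the cat sat on the mat . the cat ate the fish ."
    model = train_bigram_model(corpus)
    print(predict_next(model, "the"))
    print(generate(model, "on", max_words=3))
```

A bigram table only looks one word back, which is exactly the limitation the Transformer architecture removes by letting every word in the input influence the prediction.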
Another significant aspect of GPT-3 is its sheer scale. While GPT-2 has 1.5 billion parameters, GPT-3 has 175 billion parameters, vastly improving its accuracy and pattern-recognition capacity.
OpenAI spent a reported $4.5 million to train GPT-3 on over half a trillion words crawled from internet sources, including all of Wikipedia.
The emergence of these “mega models” has made powerful new applications possible because they are developed through self-supervised training.
They can ingest vast amounts of text data without the need to rely on an external supervised signal, i.e. without being explicitly told what any of it ‘means’.
Combined with ready access to cloud computing at scale, transformer-based mega language models are very good at learning mathematical representations of text that are useful for many problems, such as taking a small amount of text and predicting the words or sentences that follow.
Scaled-up mega models can accurately respond to a task given just a few examples (‘few-shot’), a single example (‘one-shot’), or even no examples at all (‘zero-shot’).
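The difference between zero-shot, one-shot and few-shot use comes down to how many worked examples are placed in the input text ahead of the actual question. The sketch below only assembles such a prompt as a string; it does not call any model or API. The `Input:`/`Output:` layout and the helper names are assumptions chosen for illustration, not a format prescribed by OpenAI.

```python
def build_prompt(task_description, examples, query):
    """Assemble a "text in" prompt from a task description, worked examples and a new input.

    An empty `examples` list gives a zero-shot prompt, one pair gives a
    one-shot prompt, and several pairs give a few-shot prompt.
    """
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    # The prompt ends with an open "Output:" so the model's continuation is the answer.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


def shot_label(examples):
    """Name the prompting style according to the number of examples supplied."""
    return {0: "zero-shot", 1: "one-shot"}.get(len(examples), "few-shot")


if __name__ == "__main__":
    # Hypothetical translation examples used purely to show the prompt layout.
    examples = [("cheese", "fromage"), ("good morning", "bonjour")]
    print(shot_label(examples))
    print(build_prompt("Translate English to French.", examples, "thank you"))
```

The resulting string is what would be sent to a general-purpose “text in, text out” model; no retraining is involved, which is why people without an ML background can experiment with it.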
Will GPT-3 Change the Face of Business?
Equally impressive is the fact that GPT-3 applications are being created by people who are not experts in AI/ML technology.
Although NLP technology has been around for decades, it has exploded in popularity due to the emergence of pre-trained mega models.
By storing knowledge in mega models with billions of parameters and fine-tuning them for specific tasks, PTMs have made it possible to perform downstream language tasks such as translating text, predicting missing parts of a sentence or even generating new sentences.
Using a PTM like GPT-3, machines are able to complete these tasks with results that are often hard to distinguish from those produced by humans.
In fact, in some experiments only 12% of human evaluators guessed that news articles generated by GPT-3 were not written by a human.
Sectors like banking or insurance with strict regulations might always feel the need to keep a human in the loop for quality control.
However, many tasks that involve a particular language structure could be automated with pre-trained language models.
GPT-3 is already being used for tasks related to customer assistance, information search, or creating summaries.
Gennaro Cuofano, curator of FourWeekMBA, lists a number of commercial applications that can exploit the potential of PTMs like GPT-3 to automate mundane tasks, including:
- Automated Translation: GPT-3 has already shown results that are as accurate as Google’s DeepMind AI that was specifically trained for translation.
- Programming without Coding: By applying language models to write software code, developers could automatically generate mundane code and focus on the high-value part. Examples include using GPT-3 to convert natural language queries into SQL.
- Marketing Content: In the Persado 2021 AI in Creative Survey, about 40% of respondents reported using AI to generate creative marketing content. Content marketing and SEO optimisation are just the start. Future use cases include building apps, cloning websites, and producing quizzes, tests and even animations.
- Automated Documentation: Generating financial statements and other standardised documents, such as product manuals and compliance reports, that require summarisation and information extraction. OthersideAI is building a system that generates email responses from bullet points the user provides.
The use of these models is becoming increasingly democratised as more tutorials, tools and libraries such as Hugging Face become available, but it still takes effort, expertise and enough data to fine-tune pre-trained models properly.
The Future of Pre-Trained Machine Learning Models
To evaluate whether PTM-based services are ready to be used by your company, there are some limitations to consider.
As some experts have pointed out, mega models like GPT-3 are not an artificial “general” intelligence. They lack a large amount of context about the physical world.
The output of PTMs like GPT-3 also depends heavily on the quality of the input prompt text, although users can improve GPT-3’s results with better “prompt engineering”.
It is also possible to fine-tune mega models on new datasets, and the real potential of pre-trained models will be as an enabling technology for products that customise these models through methods known as transfer learning.
The next big challenge for the NLP community is to get better at understanding human intention.
Already, InstructGPT, the latest version released by OpenAI, is said to be better aligned with human intention since it is “optimised to follow instructions, instead of predicting the most probable word.”
Separately, OpenAI’s next-generation model has been rumoured to be 500 times the scale of GPT-3, although this figure is unconfirmed.
What is certain is that the technology will become only more powerful. It’ll be up to us how well we build and regulate its potential uses and abuses.
The post AI Industry Leader Unit8 Launches Advanced Analytic Trends Report 2022 appeared first on Fintech Schweiz Digital Finance News - FintechNewsCH.