OpenAI language model writes coherent texts, but researchers fear abuse




The non-profit OpenAI has trained a language model that can, among other things, write coherent texts by repeatedly predicting the next word based on all previous words in a text. The researchers are not releasing the full model, for fear of abuse.
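The core idea of next-word prediction can be illustrated with a toy sketch. The bigram model below is an illustrative stand-in, not GPT-2: it conditions on only the previous word, where GPT-2 conditions on all previous words with a large Transformer.

```python
from collections import Counter, defaultdict

# Toy illustration of next-word prediction. A bigram model counts which
# word tends to follow which, then predicts the most frequent successor.
# This is a vastly simplified stand-in for what GPT-2 does.

def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most likely next word, or None if the word is unknown."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = "the model writes text and the model predicts the next word"
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # → "model"
```

A real language model replaces the raw counts with learned probabilities over a whole vocabulary, but the interface is the same: context in, most likely next word out.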

OpenAI calls its model GPT-2. The language model has 1.5 billion parameters and was trained on a dataset of eight million web pages. According to the researchers, GPT-2 outperforms language models trained on domain-specific datasets. To keep their dataset as large and diverse as possible, the researchers chose not to base it solely on news articles, Wikipedia articles or books.

Instead, they scraped all outgoing links from Reddit posts that had received at least three karma. "This can be seen as a heuristic indicator that other users found the link interesting, educational or just funny," they write in their paper. Wikipedia pages were excluded, because they are often used for other datasets. The result was a 40GB text file that they call WebText.
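The filtering heuristic described above can be sketched as follows. The record format and field names here are illustrative assumptions, not OpenAI's actual scraping pipeline.

```python
# Toy sketch of the WebText-style filtering heuristic: keep outbound links
# from posts with at least three karma, and drop Wikipedia pages because
# they already appear in other datasets. Record structure is assumed.

def filter_links(posts, min_karma=3):
    """Return the URLs that pass the karma threshold and are not Wikipedia."""
    kept = []
    for post in posts:
        if post["karma"] < min_karma:
            continue  # too little karma: no signal that users found it worthwhile
        if "wikipedia.org" in post["url"]:
            continue  # excluded to avoid overlap with common evaluation datasets
        kept.append(post["url"])
    return kept

posts = [
    {"url": "https://example.com/article", "karma": 5},
    {"url": "https://en.wikipedia.org/wiki/Language_model", "karma": 10},
    {"url": "https://example.org/blog", "karma": 1},
]
print(filter_links(posts))  # only the first link survives
```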

By training their language model on this corpus, they arrived at a model that can be used for many tasks across different domains. As examples they mention answering questions, summarizing and translating, where the advantage is that the model learns these tasks from raw text rather than from task-specific training data.

OpenAI demonstrates its language model by having it write various texts, where the goal is simply to predict the next word on the basis of a given text: the starting point is always a short text written by humans, which the model then continues, adopting its style and content. This does not always work well, the researchers admit, and on technical subjects in particular the model performs poorly, but in many other cases, and sometimes after several attempts, the synthetic texts read like realistic articles. By training GPT-2 on specific datasets, the model can be fine-tuned; as an example, OpenAI mentions writing reviews after training on Amazon reviews.
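The continuation procedure described above can be sketched with the same toy bigram idea: a human-written prompt is extended word by word, each word chosen as the most likely successor of the previous one. A real GPT-2 conditions on the full prompt and samples from a Transformer; this sketch only shows the generation loop.

```python
from collections import Counter, defaultdict

# Toy sketch of prompt continuation: extend a human-written prompt one
# predicted word at a time. The bigram "model" here is an illustrative
# stand-in for GPT-2's next-word prediction.

def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def continue_text(counts, prompt, n_words=4):
    """Append up to n_words, each the most likely successor of the last word."""
    words = prompt.split()
    for _ in range(n_words):
        options = counts.get(words[-1])
        if not options:
            break  # no known successor: stop generating
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

model = train_bigrams("the cat sat on the mat because the cat sat on the rug")
print(continue_text(model, "on the"))  # → "on the cat sat on the"
```

Note how the continuation inherits the phrasing of the training text, a (very crude) analogue of the model taking over the style and content of its prompt.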

Speaking to The Guardian, OpenAI's Jack Clark says the trained model is not being released so that it can first become clear what it can and cannot do. "There are many more people than us who are better at thinking about what malicious things it could do." Instead, OpenAI is releasing a smaller model on GitHub that researchers can experiment with.

OpenAI is an organization focused on research into the responsible use of artificial intelligence, supported by, among others, Microsoft, Nvidia, GitHub and Elon Musk.


