RosaeNLG and ChatGPT
RosaeNLG is an open-source Natural Language Generation (NLG) library, and a sandbox project of the LF AI & Data Foundation. I am the original author of RosaeNLG and a current contributor.
Since ChatGPT was released at the end of 2022, I have regularly been asked about the sustainability of RosaeNLG. Put bluntly, given ChatGPT’s power and versatility, does RosaeNLG still make sense?
What is RosaeNLG?
RosaeNLG is a template-based language generator. Its input is structured data (e.g. a financial situation), not text. It must be configured specifically for each use case, with templates that define precisely which texts to generate. These templates can require extensive setup and fine-tuning: it is not uncommon to spend dozens of days on them for a new use case.
The NLG engine, RosaeNLG, combines the input data and the templates to produce the texts. It uses various algorithms to do so, and relies on language-specific grammatical rules and linguistic resources to manage conjugations and agreements, but it does not use Machine Learning.
Many natural language generators used in production today work in much the same way, e.g. CoreNLG (open source), or Arria NLG and Yseop (commercial).
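To make the idea concrete, here is a minimal sketch of template-based data-to-text in TypeScript. It does not use RosaeNLG or its template syntax: the `Account` shape, `agree` and `describeAccount` are hypothetical names invented for this illustration, and the tiny singular/plural helper only stands in for the much richer linguistic resources a real engine relies on.

```typescript
// Hypothetical input data shape (an assumption for this sketch, not a RosaeNLG type).
interface Account {
  holder: string;
  balance: number;
  overdueInvoices: number;
}

// Toy stand-in for a linguistic resource: number agreement of a noun phrase.
function agree(count: number, singular: string, plural: string): string {
  return `${count} ${count === 1 ? singular : plural}`;
}

// The "template": every sentence is written by the author,
// and only the data-dependent parts vary.
function describeAccount(a: Account): string {
  const parts: string[] = [];
  parts.push(`${a.holder} has a balance of ${a.balance.toFixed(2)} euros.`);
  if (a.overdueInvoices > 0) {
    const verb = a.overdueInvoices === 1 ? "is" : "are";
    const noun = agree(a.overdueInvoices, "overdue invoice", "overdue invoices");
    parts.push(`There ${verb} ${noun}.`);
  } else {
    parts.push("There are no overdue invoices.");
  }
  return parts.join(" ");
}

console.log(describeAccount({ holder: "Acme", balance: 1250.5, overdueInvoices: 1 }));
// prints "Acme has a balance of 1250.50 euros. There is 1 overdue invoice."
```

Nothing appears in the output that the template author did not write or the input data did not contain, and no model is involved. The price is the setup effort: each new use case needs its own templates.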
What is ChatGPT?
With a well-chosen prompt, any non-technical user can ask ChatGPT to quickly generate a sophisticated text. Thanks to its underlying LLM (Large Language Model), ChatGPT has extensive out-of-the-box knowledge of the world, which it uses to generate outstanding texts with limited effort from the user, a clear strength.
So what are the differences?
ChatGPT excels at text-to-text: the input is text (the prompt), and the output is text as well. But as of today, ChatGPT is not suitable for data-to-text use cases, as described here. It often hallucinates and invents facts. More generally, the user cannot blindly rely on its output and has to check each output for correctness against the input data.
RosaeNLG excels at data-to-text: the input is structured data, and the output is text (it cannot do text-to-text at all). Through the templates, the user has strict control over the generated texts: RosaeNLG will neither hallucinate nor invent facts. In practice, if the templates are properly written and tested, the output can be trusted against the input data.
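One way to picture "checking the output against the input data" is a small grounding test: every number mentioned in a generated text must be a value present in the input record. The sketch below is an assumption-laden illustration in TypeScript, not a RosaeNLG feature: `Sale`, `render`, `numbersIn` and `isGrounded` are hypothetical names, and the check only covers numeric values.

```typescript
// Hypothetical record and template, for illustration only.
type Sale = { product: string; units: number; revenue: number };

function render(s: Sale): string {
  return `${s.product} sold ${s.units} units for a revenue of ${s.revenue} euros.`;
}

// Extract every integer or decimal number mentioned in a text.
// Limitation: digits inside a product name would also be picked up.
function numbersIn(text: string): number[] {
  const matches = text.match(/\d+(?:\.\d+)?/g) ?? [];
  return matches.map(Number);
}

// True when every number in the text is one of the input values.
function isGrounded(text: string, s: Sale): boolean {
  const allowed = new Set<number>([s.units, s.revenue]);
  return numbersIn(text).every((n) => allowed.has(n));
}

const sale: Sale = { product: "Widget", units: 12, revenue: 1200.5 };
console.log(isGrounded(render(sale), sale)); // prints true
console.log(isGrounded("Widget sold 15 units.", sale)); // prints false
```

With a template, such a check can be run once over representative test data when the template is written. With a generative model, it would have to be run on every single output.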
Also, ChatGPT requires extensive computing resources and can only be used through its paid API. By contrast, RosaeNLG is lightweight: it can run anywhere JavaScript can run, from a server (Node.js) to a laptop or a smartphone (using the JavaScript engines embedded in browsers).
Finally, ChatGPT is proprietary while RosaeNLG is open source.
So will RosaeNLG continue?
As of today, there is no Machine Learning-based alternative to tools like RosaeNLG for data-to-text use cases in production. So I will indeed continue to maintain and extend RosaeNLG.
Please consider contributing!