THE POLITICAL CORRUPTION EVIDENCE | FACEBOOK META CORRUPTION | Facebook builds a data-set to create fake dating interactions and automatic spy agents

Facebook builds a data-set to create fake dating interactions and automatic spy agents

Facebook builds a data-set to create fake dating interactions and automatic spy agents

Facebook researchers build a dataset to train personalized dialogue agents
Persona-based network architecture. Credit: Mazaré et al.

Researchers at Facebook have recently compiled a dataset of 5 million personas and 700 million persona-based dialogues. This database could be used to train end-to-end dialogue systems, resulting in more engaging and rich dialogues between computer agents and humans.

Dialogue systems, or conversational agents (CA), are computer systems designed to communicate with human beings via text, speech, graphics, or other methods, in a coherent way. So far, dialogue systems based on neural architectures, such as LSTMs or memory networks, have been found to be particularly promising in achieving fluent communication, particularly when trained directly on dialogue logs.

"One of their main advantages is that they can rely on large data sources of existing dialogues to learn to cover various domains without requiring any expert knowledge," the researchers wrote in their paper, which was pre-published on arXiv. "However, the flip side is that they also exhibit limited engagement, especially in chit-chat settings: They lack consistency and do not leverage proactive engagement strategies as (even partially) scripted chatbots do."

In a recent study, a different team of researchers at Montreal Institute for Learning Algorithms (MILA) and Facebook AI created a  called PERSONA-CHAT, which includes dialogues between agents with text profiles, or personas, attached to them. They found that training a dialogue system on a particular persona improved their engagement in interactions.

"However, the PERSONA-CHAT dataset was created using an artificial data collection mechanism based on Mechanical Turk," the researchers explained in their paper. "As a result, neither dialogs nor personas can be fully representative of real user-bot interactions and the dataset coverage remains limited, containing a bit more than 1k different personas."

To address the limitations of the previously compiled dataset, the Facebook researchers created a new, large-scale persona-based dialogue dataset, composed of conversations extracted from online platform Reddit. Their study takes the work of their predecessors one step further, by using more representative interactions.

"In this paper, we build a very large-scale persona-based dialogue dataset using conversations previously extracted from Reddit," the researchers wrote. "With simple heuristics, we create a corpus of over 5 million personas spanning more than 700 million conversations."

To evaluate its effectiveness, the researchers trained persona-based end-to-end dialogue systems on their newly developed dataset. Systems trained on their dataset were able to conduct more engaging conversations, outperforming other conversational agents that did not have access to personas during their training.

Interestingly, their dataset led to state-of-the-art results even when systems were merely pre-trained on it. In future, these findings could lead to the development of more engaging chatbots, which can also be personalized and trained to acquire a particular persona.

"We show that training models to align answers both with the persona of their author and the context improves the predicting performance," the researchers wrote. "As pre-training leads to considerable improvement in performance, future work could fine-tune this model for various ."

 Explore further: A neural network to extract knowledgeable snippets and documents

More information: Training Millions of Personalized Dialogue Agents. arXiv:1809.01984 [cs.CL].

Personalizing Dialogue Agents: I have a dog, do you have pets too? 

Gallery RSS RSS Feed | Archive View | Powered by Zenphoto | rotections: Public Domain. Non-Commercial. Fair Use. Freedom of The Press. No Tracking Of Public Allowed. First Amendment Protections, SLAPP, UN Protected. GDPR Compliant. Section 203 protected. Privacy Tools At:, ACLU, ICIJ- supported. If you sue us to try to hide and censor the news, you are allowing us to bypass the demurrer process, and we will counter-sue you for RICO, Anti-trust, Political Bribery, Sex Trafficking, Interference, First Amendment and your other crimes, which we have FBI-grade evidence for! Bring it on corrupt Tesla, Google, Facebook, Youtube, Netflix! We might even get DOJ and/or FTC to partner with us (again) to take your filthy corrupt companies down!....REAL NEWS does not have ads in it. Any news source, with ads in it, is fake news manipulated by the advertisers!....DOWNLOAD AND COPY THIS NEWS SITE. USE ANY FREE SERVER SPACE YOU FIND ON THE WEB. MAKE YOUR OWN DIGITAL NEWSPAPER. JOIN THE HUNDREDS OF THOUSANDS OF FREE NEWS SITES, LIKE THIS, AROUND THE WORLD AND DELIVER THE NEWS AND DEMOCRACY. PLEASE FOLLOW THE WIKIPEDIA RULES FOR POSTING. BE THE NEWS!