In our last post, we began by trying to create a model for veracity, and ended with the idea of creating a model for intention using the syntax of sentences. In this post, we are going to start looking into the particulars for a model of intention using syntax.
With quick reflexes and a fortunate server error, I was lucky enough to get a ticket to the 2018 Neural Information Processing Systems Conference (NeurIPS). It was with great excitement that I attended to represent Forge.AI this year. NeurIPS provides its attendees with a week of talks, demonstrations, and incredible networking opportunities. I was able to catch up with old friends, and meet new friends and potential collaborators. For those of you who weren’t lucky enough to score a ticket, I thought it would be useful to provide a collection of highlights from the conference.
Forge sources unstructured data and transforms it into structured event streams. These event streams are enriched with knowledge during the transformation process and delivered in real time to describe the happenings of the world in a machine-ready way. Our structured event stream enables downstream AI systems to reason over our data, while enabling human analysts to develop and validate new hypotheses about the impact of specific events, or series of related events, in relevant domains.
We're excited to announce that we will be starting a new series of posts on Veracity here at the Forge.AI blog. The series will explore quite a few aspects, including:
At Forge.AI, we capture events from unstructured data and represent them in a manner suitable for machine learning, decision making, and other algorithmic tasks for our customers (for a broad technical overview, see this blog post). To do this, we employ a suite of state-of-the-art machine learning and natural language understanding technologies, many of which are supervised learning systems. For our business to scale aggressively, we need an economically viable way to acquire training data quickly for those supervised learners. We use natural language generation to do just that, supplementing human annotations with annotated synthetic language in an agile fashion.
Natural Language Understanding at an industrial scale requires an efficient, high-quality knowledge graph for tasks such as entity resolution and reasoning. Without the ability to reason about information semantically, natural language understanding systems are only capable of shallow understanding. As the requirements of machine reasoning and machine learning tasks become more complex, more advanced knowledge graphs are required. Indeed, it has been previously observed that knowledge graphs are capable of producing impressive results when used to augment and accelerate machine reasoning tasks at small scales, but struggle at large scale due to a mix of data integrity and performance issues. Solving this problem and enabling machine-driven semantic reasoning at scale is one of the foundational technological challenges that we are addressing at Forge.AI.
To understand the complexity of this task, it's necessary to define what a knowledge graph is. There are many academic definitions floating around, but most are impenetrable and replete with jargon. Put simply, a knowledge graph is a graph where each vertex represents an entity and each edge is directed and represents a relationship between entities. Entities are typically proper nouns and concepts (e.g. Apple and Company, respectively), with the edges representing verbs (e.g. Is A). Together, these form large networks that encode semantic information. For example, encoding the fact that "Apple is a Company" in the knowledge graph is done by storing two vertices, one for "Apple" and one for "Company", with a directed edge of type "isA" originating at Apple and pointing to Company. This is visualized in Figure 1:
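The same "Apple is a Company" fact can also be sketched in a few lines of Python. This is only a minimal illustration of the vertex-and-directed-edge idea, not a description of how Forge.AI stores its graph: the `KnowledgeGraph` class and its `add_fact` and `targets` methods are hypothetical names invented for this sketch.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """A directed, typed relationship between two entities."""
    source: str
    relation: str
    target: str


class KnowledgeGraph:
    """Hypothetical in-memory graph: vertices are entities, edges are relations."""

    def __init__(self):
        self.vertices = set()
        # Outgoing edges indexed by their source vertex.
        self._outgoing = defaultdict(list)

    def add_fact(self, source, relation, target):
        """Store both vertices and a directed edge from source to target."""
        self.vertices.update((source, target))
        edge = Edge(source, relation, target)
        self._outgoing[source].append(edge)
        return edge

    def targets(self, source, relation):
        """Return every entity reachable from source via the given relation."""
        return [e.target for e in self._outgoing.get(source, [])
                if e.relation == relation]


graph = KnowledgeGraph()
graph.add_fact("Apple", "isA", "Company")
print(graph.targets("Apple", "isA"))
```

Because the edge is directed, asking for the "isA" targets of "Company" returns nothing: the fact says that Apple is a Company, not the reverse.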
Suppose you’d like to classify individual documents at multiple levels of specificity. In addition, you’d also like to know whether a document contains multiple topics and with what confidence. For example, as I write this Google News is displaying an article titled Income Stocks With A Trump Tax Bonus. We may want to capture the main topics contained in the article along with an associated measure of our confidence that those topics are contained in the article. Such a classification might look something like the following
The human brain is a remarkable instrument with highly evolved regions for understanding, reasoning, and decision making. When humans communicate, they typically speak or write to directly convey information. Information can be transmitted to desired targets through email, text message, phone, web page, social media, etc. Effective communication ensures that the intended meaning is properly received and interpreted. Our brains have highly developed regions, such as the angular gyrus, Wernicke’s area and Broca’s area, focused on reading and speech comprehension and the application of information to reasoning and decision-making processes.