
Towards smarter data - accuracy and precision


There is a huge amount of information out there, and it is growing. To stay efficient and increase our competitive advantage we need to evolve and start using information in a smart way, concentrating on data that drives business value because it is accurate, actionable, and agile. Accuracy is a key measure of the quality of data processing solutions.

How is accuracy calculated?

Calculating accuracy is easy with structured data, because the requirements can be formalized. It is less obvious with unstructured data, e.g. a stream of social feeds or any data set that involves natural language. Sentences in natural language are open to multiple interpretations and therefore allow a degree of subjectivity. For example, should the sentence ‘I haven’t been on a sea cruise for a long time’ qualify for a data set of people interested in going on a cruise? Both answers, yes and no, seem valid.

In these cases an argument has been put forward that a consensus approach, which polls data providers, is the best way to judge data accuracy. This approach essentially claims that the attributes with the highest consensus across data providers are the most accurate.

At nmodes we deal with unstructured data all the time because we process natural language messages, primarily from social networks. We do not favor this simplistic approach: it is biased, inviting people to make assumptions based on what they already believe to be true, and it makes no distinction between precision and accuracy. The difference is that precision measures only what you got right, while accuracy accounts for both what you got right and what you got wrong. Accuracy is a more inclusive and therefore more valuable characteristic.
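To make the distinction concrete, here is a minimal sketch using the standard binary-classification definitions of these two measures (the counts are invented for illustration and are not nmodes data):

# Illustrative only: standard binary-classification definitions,
# not nmodes' internal scoring; the counts are made-up examples.
tp = 80   # relevant messages correctly included (e.g. genuine cruise intent)
fp = 10   # irrelevant messages wrongly included
fn = 15   # relevant messages missed
tn = 95   # irrelevant messages correctly excluded

precision = tp / (tp + fp)                   # of what we included, how much was right
accuracy = (tp + tn) / (tp + fp + fn + tn)   # right and wrong decisions together

print("precision:", round(precision, 3))  # 0.889
print("accuracy: ", round(accuracy, 3))   # 0.875

A system can look strong on precision while missing a large share of relevant messages; accuracy also counts those misses, which is why we treat it as the more telling measure.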

Our approach is:

a) to validate data against independent third-party sources (typically of academic origin) that contain trusted sets and reliable demographics. Validating nmodes data against third-party sources allows us to verify that our data achieves the greatest possible balance of scale and accuracy (see the sketch after this list).

b) to enrich the existing test sets by purposefully including examples that are ambiguous in meaning and intent, and providing additional levels of categorization to cover them.
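A minimal sketch of point a), assuming a hypothetical trusted reference set of labelled messages (the message ids and labels are invented for illustration, not nmodes' actual pipeline):

# Hypothetical sketch: compare our labels against a trusted reference set.
reference = {   # message id -> label from an independent (e.g. academic) source
    "m1": "interested_in_cruise",
    "m2": "not_interested",
    "m3": "interested_in_cruise",
}
ours = {        # message id -> label assigned by our system
    "m1": "interested_in_cruise",
    "m2": "interested_in_cruise",
    "m3": "interested_in_cruise",
}

shared = reference.keys() & ours.keys()
agreement = sum(reference[m] == ours[m] for m in shared) / len(shared)
print("agreement with reference set:", round(agreement, 2))  # 0.67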

Accuracy becomes increasingly important as businesses move from the rudimentary data use typical of the first Big Data years to the more measured and careful approach of today. Understanding how it is calculated and the value it brings helps achieve long-term sustainability and success.

 

Interested in reading more? Check out our other blogs:

WHY ALL CONVERSATIONAL AI SOLUTIONS ARE CURRENTLY CUSTOM MADE


All quality conversational AI solutions, such as chatbots, voice bots, and virtual assistants, are customized. The reason is that conversational AI solutions have a component, called AI training, that has to be individually tailored to the needs of each business. Currently the AI industry does not have a suitable way to automate this component.


There are, of course, easy-to-use, scalable products such as Chatfuel, ManyChat and others, but they do not provide sufficient quality and therefore do not add value to the professional sales or customer service process.


The next generation of conversational AI solutions will be scalable while still delivering the level of quality required by businesses and professional organizations. nmodes is among a limited number of AI companies with a sufficient level of technological knowledge and a deep enough understanding of the underlying linguistic processes to bring this kind of solution to the market as quickly as possible. In the meantime, customized AI solutions, with a personalized AI training component, are the industry's best option.

 

Abundance of Information Is Often a Liability

A massive change has occurred in the world during the last ten to twenty years. Until recently, and throughout the history of mankind, information was hard to access. Obtaining and sharing information was either a laborious process or impossible, and the underlying assumption was that there could never be enough of it.

Today, of course, we have the opposite picture. Not only is information easily available, it keeps pouring in from a growing number of sources, and we continuously find ourselves in situations where there is more information than we want or are able to process.

A major task we, as a species, are facing is therefore how to reduce information and filter it down to what is relevant. It is, to repeat, the direct opposite of the task we have been accustomed to during all previous centuries, which was how to obtain information.

Since this change took place only recently, within the lifetime of a single generation, we have not had time to develop an efficient set of procedures to address the new problem. But the work has started and will only accelerate with time.
