Jun

Beware the lure of crowdsourced data

Crowdsourced data can often be inconsistent, messy or downright wrong 

We all like something for nothing; that’s why open source software is so popular. (It’s also why the Pirate Bay exists.) But sometimes things that seem too good to be true are just that.

Repustate is in the text analytics game, which means we need lots and lots of data to model certain characteristics of written text. We need common words, grammar constructs, human-annotated corpora of text and so on to make our various language models work as quickly and as well as they do.

We recently embarked on the next phase of our text analytics adventure: semantic analysis. Semantic analysis is the process of taking arbitrary text and assigning meaning to the individual, relevant components. For example, it means identifying “apple” as a fruit in the sentence “I went apple picking yesterday” but identifying “Apple” as the company in “I can’t wait for the new Apple product announcement” (note: even though I used title case for the latter example, casing should not matter).
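To make the “apple” example concrete, here is a minimal sketch of context-based disambiguation. It is not Repustate's method: the cue-word lists and the function name `disambiguate_apple` are hypothetical, hand-written stand-ins for what a real system would learn from annotated corpora. The input is lowercased first, so casing plays no part in the decision.

```python
import re

# Hypothetical cue words for each sense of "apple". A real system would
# learn these from annotated corpora rather than hard-code them.
SENSE_CUES = {
    "fruit": {"picking", "orchard", "pie", "eat", "tree", "juice"},
    "company": {"product", "announcement", "iphone", "mac", "stock", "ceo"},
}


def disambiguate_apple(sentence):
    """Return "fruit" or "company" for the word "apple", or None if undecided."""
    # Lowercase everything so that casing does not influence the result.
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    if "apple" not in tokens:
        return None

    # Score each sense by how many of its cue words appear in the sentence.
    scores = {sense: len(tokens & cues) for sense, cues in SENSE_CUES.items()}
    ranked = sorted(scores.values(), reverse=True)

    # No cue word found, or a tie between senses: refuse to guess.
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return max(scores, key=scores.get)


print(disambiguate_apple("I went apple picking yesterday"))
print(disambiguate_apple("i can't wait for the new apple product announcement"))
```

A word-overlap count like this breaks down quickly on real text, which is exactly why large amounts of reliable data matter for this kind of model.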

Interested in reading more? Check out our other blogs:

Microsoft AI products

                                                 

Microsoft’s product strategy has always been, and remains, one of ‘zero alternative’. The ultimate aim is for customers to have no choice but to embrace only Microsoft products. Consequently, the company has created products and solutions in (almost) every segment of the enterprise and consumer IT market, including, but certainly not limited to, its own database, cloud services, operating system, office tools, programming language, and many more.

Not only does Microsoft offer a wide variety of products, it ties them together in a unified ecosystem that makes it easy for components to connect and interact. At the same time, this ecosystem is hostile to non-Microsoft products.

Microsoft’s strategy for the fast-growing AI segment is similar:

Create products that address every part of the AI market, then add them to the ecosystem so that they are easy to combine from within and difficult to use from outside.

Currently the products on offer are:

- Microsoft’s AI engine, called LUIS (Language Understanding Intelligent Service). It is meant to compete with other major industrial AI systems such as IBM Watson, and has a similar training methodology. It can be integrated with other systems through HTTP endpoints.

- Microsoft’s chatbot building platform, called, surprisingly, Microsoft Bot Framework. It addresses the popular demand for easy chatbot design and provides seamless connectivity with the main user interfaces, such as web, SMS, mobile, and messaging platforms.

- In addition, Microsoft offers its own messaging platform, Skype.

The main advantage of using Microsoft AI products is the built-in connectivity with user interfaces.

The main disadvantage is the ‘zero alternative’ policy: once you’ve chosen a Microsoft product, you will likely be forced to choose only Microsoft products for the duration of your project.

 


nmodes Technology - Overview

                                                       

nmodes’ ability to accurately deliver relevant messages and conversations to businesses is based on its ability to understand these messages and conversations. Once a system understands a sentence or text, it can easily perform the necessary action, e.g. bring a sentence about buying a car to the car dealership, or a complaint about purchased furniture to the customer service department of the furniture company.
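The routing step described above can be sketched in a few lines. This is an illustration only, not nmodes' technology: simple keyword rules stand in for real semantic understanding, and the names `TOPIC_RULES`, `ROUTES` and `route` are hypothetical.

```python
import re

# Hypothetical routing table: detected topic -> destination.
ROUTES = {
    "car_purchase": "car dealership",
    "furniture_complaint": "furniture customer service",
}

# Keyword rules standing in for real semantic understanding (an assumption
# made for this sketch). A topic matches when the message contains at least
# one "action" word and at least one "object" word.
TOPIC_RULES = {
    "car_purchase": (
        {"buy", "buying", "purchase"},
        {"car", "suv", "sedan"},
    ),
    "furniture_complaint": (
        {"broken", "complaint", "refund", "damaged"},
        {"sofa", "table", "chair", "furniture"},
    ),
}


def route(message):
    """Return the destination for a message, or None if no topic matches."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    for topic, (actions, objects) in TOPIC_RULES.items():
        if words & actions and words & objects:
            return ROUTES[topic]
    return None


print(route("I'm thinking about buying a new car"))
print(route("The sofa I ordered arrived damaged"))
```

The point of the sketch is the separation of concerns: once the topic of a message is known, delivering it to the right recipient is a simple lookup, so the hard part is the understanding step.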

Understanding the meaning of sentences is the domain of semantics. nmodes has developed a strong semantic technology that stands out in a number of ways.

Here is how nmodes technology is different:

1. Low computational power. We don’t use the methods and algorithms deployed by almost everyone else in this space. The algorithms we use allow us to achieve a high level of accuracy while significantly reducing the computational power required. The most accurate semantic systems, e.g. Google’s or IBM’s, rely on supercomputers. By comparison, our computational requirements are modest in the extreme, yet we successfully compete with these powerhouses in terms of accuracy and quality of results.

2. Private data sources. We work extensively with Twitter and other social networks, yet at the same time we process enterprise data. Working with private data sources means the system should know details specific to that particular data source. For example, when a system handles a web self-service solution for an online electronics store, it learns the names, prices, and other details of all products available at that store.

3. User-driven solution. Our system learns from the user’s input, which makes it extremely flexible and as granular as needed. It supports both generic topics, for example car purchasing, and conversations concentrating on a specific type or model of car.
