Oct

when client asks

Interested in reading more? Check out our other blogs:

Beware the lure of crowdsourced data

Crowdsourced data can often be inconsistent, messy or downright wrong 

We all like something for nothing, that’s why open source software is so popular. (It’s also why the Pirate  Bay exists). But sometimes things that seem too good to be true are just that. 

Repustate is in the text analytics game which means we needs lots and lots of data to model certain  characteristics of written text. We need common words, grammar constructs, human-annotated corpora  of text etc. to make our various language models work as quickly and as well as they do. 

We recently embarked on the next phase of our text analytics adventure: semantic analysis. Semantic  analysis the process of taking arbitrary text and assigning meaning to the individual, relevant components.  For example, being able to identify “apple” as a fruit in the sentence “I went apple picking yesterday” but to  identify “Apple’ the company when saying “I can’t wait for the new Apple product announcement” (note:  even though I used title case for the latter example, casing should not matter)

READ MORE