Data science is a field of study that applies statistical and mathematical modeling techniques to derive knowledge from data sets that are typically large, unstructured, or both. These techniques include machine learning, predictive modeling, probability models, and artificial intelligence algorithms.
You may have heard that businesses are hiring data scientists to crunch their data, but many people find themselves asking, “Do I need to use data science in my organization?” With the non-linear growth in organizational data, the answer is increasingly a qualified yes. Many organizations will benefit from data science techniques, but the key is to use the right techniques for your organization. There is no one-size-fits-all solution in data science; each business needs to evaluate its needs and the benefits data science techniques may offer compared with more conventional business intelligence processes and analytics.
“Torture the data, and it will confess to anything.” (Ronald Coase, Nobel Prize laureate in Economics)
For an organization interested in incorporating data science processes, there is no clear path without a significant investment. Hiring a full-time data scientist who can deliver results will run you well into six figures, while large consulting firms are not interested in smaller projects and will pressure you into large contracts. Neither option suits an organization looking to pilot data science processes or incorporate them gradually. This is where MBATech can help.
MBATech takes a small-firm approach to bringing data science into your organization: business technology professionals evaluate your needs, then draw on a strong team of analysts holding Master's and PhD credentials in quantitative sciences. The actual implementation of data science can get complicated quickly, which is why we abstract the science from the business to provide actionable knowledge backed by analytics.
All of our Data Science services are based on Data Mining, Machine Learning, and Statistical processes that facilitate knowledge discovery and decision support. Before we can begin any specialized Data Science activities, a good deal of data manipulation and cleansing must take place to ensure the data is in the proper format. These services are listed under our Business Intelligence and Data Services pages. Our Data Science services are closely related to our Business Intelligence services, the key difference being the level of sophistication employed in the analysis of data.
In the sections below we outline some of the more popular Data Science techniques we employ to extract knowledge from data.
Classification uses advanced algorithms to predict, for each individual in a population, which of a small number of classes that individual belongs to. A real-world example would be predicting whether a specific customer will respond to a specific marketing offer. Once the algorithm has been developed, trained, and tuned, it can be used in real time to deliver an enhanced customer experience, increase sales, and so on.
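To make the idea concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid rule, written in plain Python. The customer features, the "yes"/"no" labels, and the function names are assumptions made up for illustration; a production model would normally use a richer algorithm and far more data.

```python
from math import dist

def train_centroids(rows, labels):
    """Average the feature vectors of each class into one centroid per class."""
    sums, counts = {}, {}
    for features, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Return the class whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training data: (purchases last year, emails opened last month)
customers = [(1, 0), (2, 1), (8, 6), (9, 7)]
responded = ["no", "no", "yes", "yes"]

centroids = train_centroids(customers, responded)

# Predict whether a new customer is likely to respond to the offer.
prediction = classify(centroids, (7, 5))
print(prediction)
```

Training happens once; after that, `classify` is cheap enough to call on each customer as they arrive, which is what makes real-time use possible.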
Whereas Classification only determines which class an individual belongs to, Scoring produces an estimate of the likelihood that the individual belongs to each class. Classification typically results in a yes or no answer; Scoring gives you a likelihood. This is useful for fine-tuning marketing campaigns where the cost of including additional recipients is low.
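As a rough illustration of scoring, the sketch below fits a small logistic regression by gradient descent and uses it to assign each customer a response likelihood between 0 and 1. Logistic regression is only one of many scoring techniques; the data, the cut-off values, and all names here are hypothetical.

```python
from math import exp

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))

def fit_logistic(rows, outcomes, learning_rate=0.1, epochs=5000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, outcome in zip(rows, outcomes):
            x = (1.0, *features)
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - outcome
            for i, xi in enumerate(x):
                gradient[i] += error * xi
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

def score(weights, features):
    """Estimated likelihood that a customer with these features responds."""
    x = (1.0, *features)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical history: emails opened last month, and whether the customer responded (1) or not (0).
emails_opened = [(0,), (1,), (2,), (3,), (5,), (6,), (7,), (8,)]
responded = [0, 0, 0, 1, 0, 1, 1, 1]
weights = fit_logistic(emails_opened, responded)

candidates = {"low": (1,), "middle": (4,), "high": (7,)}
scores = {name: score(weights, features) for name, features in candidates.items()}

# When contacting someone is cheap, a lower cut-off simply widens the campaign.
cautious = [name for name, s in scores.items() if s >= 0.8]
broad = [name for name, s in scores.items() if s >= 0.3]
```

Because each customer gets a number rather than a yes or no, the cut-off becomes a business decision that can be moved up or down with the cost of reaching one more recipient.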
Regression analysis, also called value estimation, involves placing an estimate or prediction on a variable. A real-world example would be answering the question “If I offer customer X a discount of 10% off widgets, how many more will they buy?” This type of analysis is extremely useful for sales and marketing budgeting, as you can play out what-if scenarios to clear stock, promote a new service, increase market presence, and so on.
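The discount question can be sketched with the simplest form of regression, a straight line fitted by ordinary least squares. The sales figures and names below are invented for illustration, and a straight line is itself an assumption; real demand rarely responds to discounts this neatly.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical past promotions: discount offered (%) and widgets sold.
discount_pct = [0, 5, 10, 15, 20]
units_sold = [100, 112, 119, 131, 138]

slope, intercept = fit_line(discount_pct, units_sold)

def predicted_units(discount):
    """Estimated widgets sold at a given discount percentage."""
    return intercept + slope * discount

# What-if scenario: how many more widgets at 10% off than at full price?
extra_units_at_10 = predicted_units(10) - predicted_units(0)
print(round(extra_units_at_10, 1))
```

Calling `predicted_units` with other discount levels is how the what-if scenarios are played out; estimates far outside the range of past promotions should be treated with caution.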
Similarity matching attempts to identify similar individuals based on characteristics or behaviours. This type of analysis is very useful for segmenting customers and can be very beneficial when launching new products. A real-world example would be identifying low-volume customers whose buying patterns (timing and product selection) resemble those of very profitable, high-volume customers. These customers could be targeted with marketing aimed at increasing their purchase volumes, or offered the products that high-volume customers buy and they do not.
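One common way to compare buying patterns regardless of volume is cosine similarity, which looks at the mix of products bought and ignores the totals. The sketch below is an illustration under that assumption; the product categories, customer names, and quantities are hypothetical.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Similarity of two purchase mixes: 1.0 means identical proportions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical units bought per product category: (tools, paint, garden)
high_volume_profile = (90, 10, 40)
low_volume_customers = {
    "A": (9, 1, 4),
    "B": (1, 8, 0),
    "C": (5, 0, 3),
}

# Rank low-volume customers by how closely their mix matches the profile.
ranked = sorted(
    low_volume_customers,
    key=lambda name: cosine_similarity(low_volume_customers[name], high_volume_profile),
    reverse=True,
)
print(ranked)
```

Customers at the top of `ranked` buy in the same proportions as the high-volume profile, just in smaller quantities, which makes them natural candidates for volume-building offers.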
Link prediction involves predicting connections between data items, usually by suggesting that a link should exist. Social media services such as Facebook, LinkedIn, and Twitter suggest links between people based on your current social network, and the same techniques can be applied to other types of data. A great example is online movies: you can attempt to predict movie preferences based on an individual's viewing history. Netflix uses this technology very heavily.
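A minimal sketch of the idea is the common-neighbours heuristic: two people who are not yet connected but share many connections are good candidates for a suggested link. This is one simple heuristic chosen for illustration, not a description of how any particular service works; the network and names are made up.

```python
def suggest_links(graph, node):
    """Rank nodes not yet linked to `node` by their number of shared neighbours."""
    scores = {}
    for neighbour in graph[node]:
        for candidate in graph[neighbour]:
            if candidate != node and candidate not in graph[node]:
                scores[candidate] = scores.get(candidate, 0) + 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical social network: each person maps to the set of people they know.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

suggestions = suggest_links(graph, "ann")
print(suggestions)
```

The same function works if the nodes are customers and products rather than people, which is the sense in which the technique carries over to other types of data.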
Causal modeling attempts to find what events or actions actually influence others. For example, consider that we use predictive modeling to target advertisements to consumers, and we observe that indeed the targeted consumers purchase at a higher rate subsequent to having been targeted. Was this because the advertisements influenced the consumers to purchase? Or did the predictive models simply do a good job of identifying those consumers who would have purchased anyway?
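One standard way to separate these two explanations is a randomized holdout: among the consumers the model selects, a random subset is deliberately not shown the advertisement, and the purchase rates of the two groups are compared. The sketch below assumes such a holdout exists; all counts and names are hypothetical.

```python
def purchase_rate(purchases, group_size):
    """Share of a group that went on to purchase."""
    return purchases / group_size

# Hypothetical campaign results.
targeted_shown = purchase_rate(120, 1000)    # selected by the model, saw the ad
targeted_holdout = purchase_rate(100, 1000)  # selected by the model, randomly withheld
untargeted = purchase_rate(30, 1000)         # not selected by the model

# Naive comparison: mixes the ad's effect with the model's skill at
# picking people who would have purchased anyway.
naive_lift = targeted_shown - untargeted

# Holdout comparison: both groups were picked by the same model, so the
# difference is attributable to the advertisement itself.
causal_uplift = targeted_shown - targeted_holdout

print(f"naive lift: {naive_lift:.2%}, causal uplift: {causal_uplift:.2%}")
```

In this made-up example most of the apparent lift comes from the model choosing likely buyers, and only a small part from the advertisement, which is exactly the distinction the two questions above are asking about.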