Big Data science: A game changer for data-centric applications
Aug 4, 2014. Authors: Ashwin Kumar T, CTO, SpotDy, Inc. & Amar Rapaka, CFO, SpotDy, Inc.
Data science and its importance
Over the last decade, data-centric applications (i.e., applications that deal with data, such as e-commerce applications, social media sites and content management applications) have collectively come to produce, store and process data on the scale of zettabytes. Because the cost of storage and computing power has dropped sharply in recent years, data is growing exponentially. Unless applications can extract valuable, actionable information from their internal and public data sources, they cannot behave intelligently or exhibit the smarter behavior that enriches their user experience.
Data science is the practice of extracting intelligence and valuable insights from structured, unstructured and semi-structured data using mathematics, probability models, machine learning and related techniques. Apps increasingly need to be more intelligent and more engaging with their users, and integrating data science into them is largely what makes this possible. As a result, there is significant investment in building more and more data science models. Examples of such models include sentiment analysis, profanity filters, spam filters, image processors and recommendation engines. Some example applications and companies that can reap the benefits of data science are listed below:
An e-commerce application can use data science technology such as a recommendation engine to show products its users are likely to buy, and a sentiment analysis engine to gauge users' sentiment about a particular product.
A social media or content management application can employ profanity filters to keep profanity out of users' posts on its website.
Data sourcing companies can better classify and conceptualize their data with the help of concept tagging data science models, so that they can monetize the data more effectively when selling it to their vendors.
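To make the profanity-filter example concrete, here is a minimal sketch of a word-list filter that masks blocked words in a user's post. It is an illustration only, assuming a small hypothetical blocklist (`BLOCKED_WORDS`); it is not SpotDy's model, and production filters typically rely on much larger curated lists or trained classifiers.

```python
import re

# Hypothetical blocklist for illustration only; a real filter would use a much
# larger, curated list or a trained classifier.
BLOCKED_WORDS = {"darn", "heck"}

# Match whole words only, case-insensitively, so "heckle" is not masked.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)


def mask_profanity(post: str) -> str:
    """Replace each blocked word with asterisks of the same length."""
    return _PATTERN.sub(lambda m: "*" * len(m.group(0)), post)


def contains_profanity(post: str) -> bool:
    """Return True if the post contains any blocked word."""
    return _PATTERN.search(post) is not None


if __name__ == "__main__":
    print(mask_profanity("What the Heck is this darn thing?"))
```

Matching on word boundaries trades recall (it misses creative misspellings) for precision (it does not mangle innocent words that merely contain a blocked word).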
Challenges in integrating data sciences
Building data science teams is expensive. It requires hiring qualified data engineers, data scientists, software developers, security analysts, database administrators and others, and spending a significant amount of time with the team to build the data science models. On average, building even a simple data science solution costs close to 1000k USD (about USD 1 million) per year. Typically, small and medium-sized companies cannot afford to invest that much money, time and effort in building such teams.
The other option for small and medium-sized companies is to approach niche players who provide solutions for a specific domain and problem. The difficulty here is that it is a very time-consuming process for application developers to understand the APIs, integrate them into apps and validate them, with the added pain of possible long-term contracts with the niche players. If a data science solution does not meet the required objectives, companies have to repeat the whole cycle: approaching another niche player, understanding the APIs, integrating and validating. This can quickly become a vicious, time-consuming cycle if companies need a suite of data science solutions. In addition, with app data growing exponentially, data solutions must be able to scale along with the apps, and niche data solutions may fail to provide that scalability.
This calls for an easier, more flexible interface to a suite of data science solutions, so that app developers can quickly iterate and A/B test different solutions, and then choose a data science solution that not only suits their requirements but also provides scale-on-demand capability.
Role of SpotDy in countering challenges
SpotDy recognizes the challenges involved in integrating data science into apps and counters them as described below:
Ease of Use:
SpotDy provides an easy-to-use, REST-based interface to a suite of data science solutions. It greatly reduces the turnaround time for integrating the solutions into apps. Developers can also switch to a more relevant data science solution with minimal code changes.
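The claim that switching solutions needs only minimal code changes can be illustrated with a generic REST client sketch in which the model is just a path parameter. Everything here is hypothetical: the base URL `https://api.example.com/v1/models`, the model names, the JSON body shape and the bearer-token header are assumptions for illustration, not SpotDy's documented API. The sketch builds requests with the standard library but does not send them.

```python
import json
import urllib.request

# Hypothetical base URL and path layout; not a real SpotDy endpoint.
BASE_URL = "https://api.example.com/v1/models"


def build_request(model: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking `model` to process `text`."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{model}",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )


if __name__ == "__main__":
    # Switching solutions changes only the (hypothetical) model name.
    for model in ("sentiment-a", "sentiment-b"):
        req = build_request(model, "Great product!", api_key="demo-key")
        print(req.get_method(), req.full_url)
        # Against a live service, urllib.request.urlopen(req) would send it.
```

Keeping the model name as data rather than code is what lets an app swap one solution for another without touching the integration logic.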
Scale on demand:
The SpotDy platform is 100 percent cloud-based and has a highly distributed architecture built on top of the Hadoop framework. It can auto-scale to meet customers' demands at the click of a button.
Choice of data science solutions:
SpotDy offers a suite of data science solutions for different problems, such as sentiment analysis, profanity filters, spam filters, recommendation engines, image detection, concept tagging and taxonomy. In other words, it serves as a one-stop shop for a variety of data science solutions.
Pay as you go, no long term contracts:
SpotDy's cloud-based services are strictly pay-as-you-go, with no long-term contracts.
A/B test multiple models:
SpotDy offers a framework that lets developers test multiple data science models on their sample data at the same time and outputs all of the results together. This enables users to compare and contrast the various solutions and choose the one that fits their needs. In other words, SpotDy allows developers to A/B test multiple models with ease.
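The idea of running several models on the same sample data and comparing the outputs can be sketched as an offline side-by-side evaluation, which is what the post calls A/B testing here. This is a minimal illustration, not SpotDy's framework: the labeled sample, the two toy keyword-based sentiment models (`model_a`, `model_b`) and the use of accuracy as the comparison metric are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled sample data: (text, expected label).
SAMPLE: List[Tuple[str, str]] = [
    ("I love this phone", "positive"),
    ("Terrible battery life", "negative"),
    ("Great screen, love it", "positive"),
    ("Not good at all", "negative"),
]


def _words(text: str) -> set:
    """Lowercase the text, drop commas and split into a set of words."""
    return set(text.lower().replace(",", " ").split())


def model_a(text: str) -> str:
    """Toy model: positive if any positive keyword appears."""
    return "positive" if {"love", "great", "good"} & _words(text) else "negative"


def model_b(text: str) -> str:
    """Toy model: like model_a, but treats 'not' as a negation."""
    if "not" in _words(text):
        return "negative"
    return model_a(text)


def compare(models: Dict[str, Callable[[str], str]],
            sample: List[Tuple[str, str]]) -> Dict[str, float]:
    """Run every model on the same sample and return each model's accuracy."""
    scores = {}
    for name, predict in models.items():
        correct = sum(predict(text) == label for text, label in sample)
        scores[name] = correct / len(sample)
    return scores


if __name__ == "__main__":
    scores = compare({"model_a": model_a, "model_b": model_b}, SAMPLE)
    best = max(scores, key=scores.get)
    print(scores, "best:", best)
```

On this sample, `model_a` misclassifies "Not good at all" and scores 0.75, while `model_b` scores 1.0. Because every model sees identical inputs, differences in the scores reflect the models rather than the data.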
Ability to slice/dice processed data:
SpotDy offers a value-added service that provides a SQL-like interface to the processed data, which is stored in NoSQL data stores. This enables app developers to slice and dice the data and get better insights from it.
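As a rough illustration of slicing and dicing processed results through a SQL-like interface, the sketch below loads schemaless result documents into an in-memory SQLite table and aggregates them with SQL: the optional `WHERE` clause slices by region, and the `GROUP BY` dices by product and sentiment. SQLite is only a stand-in chosen for this example; SpotDy's actual query layer and its NoSQL storage are not described in the post, and the document fields are hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

# Hypothetical processed results, shaped like documents in a NoSQL store.
DOCS: List[Dict[str, str]] = [
    {"product": "phone", "sentiment": "positive", "region": "US"},
    {"product": "phone", "sentiment": "negative", "region": "EU"},
    {"product": "laptop", "sentiment": "positive", "region": "US"},
    {"product": "phone", "sentiment": "positive", "region": "US"},
]


def sentiment_by_product(docs: List[Dict[str, str]],
                         region: Optional[str] = None) -> List[Tuple[str, str, int]]:
    """Count results per (product, sentiment), optionally restricted to one region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE results (product TEXT, sentiment TEXT, region TEXT)")
        # Named placeholders map each document's keys onto table columns.
        conn.executemany(
            "INSERT INTO results VALUES (:product, :sentiment, :region)", docs
        )
        query = "SELECT product, sentiment, COUNT(*) FROM results"
        params: tuple = ()
        if region is not None:
            query += " WHERE region = ?"  # slice
            params = (region,)
        query += " GROUP BY product, sentiment ORDER BY product, sentiment"  # dice
        return conn.execute(query, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(sentiment_by_product(DOCS))
    print(sentiment_by_product(DOCS, region="US"))
```

The first call returns counts across all regions, and the second restricts the same aggregation to US results.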