How to be a Data Scientist Using SAS
Being a Data Scientist often requires a combination of “Hacking Skills, Math & Statistics Knowledge and Substantive Expertise” (Conway 2010). Many programming languages can be used to apply these skills, and people entering the field tend to choose one based on their background. One of the more common choices, and the platform that provides the greatest breadth and depth, is SAS. SAS is one of the first real statistical analysis languages, and the company’s heavy investment in research has kept it on the leading edge of analytics. It offers not just one analytical path but several, giving Data Scientists the freedom to approach their role in the manner that fits them best. The Data Scientist who has SAS in their tool belt is able to provide great contributions to their company’s bottom line. The three high-level steps in the analytics process are data, analysis and presentation. Some of the highlights of the SAS platform for each are explained below.
Some of the excitement people have about data is in its size and sources. SAS has long been capable of handling Big Data, with its SAS/ACCESS products that connect to platforms such as Teradata and Greenplum. Our clients with SAS Grid can process large amounts of data in parallel, greatly reducing their analysis time. SAS has embraced the Hadoop platform with all its enabling accessories (Hive, YARN, Pig, etc.), and the SAS Data Scientist has a number of ways to work with and in Hadoop, even distributing SAS processes across the different Hadoop nodes.
It’s also the variety of data that makes the new data paradigm so rich. The SAS Data Scientist can easily pull data from the web and social media with features as traditional and powerful as the FILENAME statement or as new and even more powerful as SAS Social Media Analytics. The Data Management Solution (Data Integration, Data Quality, Data Governance, Master Data Management and Data Federation) can be used both across the enterprise, to manage the flood of internal and external data, and within creative, ad hoc data hacking on new data sources.
SAS has been a continual leader in advanced analytics since its inception (Gartner, 2014). It gives the Data Scientist everything from complex A/B testing and logistic regression for targeted marketing to predictive modeling, forecasting, network analysis, text analytics, simulations and optimization. SAS also gives access to external software so Data Scientists can perform their own research into custom analytics that solve new and unique business problems. The SAS Enterprise Miner platform handles many of these analytics, along with the bagging and boosting (ensemble methods) that Data Scientists use to combine the results of different machine learning methods and obtain better performance than any of the individual models.
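The idea behind ensemble methods can be illustrated with a minimal sketch. It is written in Python rather than SAS and uses plain majority voting, a simpler relative of bagging and boosting, not the Enterprise Miner implementation. The toy data and the three single-feature "models" are hypothetical and chosen only so that each model errs on different records.

```python
from collections import Counter

# Hypothetical toy data: three binary features and a binary label.
# By construction, the label equals the majority of the three features.
samples = [
    ((1, 1, 0), 1),
    ((1, 0, 1), 1),
    ((0, 1, 1), 1),
    ((0, 0, 1), 0),
    ((0, 1, 0), 0),
    ((1, 0, 0), 0),
]

# Three deliberately weak "models": each looks at one feature only.
def model_a(x):
    return x[0]

def model_b(x):
    return x[1]

def model_c(x):
    return x[2]

def majority_vote(x, models=(model_a, model_b, model_c)):
    """Combine the individual predictions by taking the most common vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def accuracy(predict, data):
    """Fraction of records where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

if __name__ == "__main__":
    for name, model in [("A", model_a), ("B", model_b), ("C", model_c)]:
        print(f"model {name}: {accuracy(model, samples):.2f}")
    print(f"ensemble: {accuracy(majority_vote, samples):.2f}")
```

On this toy set each single-feature model classifies 4 of the 6 records correctly, while the vote classifies all 6, because the models make their mistakes on different records. Real bagging and boosting build the individual models from resampled or reweighted training data rather than from hand-written rules.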
One of the Data Scientist’s responsibilities is to contribute directly to the bottom line. Being able to build models is great, but if it can’t be done in a scalable manner that impacts the daily decisions of the business leaders, then it doesn’t allow them to be competitive. Optimized procedures built by SAS R&D, known as SAS High-Performance Analytics (HPA), give the Data Scientist the Big Analytics needed, e.g. sales forecasting on tens of thousands of SKUs in short periods of time.
The Social Media Analytics solution gives the Data Scientist ready access to the web and social media for text analytics, sentiment analytics and content categorization.
Another traditional and powerful tool for the Data Scientist is JMP. One variation on the Data Scientist role is more interested in the business and less interested in the coding. While many SAS products make this possible (Enterprise Guide and Enterprise Miner both do), JMP takes this to an even greater level. Meant for the ad hoc, exploratory analyses that are often a large part of the Data Scientist’s day, JMP puts advanced machine learning, statistical methods and interactive visualizations at their fingertips.
Finally, Visual Analytics and Visual Statistics, hot off the SAS R&D presses, give the best of many worlds, though not all. They allow even the MBA Data Scientist to quickly perform complex exploratory analyses and some statistics or machine learning on Big Data and then seamlessly share those results with others.
The Data Scientist creates two types of products. One is the analytical model used on a large scale for internal or external production. The other is the insightful, often interactive, visualization that conveys to the layman or the decision maker the story hidden in the data. The Data Scientist again has the right tools available with the SAS platform. For example, stored processes with SAS/ACCESS solutions translate the statistical models to production environments for enterprise-level impact. JMP (to a degree) and Visual Analytics and Visual Statistics (more so) create advanced interactive visualizations that can be shared with their audiences.
Even the standard visualizations within SAS can be efficient to maintain. Instead of relying on time-intensive, point-and-click interfaces to make even simple changes to many dashboard items, as you might on other platforms, SAS users apply data-driven programming techniques that mean quick turnaround of changes to production.
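As a rough illustration of what data-driven means here, the sketch below generates every dashboard item from a small control table plus one shared style definition, so a single edit restyles all items at once. It is written in Python rather than SAS, and the item names, fields and the `build_chart_specs` helper are hypothetical.

```python
# Hypothetical control table: one row per dashboard item.
dashboard_items = [
    {"title": "Sales by Region", "measure": "sales", "group_by": "region"},
    {"title": "Margin by Product", "measure": "margin", "group_by": "product"},
    {"title": "Returns by Month", "measure": "returns", "group_by": "month"},
]

# Hypothetical shared settings applied to every item.
shared_style = {"chart_type": "bar", "color": "navy"}

def build_chart_specs(items, style):
    """Merge the shared style into each item to get one spec per chart."""
    return [{**item, **style} for item in items]

# Initial dashboard definition.
specs = build_chart_specs(dashboard_items, shared_style)

# A change request ("make every chart teal") is a single edit to the
# shared settings, not one manual change per dashboard item.
restyled = build_chart_specs(dashboard_items, {**shared_style, "color": "teal"})

if __name__ == "__main__":
    for spec in restyled:
        print(spec["title"], spec["chart_type"], spec["color"])
```

Adding a fourth dashboard item is likewise one new row in the control table, with no change to the code that builds the charts.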
Beyond the Platform
The world of Data Science is specific and ambiguous, idealistic and revenue-driven, passionate and fact-based. It’s not just the analytical platform that makes a Data Scientist successful, no matter how powerful, broad and deep it is. It requires a person who is driven to solve problems, passionate about contributing to the organization and deeply committed to the relationships of people and processes. SAS may be the best tool for the Data Scientist because of its breadth and depth, and the Data Scientist with that same breadth and depth will be the best for your organization.