Data science is a blend of various tools, algorithms, and machine learning principles, with the goal of discovering hidden patterns in raw data.
Data science is all about:
asking the right questions and analyzing the raw data.
modeling the data using various complex and efficient algorithms.
visualizing the data to get a better perspective.
understanding the data to make better decisions and arrive at the final result.
Technical Prerequisites:
Machine Learning: To understand data science, you need to understand machine learning concepts, because data science uses machine learning algorithms to solve various problems.
Mathematical Modeling: Mathematical modeling is required to make fast mathematical calculations and predictions from the available data.
Statistics: A basic understanding of statistics, such as the mean, median, and standard deviation, is required to extract knowledge and obtain better results from the data.
Computer Programming: Knowledge of at least one programming language is required. R and Python are the most common choices, and Spark is a widely used framework for large-scale data processing.
Databases: An in-depth understanding of databases and query languages such as SQL is essential for retrieving data and working with it.
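To make the statistics and database prerequisites a little more concrete, here is a minimal sketch in Python. It assumes a made-up `orders` table in an in-memory SQLite database (the table name, column, and values are invented for illustration) and summarizes one column with the standard library's `statistics` module.

```python
import sqlite3
import statistics

# Hypothetical in-memory table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(12.0,), (15.0,), (11.0,), (14.0,), (98.0,)],
)

# Pull the column of interest out of the database with SQL.
amounts = [row[0] for row in conn.execute("SELECT amount FROM orders ORDER BY id")]
conn.close()

# Summarize it with basic descriptive statistics.
mean_amount = statistics.mean(amounts)
median_amount = statistics.median(amounts)
stdev_amount = statistics.stdev(amounts)  # sample standard deviation

print(f"mean={mean_amount:.2f} median={median_amount:.2f} stdev={stdev_amount:.2f}")
```

In this toy data a single large value pulls the mean well above the median, which is one reason to look at more than one summary statistic.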
Tools for Data Science
The following are some tools commonly used for data science.
Data Analysis Tools: R, Python, Statistics, SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner
Data Warehousing: ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift
Data Visualization Tools: R, Jupyter Notebook, Tableau
Machine Learning Tools: Spark, Mahout, Azure ML Studio
Life cycle of Data Science
![](https://static.wixstatic.com/media/a2206e_3dfd754542a249ab95d57711088d0c43~mv2.png/v1/fill/w_604,h_504,al_c,q_85,enc_avif,quality_auto/a2206e_3dfd754542a249ab95d57711088d0c43~mv2.png)
Phase 1 Discovery: Before you begin the project, it is important to understand the various specifications, requirements, priorities, and the required budget. You must be able to ask the right questions. Here, you assess whether you have the required resources in terms of people, technology, time, and data to support the project. In this phase, you also need to frame the business problem and formulate initial hypotheses to test.
Phase 2 Data Preparation: In this phase, you need an analytical sandbox in which you can perform analytics for the entire duration of the project. You need to explore, preprocess, and condition data prior to modeling. Further, you will perform ETLT (extract, transform, load, and transform) to get data into the sandbox. Let's have a look at the statistical analysis flow below.
You can use R for data cleaning, transformation, and visualization. This will help you spot outliers and establish relationships between the variables. Once you have cleaned and prepared the data, it's time to do exploratory analytics on it.
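The text above suggests R for this step. Purely as an illustration, here is a rough sketch of the same idea in Python using only the standard library. It assumes missing values are represented as `None` and uses the common 1.5 × IQR rule to flag outliers. The helper names and sample values are made up for this example.

```python
import statistics

def drop_missing(values):
    """Remove None entries, a simple stand-in for data cleaning."""
    return [v for v in values if v is not None]

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical raw measurements with two missing entries.
raw = [12.0, 15.0, None, 11.0, 14.0, 13.0, 98.0, None, 12.5]

clean = drop_missing(raw)
outliers = iqr_outliers(clean)
print(outliers)
```

Whether a flagged value is an error or a genuine extreme is a judgment call that depends on the data, so a rule like this is a starting point rather than a verdict.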
Phase 3 Model Planning: Here, you will determine the methods and techniques to draw the relationships between variables. These relationships will set the base for the algorithms that you will implement in the next phase. You will apply exploratory data analytics using various statistical formulas and visualization tools.
![](https://static.wixstatic.com/media/a2206e_56caad0e874249b1a378bdae79a31496~mv2.jpg/v1/fill/w_244,h_113,al_c,q_80,enc_avif,quality_auto/a2206e_56caad0e874249b1a378bdae79a31496~mv2.jpg)
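One simple way to quantify a relationship between two variables during model planning is a correlation coefficient together with a least-squares line. The sketch below is an assumption-laden illustration, not a prescribed method: `hours` and `scores` are invented sample data, and `pearson_r` and `fit_line` are hypothetical helper names.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return slope, mean_y - slope * mean_x

# Hypothetical data: hours studied versus test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]

r = pearson_r(hours, scores)
slope, intercept = fit_line(hours, scores)
print(f"r={r:.3f} slope={slope:.2f} intercept={intercept:.2f}")
```

A correlation close to 1 or -1 suggests a linear model may be a reasonable candidate for the next phase, while a value near 0 suggests looking at other techniques.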
Phase 4 Model Building: In this phase, you will develop datasets for training and testing purposes. You will consider whether your existing tools will suffice for running the models or whether you will need a more robust environment. You will analyze various learning techniques, such as classification, association, and clustering, to build the model.
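As a small illustration of splitting data into training and testing sets and building a classifier, here is a sketch in plain Python. The nearest-neighbor classifier is chosen here only because it is short, not because the text prescribes it. The data points, labels, and function names are all invented for the example.

```python
import math
import random

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows and split them into (train, test) lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def predict_1nn(train, features):
    """Return the label of the training row closest to `features`."""
    nearest = min(train, key=lambda row: math.dist(row[0], features))
    return nearest[1]

def accuracy(train, test):
    """Fraction of test rows whose predicted label matches the true label."""
    correct = sum(predict_1nn(train, feats) == label for feats, label in test)
    return correct / len(test)

# Hypothetical toy data: two well-separated groups labeled "A" and "B".
data = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.1, 0.9), "A"), ((1.3, 1.1), "A"),
    ((5.0, 5.2), "B"), ((4.8, 5.1), "B"), ((5.2, 4.9), "B"), ((5.1, 5.3), "B"),
]

train, test = train_test_split(data)
print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test):.2f}")
```

Real datasets are rarely this cleanly separated, so in practice you would compare several techniques on the held-out test set before settling on one.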
Phase 5 Operationalize: In this phase, you deliver final reports, briefings, code, and technical documents. In addition, a pilot project is sometimes implemented in a real-time production environment. This gives you a clear picture of the performance and other related constraints on a small scale before full deployment.
Phase 6 Communicate Results: Now it is important to evaluate whether you have achieved the goal you planned in the first phase. In this last phase, you identify all the key findings, communicate them to the stakeholders, and determine whether the results of the project are a success or a failure based on the criteria developed in Phase 1.
Data Science Job Roles:
The most prominent data science job titles are:
Data Scientist
Data Engineer
Data Analyst
Statistician
Data Architect
Data Admin
Business Analyst
Data/Analytics Manager
Summary
Data science is the area of study that involves extracting insights from vast amounts of data using various scientific methods, algorithms, and processes.
Statistics, visualization, deep learning, and machine learning are important data science concepts.
The data science process goes through six phases: Discovery, Data Preparation, Model Planning, Model Building, Operationalize, and Communicate Results.
Hope you all enjoyed this tutorial. I will write up another tutorial on Python in the next few days. Stay tuned!