What is data science? The only data science course & tutorial you need


Start learning data science with me in plain English.

What is Data science and what’s the fuss all about?

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from data in many forms, both structured and unstructured, much like data mining.

Data science is mainly used to support decisions and make predictions.

I would like to give a brief description of some of the terms in the figure above.

What is Machine Learning?

It is a multidisciplinary field centered on algorithms that draw predictive insights from dynamic or static data sources using analytic or probabilistic models, and that improve through training and feedback.

It draws on statistical data modeling, artificial intelligence learning methods, and pattern recognition.
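To make "learning from data" a little more concrete, here is a minimal sketch of pattern recognition: a 1-nearest-neighbor classifier that labels a new point by copying the label of the most similar training example, instead of following hand-written rules. The function name `nearest_neighbor_predict` and the toy data are hypothetical, chosen only for illustration, and this is much simpler than the probabilistic models mentioned above.

```python
import math

def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to `query`.

    `train` is a list of (features, label) pairs; features are tuples of numbers.
    (Hypothetical helper, for illustration only.)
    """
    best_label = None
    best_dist = math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

# Hypothetical toy data: (hours studied, hours slept) -> exam outcome
train = [
    ((1.0, 4.0), "fail"),
    ((2.0, 5.0), "fail"),
    ((6.0, 7.0), "pass"),
    ((8.0, 8.0), "pass"),
]

print(nearest_neighbor_predict(train, (7.0, 6.5)))  # pass
```

No rule like "pass if hours studied > 5" is written anywhere; the prediction comes entirely from the labeled examples.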

What is artificial intelligence (AI)?

It is the simulation of human intelligence processes by machines, especially computer systems. These processes include learning (the acquisition of information and of rules for using that information), reasoning (using rules to reach approximate or definite conclusions), and self-correction.

What is Deep Learning?

It is an AI approach that imitates the workings of the human brain in processing data and creating patterns for use in decision making.

Big Data (BD): Big data is a commonly used term for applying serious computing power, including the latest in artificial intelligence and machine learning, to massive and often highly complex sets of information.

KEY FEATURES OF DATA SCIENCE

It is a broad interdisciplinary field that integrates elements of:

· Machine Learning and Artificial Intelligence

· Data Analysis, Data visualization, Data Mining

· Big Data, Deep Learning

· Statistics and probability

· Optimization, Applied Mathematics, Numerical Analysis

· DevOps, Computer Science, Programming, and HPC (high-performance computing)

· Specific Domain Knowledge (Sciences, Business, Finance, etc.)

Why do we need data science?

Previously, organizations generated lots of data to perform day-to-day functions, but the data reports were not used for forecasting.

Now, with data science, data reports give us predictions, analysis of the current status, data storage such as data warehousing, data visualization, pattern discovery, and decision making, supporting functions such as sales strategy and loan recovery.

· Recommend the right product to the right customer to grow the business

· Predict the characteristics of high-LTV (lifetime value) customers and help with customer segmentation

· Build knowledge and skill into machines

· Predict fraudulent transactions in advance

· Conduct sensitivity analysis to predict outcomes
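Sensitivity analysis asks how much an output moves when one input changes. A minimal one-at-a-time sketch, assuming a hypothetical profit model and made-up baseline numbers, could look like this:

```python
def profit(price, units_sold, unit_cost):
    """Hypothetical business model: profit from a single product."""
    return (price - unit_cost) * units_sold

# Hypothetical baseline inputs
baseline = {"price": 20.0, "units_sold": 1000, "unit_cost": 12.0}

def one_at_a_time_sensitivity(model, inputs, change=0.10):
    """Raise each input by `change` (10% by default) while holding the others
    fixed, and return the resulting percentage change in the model output."""
    base_output = model(**inputs)
    effects = {}
    for name, value in inputs.items():
        bumped = dict(inputs)
        bumped[name] = value * (1 + change)
        effects[name] = (model(**bumped) - base_output) / base_output * 100
    return effects

for name, pct in one_at_a_time_sensitivity(profit, baseline).items():
    print(f"{name}: {pct:+.1f}% change in profit")
```

With these numbers, a 10% rise in price moves profit by about +25%, while a 10% rise in units sold moves it by about +10%. That kind of ranking of inputs is what sensitivity analysis is meant to surface.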

Data science applications across various industries

Business Intelligence vs Data Science

Frame of Reference

· Business Intelligence: We can only look at past data; no predictions, no data patterns

· Data Science: We can make predictions using data reports, patterns, and visualization

Data Sources

· Business Intelligence: Structured data, such as a data warehouse, usually queried with SQL

· Data Science: Any kind of data, both structured and unstructured, such as cloud data, text, SQL, and NoSQL

Data Strategy

· Business Intelligence: Visualization and statistics

· Data Science: Graph analysis, machine learning, statistics, natural language processing

Focus

· Business Intelligence: Past and present

· Data Science: Present and future

Tools Used in Data Science

· Data Science: R, Spark, Python, SAS

· Data Warehousing: Hadoop, SQL, Hive

· Data Visualisation: R, Tableau, Raw

· Machine Learning: Spark, Mahout, Azure ML Studio

How are Artificial Intelligence, Machine Learning, Deep Learning, and Data Science related?

Artificial Intelligence (AI):

· Intelligence demonstrated by machines

· Broadly defined to include any simulation of human intelligence

· An expanding and branching area of research, development, and investment

· Includes robotics, rule-based logic, natural language processing (NLP), and knowledge representation techniques (knowledge graphs)

Machine Learning (ML)

· Is an approach to achieving Artificial Intelligence

· Machine Learning is the subfield of Artificial Intelligence that aims to give computers the ability to perform tasks with data without being explicitly programmed.

· Uses numerical and statistical methods, including artificial neural networks, to encode learning in models.

· Models are created through “training” computation runs.

Deep Learning (DL)

· Is a technique for implementing Machine Learning.

· Deep Learning is the subfield of ML that uses specific techniques, including multi-layer (2+) artificial neural networks

· Layering enables cascaded learning and levels of abstraction (e.g. line -> shape -> object -> scene)

· Computationally intensive, enabled by GPUs, the cloud, and specialized hardware such as TPUs and FPGAs
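Why do layers matter? A classic illustration is XOR: no single linear layer can compute it, but two layers can, because the hidden layer builds intermediate features that the output layer combines. The sketch below uses hand-picked weights (an assumption for illustration, not trained values); a real deep learning model would learn its weights through training, usually on the kind of hardware listed above.

```python
def relu(x):
    """Rectified linear unit, a common activation function."""
    return max(0.0, x)

def two_layer_xor(x1, x2):
    """A tiny 2-layer network with hand-picked (not trained) weights."""
    # Hidden layer: two ReLU units, each combining both inputs.
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a weighted sum of the hidden units.
    return 1.0 * h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", two_layer_xor(a, b))
```

The second hidden unit only "switches on" when both inputs are 1, and the output layer uses it to cancel the first unit in exactly that case.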

Data Science (DS)

· Is the union of algorithms, scientific methods, and systems for extracting knowledge or insights from big data.

· Also known as Predictive or Advanced Analytics

· Uses algorithmic and computational techniques to handle large data sets

· More focused on preparing and modeling data for ML and DL tasks

· Comprises data manipulation, statistical methods, and large-scale data processing technologies (e.g., Spark, Hadoop)

· Provides crucial skills and tools for building modern AI technologies
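Tools like Hadoop popularized the MapReduce pattern for processing large data sets: map each record to key/value pairs, shuffle (group) the pairs by key, then reduce each group to a result. The sketch below imitates those three phases in plain Python on a single machine; it does not use Hadoop's or Spark's actual APIs, and the function names and documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (word)."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key into a single total."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = [
    "big data needs big compute",
    "data science uses data",
]
# On a real cluster, each document would be mapped on a different machine.
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts["data"])  # 3
print(word_counts["big"])   # 2
```

Because each map call only looks at one document and each reduce call only looks at one key, both phases can be spread across many machines.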

How are mathematics and statistics useful for data scientists?

Many famous scientific advances went hand in hand with a corresponding discovery in mathematics; consider the history of Einstein’s theory of relativity and Minkowski’s space-time model.

Hiring processes for data scientists often give preference to candidates with strong experience in statistics and mathematics. Data science builds on statistics and mathematics, combined with programming and business logic.

For any organization developing data science, mathematics and statistics are the foundation. If you wish to excel in data science, you must have a good knowledge of basic algebra and statistics.

Learning maths can be intimidating for people without a mathematics background. It becomes much easier once you are clear about what to study and what not to. The list can include linear algebra, calculus, probability, statistics, discrete mathematics, regression, optimization, etc.

Now, you may ask: do I need a PhD in maths to be a data scientist?

No, you don’t need a PhD to work as a data scientist. You just have to be clear on the following topics in mathematics and statistics.

Mathematics:

Mathematics is a specialized language that deals with form, size, and quantity. It is the abstract science of number, quantity, and space. In business mathematics, it covers:

• Linear and Matrix Algebra

• Probability

• Calculus

Linear Algebra

Many machine learning concepts are built on linear algebra. For example, PCA (principal component analysis) requires eigenvalues, and regression requires matrix multiplication.

Most ML applications deal with high-dimensional data, that is, data with many variables. This type of data is best represented by matrices.
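As a rough illustration of the linear algebra behind PCA: the first principal component is the eigenvector of the data's covariance matrix with the largest eigenvalue, i.e. the direction along which the data varies most. The sketch below finds it with power iteration in plain Python. This is one simple method chosen for illustration (libraries such as scikit-learn compute PCA via SVD instead), and the function names and data are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Center each column, then compute the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_principal_component(rows, iterations=100):
    """Approximate the top eigenvector of the covariance matrix by power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0] * d  # starting guess
    for _ in range(iterations):
        # Multiply the covariance matrix by the current vector...
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        # ...and rescale the result to unit length.
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical 2-variable data lying close to the line y = x
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
print([round(x, 2) for x in first_principal_component(data)])  # [0.71, 0.71]
```

The result points roughly along the diagonal, matching the fact that the two variables rise together.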

Matrix Algebra: Algebra is the branch of mathematics concerned with the study of structure, relation, and quantity. Of the many algebraic methods used in business, here we focus on matrix algebra.
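A small business-flavored example of matrix algebra is matrix multiplication: multiplying a table of units sold (stores by products) by a column of prices gives revenue per store in one operation. The sketch below uses plain Python lists; the `matmul` helper and the figures are hypothetical.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Hypothetical data: units sold of 3 products (columns) in 2 stores (rows)
units = [
    [10, 5, 2],  # store A
    [4, 8, 6],   # store B
]
# Price of each product, as a 3 x 1 column vector
prices = [[3.0], [2.0], [5.0]]

revenue = matmul(units, prices)  # 2 x 1: revenue per store
print(revenue)  # [[50.0], [58.0]]
```

Each entry is a row-by-column dot product, e.g. store A earns 10 × 3 + 5 × 2 + 2 × 5 = 50.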

Probability: Probability theory is the backbone of many important concepts in data science, from inferential statistics to Bayesian networks. The journey to mastering statistics starts with probability.

In business, we often use the binomial distribution, which gives the probability of getting a given number of successes in a fixed number of independent trials.
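The binomial probability of exactly k successes in n independent trials, each succeeding with probability p, is P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) counts the ways to choose which k trials succeed. A minimal sketch, assuming a hypothetical scenario in which each of 10 customers buys with probability 0.2:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: 10 customers, each buys with probability 0.2
print(round(binomial_pmf(3, 10, 0.2), 4))  # 0.2013
```

So there is roughly a 20% chance that exactly 3 of the 10 customers make a purchase.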

Calculus: Calculus underlies several key ML applications, because you need to be able to calculate derivatives and gradients for optimization. In fact, one of the most common optimization techniques is gradient descent.
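Gradient descent uses derivatives to repeatedly step a model's parameters "downhill" on an error function. The sketch below fits a straight line y = w·x + b by minimizing the mean squared error; the learning rate, step count, and data are assumptions chosen for this toy example, not recommended defaults.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient, i.e. downhill
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

If the learning rate is too large the steps overshoot and the fit diverges; too small and convergence is slow, which is why this parameter usually needs tuning.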

There are many good resources for learning calculus specifically for data science.

Statistics:

The systematic and scientific method of quantitative measurement is specifically known as statistics.

Statistics is concerned with the collection, organization, presentation, and analysis of data expressed in numerical form.
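Summarizing numerical data is usually the first step of statistical work. Python's standard `statistics` module covers the basic descriptive measures; the sales figures below are hypothetical.

```python
import statistics

# Hypothetical monthly sales figures (in units)
sales = [120, 135, 150, 150, 160, 175, 410]

print("mean:", round(statistics.mean(sales), 2))
print("median:", statistics.median(sales))
print("mode:", statistics.mode(sales))
print("sample std dev:", round(statistics.stdev(sales), 2))
```

Here the single unusually large month (410) pulls the mean up to about 185.7, well above the median of 150, which is why it is worth looking at more than one summary measure.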

What is Statistical analysis?

Statistical analysis is a component of data analytics. In business intelligence, it involves collecting and examining every data sample in a set of items from which samples can be drawn. A sample, in statistics, is a representative selection drawn from a total population.
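To see why a representative sample is useful, the sketch below draws a simple random sample (every member equally likely to be chosen) from a simulated population and compares the two means. The population is hypothetical, generated from a normal distribution with mean 50 and standard deviation 12.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: order values for 10,000 customers
population = [random.gauss(50, 12) for _ in range(10_000)]

# Simple random sample of 200 customers, drawn without replacement
sample = random.sample(population, k=200)

print("population mean:", round(statistics.mean(population), 2))
print("sample mean:    ", round(statistics.mean(sample), 2))
```

With 200 observations, the sample mean typically lands within a point or two of the population mean, at a fraction of the cost of examining every record.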

Statistical analysis can be done in a few simple steps:

• Define the nature of the data to be analysed.

• Explore the relation of the data to the underlying population.

• Create a model that summarizes how the data relates to the underlying population.

• Prove (or disprove) the validity of the model.

• Employ predictive analytics to run scenarios that will help guide future actions.
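The last three steps above (building a model of the data, checking whether it holds up, and using it for prediction) can be sketched with a simple least-squares line and a hold-out check. This is one illustrative approach under stated assumptions: the advertising and sales figures are hypothetical, and holding out the last two points is a simplification of proper validation.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, xs, ys):
    """Average absolute gap between the model's predictions and actual values."""
    slope, intercept = model
    return statistics.mean(abs(slope * x + intercept - y) for x, y in zip(xs, ys))

# Hypothetical data: advertising spend (x) vs. sales (y)
ad_spend = [1, 2, 3, 4, 5, 6, 7, 8]
sales = [12, 15, 19, 21, 26, 28, 33, 35]

# Create a model from the first six points
model = fit_line(ad_spend[:6], sales[:6])

# Check its validity on the two points it has not seen
error = mean_absolute_error(model, ad_spend[6:], sales[6:])
print("held-out mean absolute error:", round(error, 2))

# Run a "what if" scenario with the checked model
slope, intercept = model
print("predicted sales at spend 10:", round(slope * 10 + intercept, 1))
```

A small held-out error suggests the pattern generalizes; a large one would be a signal to revisit the model before relying on its predictions.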

Statistics is used in accounting, finance, economics, and many other fields.

This is a little information about how to become a data scientist, which I hope you find useful.

If you liked this blog, hit the clap button.


Senior Data scientist at IBM, Kristian Sawin

So here I am, a tech-savvy woman, navigating my way through this world and carving my niche.