Machine Learning Basics


Machine Learning is a set of algorithms that train on a data set to make predictions or take actions in order to optimize some system.

For example, supervised classification algorithms are used to classify potential clients into good or bad prospects for loan purposes, based on historical data.

Machine learning is a very wide field, so one question comes to every beginner's mind: "Which algorithm should I use?" The answer varies depending on many factors:
  • The size, quality, and nature of data.
  • The available computational time.
  • The urgency of the task.
  • What you want to do with the data.

Even an experienced data scientist cannot tell which algorithm will perform best without trying different algorithms.


Supervised Learning:
Supervised learning algorithms make predictions based on a set of examples. For example, historical sales can be used to estimate future prices. With supervised learning, you have an input variable that consists of labelled training data and a desired output variable.
You use an algorithm to analyse the training data and learn the function that maps the input to the output.

  • Classification: When the data are being used to predict a categorical variable, supervised learning is also called classification. This is the case when assigning a label or indicator, either dog or cat, to an image. When there are only two labels, this is called binary classification. When there are more than two categories, the problem is called multi-class classification.
  • Regression: When predicting continuous values, the problem becomes a regression problem.
  • Forecasting: This is the process of making predictions about the future based on past and present data. It is most commonly used to analyse trends. A common example is estimating next year's sales based on the sales of the current and previous years.
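The sales-forecasting example above can be sketched with a simple least-squares trend line; the yearly sales figures below are made-up illustration data, not from the text.

```python
# Fit a straight line through past yearly sales and extrapolate one year
# ahead. Figures are hypothetical illustration data.
years = [2019, 2020, 2021, 2022, 2023]
sales = [110.0, 123.0, 131.0, 144.0, 152.0]

n = len(years)
mean_x = sum(years) / n
mean_y = sum(sales) / n

# Ordinary least-squares slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, sales)) / \
        sum((x - mean_x) ** 2 for x in years)
intercept = mean_y - slope * mean_x

forecast_2024 = slope * 2024 + intercept
print(f"trend: {slope:.1f} per year, 2024 forecast: {forecast_2024:.1f}")
```

Real forecasting models (moving averages, exponential smoothing, ARIMA) handle seasonality and noise; a straight trend line is only the simplest instance of the idea.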

Introduction


Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data. Data Science covers the following areas:
   1)  Data Analysis
   2)  Machine Learning and Algorithms
   3)  Data Product Engineering

A Data Analyst describes the past history of the data; a Data Scientist, on the other hand, not only performs exploratory analysis to discover insights from it but also applies various machine learning algorithms. Data analysis is the process of inspecting, cleansing, transforming, and modelling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

Machine learning is an algorithm or model that learns patterns in data and then predicts similar patterns in new data. For example, if you want to classify children's books, instead of setting up precise rules for what constitutes a children's book, developers can feed the computer hundreds of examples of children's books. The computer finds the patterns in these books and uses those patterns to identify future books in that category.
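The children's-book idea above can be sketched as a toy word-count classifier: instead of hand-written rules, the "model" counts which words appear in labelled example titles and scores new titles against those counts. All titles here are invented for illustration.

```python
from collections import Counter

# Hypothetical labelled examples (title, label).
examples = [
    ("the little bunny learns to share", "children"),
    ("counting farm animals with sally", "children"),
    ("bedtime stories for tiny dragons", "children"),
    ("advanced corporate tax strategies", "other"),
    ("a history of monetary policy", "other"),
    ("quarterly risk assessment methods", "other"),
]

# "Training": count word frequencies per label.
counts = {"children": Counter(), "other": Counter()}
for title, label in examples:
    counts[label].update(title.split())

def classify(title):
    # Score a new title by how often each label has seen its words.
    scores = {label: sum(c[w] for w in title.split())
              for label, c in counts.items()}
    return max(scores, key=scores.get)

print(classify("the tiny bunny and the dragons"))
```

This is only a sketch of the pattern-learning idea; practical text classifiers use probabilistic models (e.g. naive Bayes) over far larger corpora.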

Data engineering is the process of designing and building systems that let people collect and analyze raw data from multiple sources and formats. These systems empower people to find practical applications of the data, which businesses can use to thrive.






Data Science Interview Questions - II

Q 5: What is overfitting and underfitting?

Answer: Solving the issue of bias and variance is really about dealing with overfitting and underfitting. As model complexity rises, bias is reduced and variance is increased.
As more and more parameters are added to a model, its complexity rises; variance becomes our primary concern while bias steadily falls.
Overfitting: The model has learned the training data too closely, including its noise, so it performs well on the training set but poorly on new data.
Underfitting: The model has not captured the underlying logic of the data; it is clumsy and has low accuracy.
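The train/test gap can be shown with a minimal sketch (made-up data, not from the text): a constant model underfits, while a lookup table that memorizes every training point overfits.

```python
import random

random.seed(0)

# Toy data: y = x^2 plus noise, split into a training set and a test set.
def make_data(n):
    return [(x, x * x + random.gauss(0, 0.1))
            for x in (random.uniform(-1, 1) for _ in range(n))]

train, test = make_data(30), make_data(30)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfitting: a constant model is too simple to capture the x^2 pattern,
# so it has high error on BOTH the training and test sets.
mean_y = sum(y for _, y in train) / len(train)
def underfit(x):
    return mean_y

# Overfitting: a lookup table memorizes the training points exactly (zero
# training error) but generalizes no better to unseen inputs.
table = dict(train)
def overfit(x):
    return table.get(x, mean_y)

print(f"underfit: train={mse(underfit, train):.3f} test={mse(underfit, test):.3f}")
print(f"overfit:  train={mse(overfit, train):.3f} test={mse(overfit, test):.3f}")
```

The telltale signature of overfitting is exactly this gap: near-zero training error together with a much larger test error.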


Q 6: What is a Hypothesis?


Answer:
A hypothesis is a predictive statement, capable of being tested by scientific methods, that relates an independent variable to some dependent variable.
A hypothesis states what we are looking for, and it is a proposition which can be put to a test to determine its validity. E.g. students who receive counselling will show a greater increase in creativity than students not receiving counselling.

Characteristics of Hypothesis:
  • Clear and precise.
  • Capable of being tested.
  • States the relationship between variables.
  • Limited in scope and specific.
  • Stated as far as possible in the simplest terms, so that it is easily understood by all concerned. But one must remember that the simplicity of a hypothesis has nothing to do with its significance.
  • Consistent with most known facts.
  • Responsive to testing within a reasonable time. One cannot spend a lifetime collecting data to test it.
  • Explains what it claims to explain; it should have empirical reference.

Null Hypothesis:
- It is an assertion that we hold as true unless we have sufficient statistical evidence to conclude otherwise.
- The null hypothesis is denoted by H0.
- If the population mean is equal to the hypothesised mean, the hypothesis can be written as H0: μ = μ0.

Alternative Hypothesis:
- The alternative hypothesis is the negation of the null hypothesis and is denoted by Ha.
- The alternative hypothesis can be written as Ha: μ ≠ μ0.
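A minimal sketch of testing H0 against Ha, using a one-sample z-test; the sample values, the hypothesised mean of 50, and the known sigma of 5 are all made-up assumptions for illustration.

```python
import math

# Hypothetical example: test H0: mu = 50 against Ha: mu != 50 with a
# one-sample z-test, assuming the population sigma is known (sigma = 5).
sample = [52.1, 49.8, 53.4, 51.2, 50.9, 52.7, 48.6, 51.5]
mu0, sigma = 50.0, 5.0

n = len(sample)
xbar = sum(sample) / n
z = (xbar - mu0) / (sigma / math.sqrt(n))

# At the 5% significance level (two-tailed), reject H0 when |z| > 1.96.
reject = abs(z) > 1.96
print(f"sample mean = {xbar:.2f}, z = {z:.2f}, reject H0: {reject}")
```

Here the evidence is not strong enough, so we keep holding H0 as true, exactly as the definition above describes. When sigma is unknown, a t-test with the sample standard deviation is used instead.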


Machine Learning Basics - II


Semi-Supervised Learning:
The challenge with supervised learning is that labelling data can be expensive and time-consuming. If labels are limited, you can use unlabelled examples to enhance supervised learning. Because the machine is not fully supervised in this case, we say the machine is semi-supervised. With semi-supervised learning, you use unlabelled examples together with a small amount of labelled data to improve the learning accuracy.
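One common semi-supervised approach is self-training, sketched below on invented 1-D data: a nearest-neighbour rule starts from two labelled points, then repeatedly pseudo-labels the unlabelled point it is most confident about and adds it to the labelled set.

```python
# Hypothetical 1-D data: two labelled seeds plus cheap unlabelled examples.
labeled = {0.0: "low", 10.0: "high"}          # small amount of labelled data
unlabeled = [1.0, 2.5, 4.0, 8.0, 9.5, 11.0]   # unlabelled examples

while unlabeled:
    # Pick the unlabelled point closest to any labelled point
    # (i.e. the one we are most confident about).
    p = min(unlabeled, key=lambda u: min(abs(u - x) for x in labeled))
    # Pseudo-label it with its nearest labelled neighbour's label.
    nearest = min(labeled, key=lambda x: abs(p - x))
    labeled[p] = labeled[nearest]
    unlabeled.remove(p)

print(labeled)
```

Labels propagate outward from the two seeds, so points near 0 end up "low" and points near 10 end up "high", which is how a handful of labels can supervise a much larger data set.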



Unsupervised Learning:
When performing unsupervised learning, the machine is presented with totally unlabelled data. It is asked to discover the intrinsic patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse tree and graph.
  • Clustering: Grouping a set of data examples so that examples in one group (or one cluster) are more similar to each other (according to some criteria) than to those in other groups.
  • Dimension reduction: Reducing the number of variables under consideration. In many applications, the raw data have very high-dimensional features, and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to find the true, latent relationships.
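Clustering can be illustrated with a minimal k-means sketch on made-up 1-D data: two clouds of points, no labels, and the algorithm recovers the two group centers by alternating assignment and update steps.

```python
import random

random.seed(1)

# Made-up unlabelled data: one cloud near 0 and one cloud near 10.
points = [random.gauss(0, 1) for _ in range(20)] + \
         [random.gauss(10, 1) for _ in range(20)]

def kmeans(points, k=2, iters=20):
    # Simple deterministic init for k = 2: the two extremes of the data.
    centers = [min(points), max(points)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

centers = kmeans(points)
print(f"cluster centers: {centers[0]:.2f}, {centers[1]:.2f}")
```

No labels were used anywhere: the grouping emerges purely from the similarity criterion (distance), which is exactly what the clustering bullet above describes.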