Definition of machine learning:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Consider the classic example of a program that learns to play checkers.
Machine Learning Questions
1. What is experience E?
a) Playing checkers
b) The probability that the program wins the next game of checkers against some new opponent
c) Having the program play tens of thousands of games against itself
d) Having experience of checkers programming
Explanation: The experience is what the program learns from, i.e. the games it plays. Playing checkers is the task T, so option (a) is the task; option (b), the probability of winning, is the performance measure P; and option (c), having the program play tens of thousands of games against itself, is the experience E. So the answer is (c).
- Suppose you have a large inventory of identical items and you want to predict how many of these items will sell over the next 2 months. This problem can be categorized as:
a) Supervised Learning
b) Unsupervised Learning
c) Both a and b
Explanation: Here we know the nature of the input and the type of output we want (a numeric sales count), so this can be classified as supervised learning.
Supervised learning: we are given a data set and already know what the correct output should look like, with the idea that there is a relationship between the input and the output.
Supervised learning itself comes in two types, regression and classification. Regression deals with continuous values, i.e. the output maps to a continuous function. In this example we are predicting how many of roughly 1000 identical items will sell, a continuous quantity, so it is a regression problem.
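As a minimal sketch of why this is regression, the example below fits a line to hypothetical past monthly sales and predicts the next 2 months. The data values and month indices are made up purely for illustration:

```python
import numpy as np

# Hypothetical past sales: month index (feature) vs. units sold (label).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([950, 980, 1010, 1040, 1075, 1100], dtype=float)

# Fit a line y ≈ theta0 + theta1 * x with ordinary least squares.
X = np.column_stack([np.ones_like(x), x])  # prepend a bias column of ones
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict sales for the next 2 months (months 7 and 8).
future = np.column_stack([np.ones(2), np.array([7.0, 8.0])])
predictions = future @ theta
print(predictions)  # continuous-valued outputs -> a regression problem
```

The predictions are continuous numbers, not class labels, which is exactly what makes this a regression rather than a classification problem.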
- What is the cost function in supervised learning used to measure?
a) Minimize the total cost
b) Maximize the strength of supervised learning
c) To measure the accuracy of the hypothesis
d) To measure training set value accurately
Explanation: The cost function is used to measure the accuracy of the hypothesis; it measures how wrong the model is in terms of its ability to estimate the relationship between X and y. An analogy: an adult buys their first smartphone. For the first few days they make typing mistakes on the touch screen; after some time they type fluently but still cannot operate Facebook; later they master that too but get stuck on Instagram, which is a little harder. The feedback from each mistake works like a cost function: it helps the learner correct their behavior so as to minimize mistakes.
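For linear regression the standard cost function is the mean squared error, J(θ) = (1/2m) Σ (h(x) − y)². A small sketch with made-up data shows how a better hypothesis scores a lower cost:

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost J(theta) = (1/2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

# Tiny illustrative data set generated from y = 2x.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias term
y = np.array([2.0, 4.0, 6.0])

print(cost(np.array([0.0, 2.0]), X, y))  # 0.0 -- perfect hypothesis
print(cost(np.array([0.0, 1.0]), X, y))  # > 0 -- worse hypothesis, higher cost
```

The hypothesis that matches the data exactly has zero cost, and the cost grows as the model's estimates drift away from the true relationship.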
- Gradient Descent is used for?
a) To minimize the cost function
b) To measure the accuracy of the hypothesis
c) To minimize some arbitrary function
d) To find the local minima
Explanation: By definition, gradient descent is used to minimize some arbitrary function J; the cost function is just one such function. So the answer is (c).
- Gradient descent repeats the update

theta_j := theta_j - α * (∂/∂theta_j) J(theta)

Here, what does alpha (α) stand for, and why do we calculate the derivative with respect to theta_j?
a) Here α is the learning rate and the derivative is calculated to find the maximum value of the cost function.
b) Here α is the cost function coefficient and the derivative is calculated to find the local minima.
c) Here α is the learning rate and the derivative is calculated to move the coefficient values toward the lowest cost.
d) Here α is the learning rate of the hypothesis and the derivative is calculated to find the direction of descent.
Explanation: The derivative of the cost function is its slope; we need the slope to know which direction to move in to reach the lowest cost. Once the derivative tells us which direction is downhill, we update the coefficient values as theta_j := theta_j - α * (∂/∂theta_j) J(theta), where the learning rate α controls the size of each step. So the answer is (c).
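The update rule above can be sketched as a short batch gradient descent loop for linear regression. The data, learning rate, and iteration count below are illustrative choices, not values from the original post:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Batch gradient descent for linear regression.

    Repeats theta_j := theta_j - alpha * (1/m) * sum((h(x) - y) * x_j)
    for a fixed number of iterations.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        gradient = (X.T @ (X @ theta - y)) / m  # slope of the cost function
        theta -= alpha * gradient               # step downhill, scaled by alpha
    return theta

# Illustrative data generated from y = 1 + 2x, so we expect theta ≈ [1, 2].
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
print(gradient_descent(X, y))
```

If α is too large the steps overshoot and the loop diverges; if it is too small, convergence is needlessly slow, which is why choosing α matters.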
- In linear regression, when is the normal equation the best choice?
a) No need to choose α
b) don’t need to iterate
c) Number of features is less
d) Need to choose α
Explanation: If the number of features n is small, it is better to use the normal equation: there is no need to choose a learning rate α and no need to iterate as in gradient descent. (When n is very large, solving the normal equation becomes expensive and gradient descent is preferred.)
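A minimal sketch of the normal equation, theta = (XᵀX)⁻¹ Xᵀy, on illustrative data generated from y = 1 + 2x. Note there is no α and no loop, just one direct solve:

```python
import numpy as np

# Toy data generated from y = 1 + 2x (first column is the bias term).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Normal equation: solve (X^T X) theta = X^T y directly.
# No learning rate to tune, no iterations to run.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # ≈ [1. 2.]
```

Using `np.linalg.solve` rather than explicitly inverting XᵀX is the numerically safer way to evaluate the same formula.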
- Why is it a bad idea to use linear regression for classification?
a) Value is continuous
b) Value is a discrete type
c) Value is the nature of discontinuous nature
d) None of the above
Explanation: Classification is not a linear function, which is why linear regression performs poorly on it. In regression the target is a continuous value, but in classification it is a discrete value. So the answer is (b).
- Why do we not use the same cost function as linear regression for logistic regression?
a) Having only one minimum
b) Having many local maxima
c) Having many local minima
d) It is not continuous in nature
Explanation: Logistic regression uses the sigmoid hypothesis, and if we plug it into the squared-error cost function the result has many local minima, making it too hard to find the global minimum. So the answer is (c); logistic regression uses a different, convex cost function instead.
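As a sketch, here is the sigmoid hypothesis together with the cross-entropy cost that logistic regression actually uses, which is convex. The toy data set below is invented for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Convex cross-entropy cost for logistic regression:
    J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Toy binary data: negative feature values belong to class 0, positive to class 1.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

# A hypothesis that separates the classes gets a much lower cost
# than one pointing the wrong way.
print(logistic_cost(np.array([0.0, 3.0]), X, y))   # small cost
print(logistic_cost(np.array([0.0, -3.0]), X, y))  # large cost
```

Because this cost is convex in theta, gradient descent on it cannot get stuck in a spurious local minimum.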
- A medical diagnosis system that predicts whether you are ill with a cold, the flu, a cough, or viral flu is a type of:
d) Multiclass Classification
Explanation: We have to classify among several classes (cold, flu, cough, etc.), so it is multiclass classification.
- If a neural network has s(j) = 2 units in layer j and s(j+1) = 4 units in layer j+1, what is the dimension of theta(j)?
Explanation: If a network has s(j) units in layer j and s(j+1) units in layer j+1, then theta(j) has dimension s(j+1) x (s(j) + 1). So the answer is 4 x 3.
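The dimension follows from the forward-propagation step: theta(j) multiplies layer j's activations with a bias unit prepended. A small shape check makes this concrete:

```python
import numpy as np

# Layer j has s_j = 2 units, layer j+1 has s_{j+1} = 4 units.
s_j, s_j1 = 2, 4

# theta(j) maps layer j's activations (plus a bias unit) to layer j+1,
# so its shape is (s_{j+1}, s_j + 1) = (4, 3).
Theta = np.zeros((s_j1, s_j + 1))
print(Theta.shape)  # (4, 3)

# Forward step check: activations of layer j with the bias term prepended.
a_j = np.concatenate([[1.0], np.ones(s_j)])  # length s_j + 1 = 3
z_j1 = Theta @ a_j                           # length s_{j+1} = 4
print(z_j1.shape)  # (4,)
```

The extra "+1" column exists only because of the bias unit; without it the matrix would be 4 x 2 and the product with the bias-augmented activation vector would not be defined.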
WHY DO WE NOT USE THE SAME COST FUNCTION AS LINEAR REGRESSION IN THE CLASSIFICATION PROBLEM?
Because for classification the hypothesis is the sigmoid function, h(x^(i)) = 1/(1 + e^(-theta^T x)). Plugging this into the squared-error cost from linear regression produces a function with many local minima, so it is very difficult to find the global minimum. In other words, the logistic function causes that cost to be wavy, with many local optima; it is no longer a convex function.
[Figure: the resulting non-convex cost curve, with many local optima.]
Manish developed an interest in Machine Learning through a project he worked on (Alcohol Detection and Vehicle Ignition) during his MCA at the Cochin University of Science and Technology, India. His love of teaching and learning started much earlier. Manish served in the youth wing of the NCC and holds two certificates, NCC "B" and NCC "C".