Naive Bayes is a prominent supervised machine learning technique for classification. It is built on a core result from probability theory, Bayes' theorem, which requires prior knowledge about the hypothesis being tested.


## What is Naive Bayes?

- The Naive Bayes approach is a classification algorithm for solving categorization problems, based on Bayes' rule (theorem).
- It is mostly employed in text categorization tasks that require a large training dataset.
- The Naive Bayes classifier is a straightforward yet powerful classification method that helps build highly effective machine learning models capable of making rapid predictions.
- It is a probabilistic classifier, meaning it makes predictions based on the probability that an object belongs to a class.
- Spam filtering, sentiment classification, and article categorization are all typical applications of the Naive Bayes algorithm.

## Why is this algorithm called Naive Bayes?

The name combines the two terms Naive and Bayes, which can be explained as follows:

- It is called "Naive" because it assumes that the occurrence of one feature is independent of the occurrence of the other features.
- For example, if colour, shape, and taste are used to identify a fruit, then a red, round, and sweet fruit is recognized as an apple.
- In that sense, each feature contributes to identifying the apple on its own, without relying on the others.
- It is called "Bayes" because it is based on Bayes' theorem, a core concept in probability theory.

## Bayes' Theorem

- Bayes' theorem, also known as "Bayes' law" or "Bayes' rule", is a core result in probability theory.
- It is used to find the probability of a hypothesis given observed evidence, and it requires prior information about the hypothesis.
- It can be stated mathematically as:

P(A|B) = [P(B|A) P(A)] / P(B)

where:

P(A|B) is the posterior probability and P(B|A) is the likelihood;

P(A) is the prior probability and P(B) is the marginal probability.
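The formula above can be worked through numerically. The sketch below uses made-up spam-filter numbers (the priors and likelihoods are illustrative assumptions, not real statistics):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical spam-filter numbers (assumed for illustration):
p_spam = 0.3                 # prior P(A): fraction of mail that is spam
p_word_given_spam = 0.8      # likelihood P(B|A): "free" appears in spam
p_word_given_ham = 0.1       # "free" appears in legitimate mail

# Marginal P(B) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(A|B): probability the mail is spam given it contains "free"
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # 0.7742
```

Note how the marginal probability in the denominator is itself computed from the prior and the likelihoods, so all four terms of the theorem appear in the calculation.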

## Terms in Naive Bayes

- Posterior probability: the probability of hypothesis A given the observed evidence B.
- Likelihood: the probability of observing the evidence B given that hypothesis A is true.
- Prior probability: the probability of the hypothesis before the evidence is known.
- Marginal probability: the overall probability of the evidence itself.

## How does this algorithm operate?

- Suppose we have a dataset of weather conditions and a target variable called "Play."
- Using this data, we must decide whether or not to play on a particular day, depending on the weather conditions.
- To solve this problem, we take the following steps:
- Create frequency tables from the given data.
- Compute the probabilities of the individual feature values to produce a likelihood table.
- Compute the posterior probability for each class using Bayes' theorem.
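The steps above can be sketched in plain Python. The weather records below are a made-up toy dataset (assumed for illustration; the article does not supply the actual table):

```python
from collections import Counter

# Hypothetical (Outlook, Play) records, assumed for illustration
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

# Step 1: frequency tables
class_counts = Counter(label for _, label in data)   # counts per Play value
joint_counts = Counter(data)                         # counts per (Outlook, Play) pair

def posterior(outlook, label):
    # Steps 2 and 3: likelihood * prior / marginal, i.e. Bayes' theorem
    prior = class_counts[label] / len(data)
    likelihood = joint_counts[(outlook, label)] / class_counts[label]
    marginal = sum(1 for o, _ in data if o == outlook) / len(data)
    return likelihood * prior / marginal

# Should we play on a Sunny day? Pick the class with the higher posterior.
p_yes = posterior("Sunny", "Yes")
p_no = posterior("Sunny", "No")
print(f"P(Yes|Sunny)={p_yes:.2f}, P(No|Sunny)={p_no:.2f}")
```

With these toy counts (2 "Yes" and 3 "No" out of 5 sunny days), the posterior for "No" is higher, so the model would predict not to play on a sunny day.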

## Types of Naive Bayes models

There are three types of Naive Bayes models in total:

- **Gaussian model:** This model assumes that continuous features are normally distributed. If predictors take continuous values instead of discrete ones, the model assumes these values are drawn from a Gaussian distribution.
- **Multinomial model:** The Multinomial Naive Bayes model is used when the data is multinomially distributed. It is mostly employed for text categorization problems, such as deciding which genre a document belongs to, e.g. Sport, Government, or Education. The predictors used by the learner are based on word frequencies.
- **Bernoulli model:** Like the Multinomial learner, the Bernoulli model uses discrete Boolean values as predictors, for example whether or not a particular word appears in a document. This model is also popular in document classification tasks.
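As a rough illustration of the Gaussian variant, here is a from-scratch sketch using only the standard library (the toy data and the `fit`/`predict` function names are invented for this example; libraries such as scikit-learn provide production implementations of all three variants):

```python
import math
from collections import defaultdict

def fit(X, y):
    """Estimate per-class feature means/variances and class priors."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    params = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small epsilon keeps the variance strictly positive
        vars_ = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                 for c, m in zip(cols, means)]
        params[label] = (len(rows) / len(X), means, vars_)
    return params

def predict(params, x):
    """Pick the class with the highest log posterior (up to a constant)."""
    def log_post(label):
        prior, means, vars_ = params[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, vars_))
        return math.log(prior) + log_likelihood
    return max(params, key=log_post)

# Two well-separated 2-D clusters (toy data)
X = [(1.0, 2.0), (1.2, 1.9), (0.8, 2.2), (5.0, 6.0), (5.2, 5.8), (4.9, 6.1)]
y = ["a", "a", "a", "b", "b", "b"]
model = fit(X, y)
print(predict(model, (1.1, 2.0)))  # "a"
```

Working in log space avoids numerical underflow when many per-feature Gaussian densities are multiplied together.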

## Tricks to improve the efficiency of this model

- If continuous features are not normally distributed, use a transformation or other procedure to make them approximately normal.
- If the test dataset raises a zero-frequency problem (a feature value never seen with a class during training), use "Laplace correction" smoothing to predict the test dataset's class.
- Remove correlated features, since they are effectively counted twice by the model, which can lead to overestimating their importance.
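The Laplace correction mentioned above can be shown with a tiny calculation. The counts below are made-up numbers for a word that never occurred with one class in training:

```python
# Laplace (add-one) smoothing: avoid zero likelihoods for unseen feature values.
# Hypothetical counts (assumed): the word "offer" never appears in the "ham" class.
count_offer_in_ham = 0
total_words_in_ham = 100
vocab_size = 50

# Without smoothing the likelihood is 0, which zeroes out the whole
# product of per-feature probabilities regardless of the other features.
unsmoothed = count_offer_in_ham / total_words_in_ham

# Add 1 to every count (and the vocabulary size to the denominator)
smoothed = (count_offer_in_ham + 1) / (total_words_in_ham + vocab_size)
print(unsmoothed, round(smoothed, 4))  # 0.0 0.0067
```

The smoothed estimate is small but nonzero, so a single unseen word no longer forces the posterior for that class to zero.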

## Merits

- One of the simplest classification models, and faster than most others
- Works for both binary and multi-class classification
- Performs surprisingly well on multi-class classification problems
- Popular and effective for text categorization problems

## Demerits

- It cannot learn relationships between features, because the model assumes all features are independent of each other.