6. AI AND MACHINE LEARNING VTU LAB
MACHINE LEARNING VTU LAB – Naive Bayesian Classifier (using API)
Program 6. ASSUMING A SET OF DOCUMENTS THAT NEED TO BE CLASSIFIED, USE THE NAÏVE BAYESIAN CLASSIFIER MODEL TO PERFORM THIS TASK. BUILT-IN JAVA CLASSES/API CAN BE USED TO WRITE THE PROGRAM. CALCULATE THE ACCURACY, PRECISION, AND RECALL FOR YOUR DATA SET.
Program Code – lab6.py
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Load the documents; the CSV has no header row
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])

# Encode the labels: pos -> 1, neg -> 0
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

# Random split (75% train / 25% test by default), so results vary between runs
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)

# Build the vocabulary on the training documents only, then reuse it for the test documents
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)

# get_feature_names_out() replaces get_feature_names(), which newer scikit-learn releases removed
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])

clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)

# Pair each prediction with its test document (not a training document)
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
MACHINE LEARNING Program Execution – lab6.ipynb
Jupyter Notebook program execution.
import pandas as pd
msg = pd.read_csv('document.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
Total Instances of Dataset: 18
X = msg.message
y = msg.labelnum
from sklearn.model_selection import train_test_split
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
from sklearn.feature_extraction.text import CountVectorizer
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
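# get_feature_names_out() replaces get_feature_names(), which newer scikit-learn releases removed
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])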
   about  am  an  and  awesome  bad  beers  best  boss  can  …  tired  to  \
0      0   1   0    1        0    0      0     0     0    0  …      1   0
1      0   0   0    0        0    0      0     0     0    0  …      0   0
2      0   0   0    0        0    0      0     0     0    0  …      0   0
3      0   0   0    0        0    0      0     0     0    1  …      0   0
4      0   0   0    0        0    0      0     0     0    0  …      0   0

   today  tomorrow  very  we  went  will  with  work
0      0         0     0   0     0     0     0     0
1      0         0     0   0     0     0     0     0
2      0         0     0   0     0     0     0     0
3      0         0     0   0     0     0     1     0
4      0         0     0   0     0     0     0     0

[5 rows x 49 columns]
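The table above is a document-term matrix: one row per training document, one column per vocabulary word, and each cell a word count. The sketch below rebuilds that idea in plain Python so the counting is visible. It assumes the default CountVectorizer behaviour (lowercasing, and keeping only tokens of two or more word characters, which is why "I" gets no column). The two sentences and the helper names `tokenize` and `document_term_matrix` are illustrative, not part of the lab program.

```python
import re
from collections import Counter

# Same token rule as CountVectorizer's default token_pattern:
# runs of two or more word characters, applied to lowercased text.
TOKEN = re.compile(r"(?u)\b\w\w+\b")

def tokenize(text):
    return TOKEN.findall(text.lower())

def document_term_matrix(docs):
    # One column per distinct token, in sorted order.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows

# Two illustrative sentences (not the full document.csv)
docs = ["I am sick and tired of this place", "I love this sandwich"]
vocab, rows = document_term_matrix(docs)
print(vocab)
for row in rows:
    print(row)
```

Each row has a 1 wherever its sentence contains the column's word, which is the same pattern as row 0 of the table (am, and, tired all set to 1).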
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
# pred holds predictions for Xtest, so pair each one with its test document
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))
I am sick and tired of this place -> pos
I do not like the taste of this juice -> neg
I love this sandwich -> neg
I can’t deal with this -> pos
I do not like this restaurant -> neg
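The fit and predict calls hide the arithmetic of the classifier. As a rough sketch of what a multinomial Naive Bayes model computes, the code below scores each class as log prior plus the sum of Laplace-smoothed log word likelihoods (smoothing of 1, the MultinomialNB default) and picks the higher score. The function names `train_nb` and `predict_nb` and the four training sentences are made up for illustration. This is not scikit-learn's implementation, and it splits on whitespace rather than using CountVectorizer.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    # Count word occurrences per class and documents per class.
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab, alpha

def predict_nb(model, doc):
    word_counts, class_counts, vocab, alpha = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the log prior P(class).
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word not in vocab:
                continue  # words never seen in training are skipped
            # Laplace-smoothed P(word | class)
            score += math.log((word_counts[label][word] + alpha)
                              / (total_words + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data, invented for this sketch (not document.csv)
train_docs = ["I love this sandwich", "this is an amazing place",
              "I do not like this restaurant", "I am tired of this stuff"]
train_labels = ["pos", "pos", "neg", "neg"]

model = train_nb(train_docs, train_labels)
print(predict_nb(model, "I love this place"))         # pos
print(predict_nb(model, "I do not like this place"))  # neg
```

Working in log space avoids multiplying many small probabilities together, and the smoothing term keeps a word that never appeared in one class from driving that class's probability to zero.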
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
Accuracy Metrics:
Accuracy: 0.6
Recall: 0.5
Precision: 1.0
Confusion Matrix:
[[1 0]
[2 2]]
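The three scores can be checked by hand from the confusion matrix. The sketch below assumes scikit-learn's layout, where rows are true labels and columns are predicted labels, both ordered [neg, pos], so the matrix reads [[TN, FP], [FN, TP]].

```python
# Confusion matrix printed above: rows = true label, columns = predicted label
cm = [[1, 0],
      [2, 2]]
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
precision = tp / (tp + fp)                  # of everything predicted pos, how much was pos
recall = tp / (tp + fn)                     # of everything actually pos, how much was found

print(accuracy, precision, recall)  # 0.6 1.0 0.5
```

With only 5 test documents, a single prediction moves accuracy by 0.2, so these numbers change noticeably from one random split to the next.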
Alternative – alt lab6.ipynb
from sklearn.datasets import fetch_20newsgroups
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import numpy as np
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
twenty_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True)
twenty_test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True)
print(len(twenty_train.data))
print(len(twenty_test.data))
print(twenty_train.target_names)
print(twenty_train.data[0])    # first training document
print(twenty_train.target[0])  # its class index
2257
1502
['alt.atheism', 'comp.graphics', 'sci.med', 'soc.religion.christian']
From: sd345@city.ac.uk (Michael Collier)
Subject: Converting images to HP LaserJet III?
Nntp-Posting-Host: hampton
Organization: The City University
Lines: 14
Does anyone know of a good way (standard PC application/PD utility) to
convert tif/img/tga files into LaserJet III format. We would also like to
do the same, converting to HPGL (HP plotter) files.
Please email any response.
Is this the correct group?
Thanks in advance. Michael.
Michael Collier (Programmer) The Computer Unit,
Email: M.P.Collier@uk.ac.city The City University,
Tel: 071 477-8000 x3769 London,
Fax: 071 477-8565 EC1V 0HB.
1
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_tf = count_vect.fit_transform(twenty_train.data)
from sklearn.feature_extraction.text import TfidfTransformer
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_tf)
X_train_tfidf.shape
(2257, 35788)
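TfidfTransformer keeps the matrix shape (2257 documents by 35788 terms) but replaces raw counts with weights that play down words found in many documents. The sketch below shows that reweighting on a tiny made-up count matrix, assuming the transformer's default settings: smoothed idf, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation of each row. The function name `tfidf` and the toy counts are illustrative only.

```python
import math

def tfidf(counts):
    n_docs = len(counts)
    n_terms = len(counts[0])
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Smoothed inverse document frequency (assumed default: smooth_idf=True).
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = [[c * w for c, w in zip(row, idf)] for row in counts]
    # Scale each row to unit Euclidean length (assumed default: norm='l2').
    result = []
    for row in weighted:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm if norm else 0.0 for v in row])
    return idf, result

# Toy counts: 2 documents, 3 terms; term 0 occurs in both documents
counts = [[1, 1, 0],
          [1, 0, 2]]
idf, weights = tfidf(counts)
print(idf)
print(weights)
```

The term that occurs in every document gets the minimum idf of 1.0, while the rarer terms get larger weights.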
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn import metrics
mod = MultinomialNB()
mod.fit(X_train_tfidf, twenty_train.target)
# Transform the test documents with the vocabulary and idf weights fitted on the training set
X_test_tf = count_vect.transform(twenty_test.data)
X_test_tfidf = tfidf_transformer.transform(X_test_tf)
predicted = mod.predict(X_test_tfidf)
print("Accuracy:", accuracy_score(twenty_test.target, predicted))
print(classification_report(twenty_test.target, predicted, target_names=twenty_test.target_names))
print("confusion matrix is \n", metrics.confusion_matrix(twenty_test.target, predicted))
Accuracy: 0.8348868175765646
                        precision    recall  f1-score   support

           alt.atheism       0.97      0.60      0.74       319
         comp.graphics       0.96      0.89      0.92       389
               sci.med       0.97      0.81      0.88       396
soc.religion.christian       0.65      0.99      0.78       398

           avg / total       0.88      0.83      0.84      1502
confusion matrix is
[[192 2 6 119]
[ 2 347 4 36]
[ 2 11 322 61]
[ 2 2 1 393]]