Practical:-12
AIM:-Mini Project
In this blog, I am creating a mini project on “Detecting Fake News” using machine learning.
What is Fake News?
A type of yellow journalism, fake news encapsulates pieces of news that may be hoaxes and are generally spread through social media and other online media. This is often done to further or impose certain ideas, frequently with political agendas. Such news items may contain false and/or exaggerated claims, may be amplified by recommendation algorithms, and can trap users in a filter bubble.
Let’s start the implementation.
First, we require a dataset of fake news. You can download the dataset from HERE.
After that, we import the required Python libraries.
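The original screenshot of the import cell is not reproduced here; a typical set of imports for this project looks like the following (the exact imports in the original code may differ):

```python
# Standard libraries for a text-classification project like this one.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
```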
Then I load the dataset using pandas and inspect its shape and the first 5 records. The dataset has 6335 rows and 4 columns.
Also, I print information about the dataset using info().
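A sketch of these inspection steps follows. A tiny inline stand-in with the same kind of columns (title, text, label) is used here so the snippet runs without the download; in the real project you would instead call `pd.read_csv(...)` on the downloaded file:

```python
import pandas as pd

# Stand-in rows mimicking the downloaded dataset's columns; replace this
# DataFrame with pd.read_csv(...) on the real file.
df = pd.DataFrame({
    'title': ['Headline A', 'Headline B', 'Headline C'],
    'text':  ['Some article body.', 'Another article body.', 'A third body.'],
    'label': ['FAKE', 'REAL', 'FAKE'],
})

print(df.shape)   # (rows, columns); the real dataset is 6335 rows x 4 columns
print(df.head())  # first 5 records
df.info()         # column names, dtypes and non-null counts
```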
Then I get the labels from the DataFrame.
Here I convert the labels into numeric values using the LabelEncoder technique.
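The encoding step can be sketched as follows (the label values are a stand-in for the `label` column of the dataset; LabelEncoder sorts the classes alphabetically, so FAKE maps to 0 and REAL to 1):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Stand-in for df['label'] from the dataset.
labels = pd.Series(['FAKE', 'REAL', 'REAL', 'FAKE'])

encoder = LabelEncoder()
y = encoder.fit_transform(labels)  # classes are sorted: FAKE -> 0, REAL -> 1

print(list(encoder.classes_))  # ['FAKE', 'REAL']
print(list(y))                 # [0, 1, 1, 0]
```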
After that, I split the dataset into training and testing sets.
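A minimal sketch of the split, using stand-in data; the `test_size` and `random_state` values here are assumptions, since the post does not show its exact parameters:

```python
from sklearn.model_selection import train_test_split

# Stand-ins for the text column and the encoded labels.
texts = [f'article {i}' for i in range(10)]
labels = [0, 1] * 5

# 80/20 split; test_size and random_state are assumed values.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=7)

print(len(x_train), len(x_test))  # 8 2
```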
Let’s initialize a TfidfVectorizer with stop words from the English language and a maximum document frequency of 0.7 (terms with a higher document frequency will be discarded). Stop words are the most common words in a language, which are filtered out before processing natural-language data. A TfidfVectorizer turns a collection of raw documents into a matrix of TF-IDF features.
Now, fit and transform the vectorizer on the training set, and transform the test set with the same fitted vectorizer.
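These two steps can be sketched as follows (the two-document corpus is a stand-in; the key point is that the vocabulary and IDF weights are learned on the training set only and reused on the test set):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in documents; in the real project these are the news texts.
x_train = ['fake news spreads quickly online',
           'careful reporting takes verification']
x_test = ['quickly spreading fake stories']

# English stop words are dropped; terms appearing in more than 70% of
# documents are discarded via max_df=0.7.
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)
tfidf_train = tfidf_vectorizer.fit_transform(x_train)  # learn vocab + idf on train
tfidf_test = tfidf_vectorizer.transform(x_test)        # reuse train vocab on test

print(tfidf_train.shape, tfidf_test.shape)  # same number of feature columns
```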
Model Fit:-
Here I am using LogisticRegression to predict the output.
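A sketch of the fitting step on a tiny fabricated corpus (in the real project, `tfidf_train` comes from vectorizing the news texts, and the labels are the encoded values where 0 = FAKE and 1 = REAL):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Fabricated stand-in corpus; 0 = FAKE, 1 = REAL after label encoding.
x_train = ['totally fake hoax story', 'verified factual report',
           'fake fabricated claims', 'reliable factual coverage']
y_train = [0, 1, 0, 1]

tfidf = TfidfVectorizer(stop_words='english', max_df=0.7)
tfidf_train = tfidf.fit_transform(x_train)

model = LogisticRegression()
model.fit(tfidf_train, y_train)        # learn weights over TF-IDF features
print(model.predict(tfidf_train))      # predictions on the training documents
```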
Then, we’ll predict on the TF-IDF features of the test set and calculate the accuracy with accuracy_score() from sklearn.metrics.
Here I get an accuracy score of 91.71%.
After that, I calculate the confusion matrix.
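The evaluation steps can be sketched end to end on fabricated stand-in data (the real project uses the train/test split of the news dataset, so the numbers below will not match the 91.71% accuracy reported above):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Fabricated stand-in train/test sets; 0 = FAKE, 1 = REAL after encoding.
x_train = ['totally fake hoax story', 'verified factual report',
           'fake fabricated claims', 'reliable factual coverage']
y_train = [0, 1, 0, 1]
x_test = ['fake hoax claims', 'verified factual coverage']
y_test = [0, 1]

tfidf = TfidfVectorizer(stop_words='english', max_df=0.7)
model = LogisticRegression().fit(tfidf.fit_transform(x_train), y_train)
y_pred = model.predict(tfidf.transform(x_test))

print(accuracy_score(y_test, y_pred))    # fraction of correct predictions
print(confusion_matrix(y_test, y_pred))  # rows: true label, columns: predicted
```

The confusion matrix breaks the accuracy number down into true/false positives and negatives, which is more informative than accuracy alone when the classes are imbalanced.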
Conclusion:-
I hope this helped you understand how to detect fake news using machine learning.
Github link:-
https://github.com/YagnikBavishi/Data-Science/tree/main/PR12