Problem 1
Part of the challenge of data mining text is that the sequence and context of words matter in communication. Consider the use of the word “good” in a movie review. Briefly explain how the word “good” could be used to convey both positive and negative feelings about a movie, why this highlights the importance of context, and whether you believe there is a way to work around this problem.
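As a hint for thinking about context, the following minimal Python sketch compares single-word features with two-word (bigram) features. The two review sentences and the helper names (tokens, unigrams, bigrams) are made up for illustration and are not part of the course materials; bigrams are only one possible workaround.

```python
import re

def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def unigrams(text):
    """Set of single words: all word order and context is discarded."""
    return set(tokens(text))

def bigrams(text):
    """Set of adjacent word pairs: keeps one word of context."""
    words = tokens(text)
    return set(zip(words, words[1:]))

# Hypothetical reviews, invented for this illustration.
positive = "The acting was good and the ending was great"
negative = "The acting was not good and the ending was worse"

# "good" appears in both reviews, so on its own it cannot separate them.
shared_unigrams = unigrams(positive) & unigrams(negative)

# The pair ("not", "good") appears only in the negative review.
negative_only_bigrams = bigrams(negative) - bigrams(positive)

print("good" in shared_unigrams)
print(("not", "good") in negative_only_bigrams)
```

Bigrams capture simple negation, but they would still miss sarcasm or negation that sits several words away from “good”.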
Problem 2
This module provided an overview of a handful of other commonly used data mining techniques.
Consider a problem from your current or a past job, a hobby, or an interest that would make for a good application of one of the following techniques:
• Text-based data mining
• Co-occurrence grouping and associations
• Profiling
• Link prediction
Describe why this is an appropriate example of a problem that can be solved with one of the methods above, and how the results of the analysis would be used.
Please do not choose a hypothetical example, such as something from the textbook or the slides; it should be something with which you have personal experience (yes, this problem is like problem 2 from problem set 2).
Problem 3
You have been hired by a hotel chain to take another crack at improving their booking and profitability. Armed with more data mining knowledge than ever before, you decide to once again create a classification decision tree model to predict cancelations, only this time you brought in the big guns: ensemble methods.
Target variable:
· is_canceled: whether the reservation was canceled
Attributes:
· hotel_type: whether the hotel is a “resort” or “city” hotel
· summer: whether the reservation was made for the summer season or not
· children: whether children are listed on the reservation
· previous_cancelations: whether the person who made the reservation has canceled before
We have three different tree induction models. Evaluate each model on the test set:
1. A regular single decision tree
https://bigml.com/shared/evaluation/xTXf88MOhwF3cLqmAOBkqOTh9rA
2. An ensemble of trees using random forests (which BigML calls “decision forests”)
https://bigml.com/shared/evaluation/iDLqmKeWNuwr6kDGBK2XFM3ZarD
3. An ensemble of trees using boosting (which BigML calls “boosted trees”)
https://bigml.com/shared/evaluation/uMi3GEWbLih6L5f1q1soFA08kiX
Finally, describe and compare the performance of each model and comment on whether their relative performance met your expectations.
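When comparing the three evaluations, it may help to compute the same metrics for each model from its confusion matrix. The sketch below assumes “canceled” is the positive class. The function name evaluate and the counts passed to it are placeholders invented for illustration; they are not taken from the BigML evaluations linked above and should be replaced with the values reported there.

```python
def evaluate(tp, fp, fn, tn):
    """Compute standard classification metrics from confusion-matrix counts.

    tp: canceled reservations predicted as canceled
    fp: kept reservations predicted as canceled
    fn: canceled reservations predicted as kept
    tn: kept reservations predicted as kept
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Placeholder counts only: NOT results from any of the three models.
example = evaluate(tp=30, fp=10, fn=20, tn=40)

for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on each model's counts puts the single tree, the decision forest, and the boosted trees on a common footing, so that differences in recall on cancelations are not hidden behind overall accuracy.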