EECE 5642 Data Visualization

 

Midterm Project

Instructor: XXXXX

Submission Due Date: 11:59 pm Oct. 20
Progress Review Date: Oct. 8
Submission: Blackboard

Questions

All details and background for the midterm project can be found in the “Lecture- Midterm Project” slides. The requirements for this project are as follows.

  • Preprocess the 20 Newsgroups dataset as the corpus and visualize its statistical information. (10’)
  • Build two different vocabularies using different preprocessing methods; learn Bag-of-words (BoW) and TF-IDF models with each vocabulary. (10’)
  • Train two LDA models on the vocabularies from Step 2; visualize the topics with four different methods; and obtain the topic distribution (as a feature) for each document. (20’)
  • Train two Doc2Vec models on the vocabularies from Step 2; visualize the learned word and document embedding spaces; collect the Doc2Vec representation of each document. (20’)
  • Conduct document clustering by K-means with four different document representations: 1) BoW; 2) TF-IDF; 3) topic distribution; and 4) Doc2Vec. Compare the results by Normalized Mutual Information (NMI) and visualize the clustering results. (20’)
  • Analyze your experiments from the following aspects: 1) impact of different preprocessing methods (e.g., how to filter the vocabulary; using an n-gram model); 2) impact of different topic numbers; 3) different training methods for Doc2Vec; 4) what is the key factor for document visualization? (20’)
  • Learn document representations beyond the ones above. For example, how can temporal context in a document be used? (Bonus)
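
As a starting point for the BoW, TF-IDF, and NMI steps above, here is a minimal sketch in plain Python. It is not a reference solution. The toy documents, the function names (bag_of_words, tf_idf, nmi), the unsmoothed weighting idf = log(N / df), and the arithmetic-mean normalization used for NMI are all assumptions made for illustration; library implementations may follow different conventions, so check the documentation of whichever tool you use.

```python
import math
from collections import Counter


def bag_of_words(docs, vocab):
    """Count how often each vocabulary word occurs in each tokenized document."""
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for tokens in docs:
        row = [0] * len(vocab)
        for token in tokens:
            if token in index:  # words outside the vocabulary are ignored
                row[index[token]] += 1
        rows.append(row)
    return rows


def tf_idf(bow):
    """Weight raw counts by idf = log(N / df). Assumed variant: no smoothing."""
    n_docs = len(bow)
    n_words = len(bow[0])
    df = [sum(1 for row in bow if row[j] > 0) for j in range(n_words)]
    idf = [math.log(n_docs / d) if d else 0.0 for d in df]
    return [[count * w for count, w in zip(row, idf)] for row in bow]


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of the same items."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (nij / n) * math.log(n * nij / (count_a[x] * count_b[y]))
        for (x, y), nij in joint.items()
    )


def nmi(true_labels, cluster_labels):
    """NMI = MI / mean(H_true, H_pred). The arithmetic mean is an assumed choice."""
    denom = (entropy(true_labels) + entropy(cluster_labels)) / 2
    if denom == 0:
        return 1.0  # both assignments place every item in a single group
    return mutual_information(true_labels, cluster_labels) / denom


# Toy tokenized corpus (hypothetical, for illustration only).
docs = [
    ["space", "nasa", "orbit"],
    ["hockey", "team", "nasa"],
    ["space", "space", "team"],
]
vocab = sorted({token for doc in docs for token in doc})
bow = bag_of_words(docs, vocab)
weights = tf_idf(bow)

# Compare two hypothetical clusterings against ground-truth labels.
truth = [0, 0, 1, 1]
same_partition = nmi(truth, [1, 1, 0, 0])   # same grouping, cluster ids swapped
unrelated = nmi(truth, [0, 1, 0, 1])        # grouping independent of the truth
print(vocab)
print(bow)
print(same_partition, unrelated)
```

Because NMI compares groupings rather than label values, the first clustering scores as a perfect match even though its cluster ids are swapped, while the second carries no information about the true classes. This is why NMI is suitable for comparing K-means results across the four representations.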

Every group is required to give a progress review presentation with slides in class. Each presentation should last about 5-10 minutes and include the following content.

  • Introduction to your group members and team
  • A clear illustration of your project
  • All the experimental results you have obtained by the time of the presentation
  • A live demo of your visualization results (Jupyter Notebook is recommended).

The final submission is required as follows.

  • A two-page PDF report including 1) a brief introduction to the project and your method; 2) all the necessary results and analyses; 3) references for the tools and papers you used in this work (the references may be placed on an extra page that contains nothing but references).
  • A package file including all your source code and visualizations

 

Hint: We do not require a specific format (e.g., single-column or double-column, font size, and line spacing) for the final report. However, you need to make sure it is neat and readable. Good, highly recommended templates (Word, LaTeX) can be found from IEEE Transactions or ACM conferences.
