Text Mining for Economics and Finance

Imperial College London

Homework 1

 


Instructions:

Please complete the following assignment in the groups to which you have been assigned. The end product should be a Jupyter notebook with well-documented code and a discussion of the results. Make sure your submission includes any auxiliary files needed to make the notebook run. Please place all material in a zip file and upload it by 4pm on Friday 29 January.


To begin the assignment, your group needs to select a textual database of your choice to analyze. You can use one that is in public circulation or, if you prefer, your own. Examples of the former are presidential state-of-the-union addresses (http://bit.ly/30olaNe) and minutes of the Bank of England’s monetary policy meetings (http://bit.ly/36Tc39u). In any case, you will need to share the data with Yi and me so that we can verify that the notebook runs, but we will not circulate it further.

Your assignment is the following:

1.

Pre-process the documents in your dataset following the steps discussed in lecture (at the very least, tokenizing, stemming, and stopword removal). Which are the most common words based on overall counts? And which words in the corpus get the highest tf-idf score?

2.

Identify a dictionary of interest to measure heterogeneity across documents. You can use an existing one, or invent one of your own. (It should not be difficult to find dictionaries online; for example, see http://www3.nd.edu/~mcdonald/Word_Lists.html.) Why have you selected the dictionary you did?

3.

Use your dictionary to provide a quantitative representation of each document using a simple count-based measure.

4.

Find some external series of interest (e.g. for the state-of-the-union addresses: whether the US is in recession or engaged in a major war, the average inflation rate, etc.) that you think might correlate with your quantitative representation, and compute the correlation. Is it the sign you expected? Is it significant?

5.

Now use the same dictionary, but compute the content of each document using term weighting as discussed in class. Do your answers to the previous question change if you use this alternative representation?
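
To make the five steps concrete, here is a minimal sketch of the whole pipeline on a toy corpus, using only the Python standard library. Everything in it is an assumption made for illustration: the four documents, the `recession` series, the stopword list, and the negative word list are invented; `crude_stem` is a rough stand-in for a proper stemmer such as Porter's; and the weighting (1 + log count) x log(D / df) is one common tf-idf variant, which may differ from the formula given in lecture.

```python
import math
import re
from collections import Counter

# Toy corpus and external series: invented for illustration only.
docs = [
    "Growth was strong and markets gained as confidence improved.",
    "Losses mounted; the crisis deepened and markets were falling.",
    "The committee discussed inflation, risks and the weak outlook.",
    "Strong gains and improving confidence supported the recovery.",
]
recession = [0, 1, 1, 0]  # hypothetical indicator, one value per document

STOPWORDS = {"the", "and", "as", "was", "were", "a", "of", "in"}  # tiny demo list


def crude_stem(token):
    """Strip a few common suffixes (a rough stand-in for a Porter stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Step 1: lowercase, tokenize on letters, drop stopwords, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


tokenized = [preprocess(d) for d in docs]
n_docs = len(tokenized)

# Step 1: overall counts, document frequencies, and corpus-level tf-idf.
counts = Counter(t for doc in tokenized for t in doc)
doc_freq = Counter(t for doc in tokenized for t in set(doc))
idf = {t: math.log(n_docs / doc_freq[t]) for t in counts}
tfidf = {t: (1 + math.log(counts[t])) * idf[t] for t in counts}
top_by_count = counts.most_common(5)
top_by_tfidf = sorted(tfidf.items(), key=lambda kv: kv[1], reverse=True)[:5]

# Step 2: a hypothetical negative-tone dictionary, stemmed like the corpus.
negative = {crude_stem(w) for w in ["losses", "crisis", "falling", "risks", "weak"]}

# Step 3: count-based measure, the share of each document's tokens in the dictionary.
count_scores = [sum(t in negative for t in doc) / len(doc) for doc in tokenized]

# Step 5: term-weighted measure, summing (1 + log count) * idf over dictionary terms.
weighted_scores = []
for doc in tokenized:
    doc_counts = Counter(t for t in doc if t in negative)
    weighted_scores.append(
        sum((1 + math.log(n)) * idf[t] for t, n in doc_counts.items())
    )

# Step 4 (and its repeat for step 5): correlation with the external series.
r_count = pearson(count_scores, recession)
r_weighted = pearson(weighted_scores, recession)

print("most common:", top_by_count)
print("highest tf-idf:", top_by_tfidf)
print("correlation (counts):", round(r_count, 3))
print("correlation (weighted):", round(r_weighted, 3))
```

With a real corpus you would likely replace `crude_stem` and `STOPWORDS` with a library stemmer and stopword list, and use a routine that also returns a p-value (for example `scipy.stats.pearsonr`) to answer the significance question. Note that idf gives zero weight to any term appearing in every document, so the two representations can rank documents differently on real data even when they happen to agree on a toy example.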
