Empirical: The Link Between Survey Response Rates and Nonresponse Bias: Theory, Simulations, and Empirical Evidence From the Household Pulse Survey
This was inspired by the work I had done at Brookings, where we found out that the Household Pulse Survey is a really terrible survey. I was wondering what relationship between response rates and depression rates I would expect to see in the data if there were nonresponse bias, and realized that the answer is not immediately obvious. So I put on my theorist hat, drew on my Ec 1011a knowledge, and wrote down a model of response rates and nonresponse bias, which eventually developed into this paper.
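Here is a minimal sketch of why the answer is not obvious. It uses a deliberately simple setup that is not the paper's model. There is a hypothetical true depression share p, and a logistic response propensity whose log-odds are alpha for non-depressed people and alpha + beta for depressed people. Bayes' rule then gives the depression share among respondents. Varying alpha traces out how that observed share moves with the overall response rate. The names p, alpha, beta, and respondent_share are illustrative assumptions.

```python
import math


def logistic(x):
    """Standard logistic function, mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def respondent_share(p, alpha, beta):
    """Return (overall response rate, depression share among respondents).

    p     -- hypothetical true population share with depression
    alpha -- baseline log-odds of responding (non-depressed people)
    beta  -- shift in log-odds of responding if depressed (assumed, not estimated)
    """
    r1 = logistic(alpha + beta)   # P(respond | depressed)
    r0 = logistic(alpha)          # P(respond | not depressed)
    rate = p * r1 + (1 - p) * r0  # overall response rate
    share = p * r1 / rate         # Bayes' rule: P(depressed | respond)
    return rate, share


# Hypothetical parameters: depressed people are less likely to respond (beta < 0).
p, beta = 0.3, -0.5
for alpha in (-2.0, 0.0, 2.0):
    rate, share = respondent_share(p, alpha, beta)
    print(f"response rate {rate:.3f}  observed share {share:.3f}  bias {share - p:+.3f}")
```

Under these particular assumptions, respondents understate depression, and the gap narrows as alpha (and with it the response rate) rises. A different response mechanism need not produce the same pattern.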
Theory: Axioms and Theorems in Voting Theory with a Brief Biography of Kenneth May
I wrote this as the final paper for ECON 1080: Great Theorems of Economics, taught by Professor Jerry Green. The class is about two-thirds microeconomic theory and one-third history of microeconomic theory. I also gave a presentation on the subject; the slides can be found here.
Dobson, Emily, Carol Graham, Tim Hua, and Sergio Pinto. 2022. “Despair and Resilience in the US: Did the COVID Pandemic Worsen Mental Health Outcomes?” Working Paper 171. Brookings Global Working Paper Series. Brookings Institution. Brookings WP link
Hua, Tian, Chris Chankyo Kim, Zihan Zhang, and Alex Lyford. 2021. “COVID-19 Tweets of Governors and Health Experts: Deaths, Masks, and the Economy.” Journal of Student Research 10 (1). https://doi.org/10.47611/jsr.v10i1.1171
PowerPoint slides for the spring symposium
Preliminary update: I wrote this paper before I had taken econometrics, probability, or statistics, and before I had really learned R. (Wild, eh?) So the data analysis we conducted was somewhat limited. This is not necessarily a huge deal: we had a census of all tweets from the time period (i.e., we had the population of data), so none of the findings are "wrong." However, in retrospect, there is a lot I would have done differently. The biggest change is that I would have treated each individual user as the unit of observation, instead of pooling everything together by user category.
Heck, if I could figure out how it works, I might even incorporate the work of Abadie et al. (2020). I'm going to be doing an independent study in graduate econometrics in the spring and will hopefully finally understand this paper.
For now, though, I present the following two randomization inference graphs on the rate at which Republican and Democratic governors mention death or masks in their COVID-19-related tweets. The graph on the left looks at the proportion of COVID-19-related tweets that contained words relating to death; the one on the right looks at the proportion that contained the word "mask." We randomly reassign the party labels. The p-values are 0.001667 and 0.013, respectively, suggesting that the differences between Democratic and Republican governors are, in one sense, statistically significant (i.e., Democrats mention deaths and masks more). For reference, a similar test with words relating to the economy yields a p-value of 0.883.
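The randomization inference procedure described above can be sketched in Python. The sketch assumes one tweet proportion per governor and a 0/1 party label. It is an illustrative reimplementation, not the code behind the graphs, and the numbers in the usage lines are made up. The function reshuffles the labels many times. It then reports the share of reshuffles whose absolute difference in means is at least as large as the observed one.

```python
import random
from statistics import mean


def diff_in_means(values, labels):
    """Mean of the label-1 group minus mean of the label-0 group (both groups must be non-empty)."""
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(group1) - mean(group0)


def randomization_p_value(values, labels, n_draws=10_000, seed=0):
    """Two-sided randomization inference p-value for a difference in means.

    The labels are randomly reshuffled across units n_draws times; the p-value is
    the share of reshuffles whose |difference| is at least the observed |difference|.
    """
    rng = random.Random(seed)
    observed = abs(diff_in_means(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_draws):
        rng.shuffle(shuffled)  # a fresh uniform random permutation of the labels
        if abs(diff_in_means(values, shuffled)) >= observed:
            hits += 1
    return hits / n_draws


# Hypothetical governor-level shares of COVID-19 tweets mentioning death.
share_death = [0.30, 0.25, 0.28, 0.35, 0.10, 0.12, 0.08, 0.15]
party = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical labels: 1 = Democrat, 0 = Republican
print(randomization_p_value(share_death, party))
```

Because the labels are reshuffled rather than resampled from a larger population, the p-value asks how unusual the observed split is among possible label assignments. That question still makes sense when the tweets are a census.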
These project descriptions are written pretty casually (since the research itself is pretty casual).
Do Last Names Affect Solo-Authorship Rates in Economics?
When I heard from my professor that econ papers almost always list authors in alphabetical order by last name, I thought that people whose last names start later in the alphabet might choose to solo-author more, so that they won't always be lost in the "et al."
I pulled data from Web of Science on every article published in the top five economics journals over the past ten years (n = 4680) and looked at the distribution of last names for co-authored and solo-authored papers. It turns out that the difference in the mean last name (coded by first letter: A = 1, B = 2, etc.) between authors of solo-authored and co-authored papers is not that big: 11.57337 for solo authors and 11.36265 for co-authors. These numbers are calculated by pooling all of the last names across all solo-authored (or co-authored) papers, and since this is a census of papers, there are no confidence intervals to add. Here are the histograms for those last name pools:
The problem is that the alphabetical discrimination literature has known about this for a while... I think this guy might have done something similar, except that their paper and methods are better, and there's also this paper and this paper. It was a fun R exercise, though. I followed Varian's advice not to do a literature review first, especially while I still needed practice doing research, so that I could get my hands a bit dirty on a topic I find interesting and get some experience.
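The pooled mean last-name index from the solo-authorship exercise can be sketched in Python. The original analysis was done in R, so this is not that code. It assumes each paper is represented as a list of author last names, and the example names are hypothetical. Stripping accents with Unicode normalization is an added assumption about how non-ASCII names would be handled.

```python
import unicodedata
from statistics import mean
from string import ascii_uppercase

# A = 1, B = 2, ..., Z = 26
LETTER_INDEX = {letter: i for i, letter in enumerate(ascii_uppercase, start=1)}


def letter_index(last_name):
    """Alphabet position of a last name's first letter.

    Accents are stripped via Unicode NFKD normalization (e.g. "Ó" -> "O");
    names whose first character is not a Latin letter raise KeyError.
    """
    first = unicodedata.normalize("NFKD", last_name.strip())[0].upper()
    return LETTER_INDEX[first]


def pooled_mean_index(papers):
    """Mean letter index over every author slot in a list of papers.

    An author who appears on several papers is counted once per paper,
    i.e. all names in the pool are pooled together.
    """
    return mean(letter_index(name) for authors in papers for name in authors)


solo_papers = [["Smith"], ["Young"]]                      # hypothetical
coauthored_papers = [["Adams", "Baker"], ["Kim", "Lee"]]  # hypothetical
print(pooled_mean_index(solo_papers), pooled_mean_index(coauthored_papers))  # 22 6.5
```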
A Productivity-Centered Theory of Time Allocation
Existing models of time allocation focus on a work-leisure trade-off subject to money and time constraints, where individuals need to consider both money and time when deciding among leisure, work, and buying versus making things at home. I feel like this really doesn't apply to a lot of people, especially now that leisure (Disney+, video games, exercise, online chess) has almost no marginal cost in terms of money. Also, we know that lots of people find fulfillment in their work or in other activities that aren't usually considered leisure, such as activism, art, or freelance work.
Thus, in my model, individuals maximize the amount of fulfilling work (like economics research lol) they get done (I guess you could also call this utility), subject to time and tiredness constraints. Tiredness here refers to the phenomenon where I get tired after working for a while and my productivity decreases, so to maximize the total amount of work I get done, I need to rest. This applies to traditional leisure activities as well: I can't go snowboarding all day long because I'll get tired.

So what I have in mind is more akin to the Solow model, where the economy (the agent) allocates output (time) between consumption (producing fulfilling work now) and investment (sleeping so they will be more productive tomorrow). The key differences are that there is no clear functional form for how resting restores productivity (whereas in Solow, capital next period = capital now + investment - depreciation), and that there is an upper limit to how productive someone can get. I model that limit using an "efficiency percentage": the total amount of work someone gets done over a period is the integral of their efficiency percentage over the time they are working, multiplied by their maximal marginal product of time.
However, this is where my thinking sort of ended, and I don't know where to go from here...
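Here is a minimal numerical sketch of the efficiency-percentage formulation above. Write it as W = A × ∫ e(t) dt over the working period, where e(t) in [0, 1] is the efficiency percentage and A is the maximal marginal product of time. The labels W, A, and e are mine, not notation from the original. Two further assumptions are hypothetical stand-ins for the missing functional form: efficiency decays exponentially with hours worked, and a rest fully resets it.

```python
import math


def work_done(efficiency, start, end, max_mp, steps=10_000):
    """Approximate max_mp * integral of efficiency(t) dt over [start, end] with the trapezoid rule.

    efficiency -- function of hours into the work session, returning a value in [0, 1]
    max_mp     -- maximal marginal product of an hour of time (hypothetical units)
    """
    h = (end - start) / steps
    total = 0.5 * (efficiency(start) + efficiency(end))
    total += sum(efficiency(start + k * h) for k in range(1, steps))
    return max_mp * total * h


def decaying_efficiency(t, tau=4.0):
    """Hypothetical tiredness: efficiency falls exponentially with hours worked (tau is assumed)."""
    return math.exp(-t / tau)


# Eight straight hours vs. two four-hour sessions separated by a rest
# that (by assumption) fully restores efficiency to 100%.
straight = work_done(decaying_efficiency, 0.0, 8.0, max_mp=1.0)
split = 2 * work_done(decaying_efficiency, 0.0, 4.0, max_mp=1.0)
print(f"straight: {straight:.3f}, split: {split:.3f}")  # straight: 3.459, split: 5.057
```

Holding working hours fixed at eight, the split schedule produces more work because efficiency restarts at 100%. What this sketch leaves out is the time the rest itself costs, which is exactly the time constraint the model would still need to handle.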