## Instrumentation and Interviewer Training

Finally, other studies exemplified the use of machine learning with a focus on instrumentation, interface development, and interviewer training. Arunachalam et al. (2015) showed how Markov chains (MCs) and artificial neural networks (ANNs) can be used to improve computer-assisted telephone interviewing (CATI) for the American Time Use Survey. Using algorithms that are particularly useful for temporal pattern recognition in combination with paradata, the authors aimed to predict a respondent’s next likely activity. Based on time of day and previous activity, this prediction would then be displayed live on interviewers’ CATI screens to facilitate probing and data entry, thereby reducing item nonresponse. This process should ultimately improve data quality and make data collection more efficient. Although both algorithms predicted the respondents’ activity sequences accurately, the authors found higher predictive accuracy for the ANNs.

Machine learning has also been used to improve the survey instrument for open-ended questions, such as questions regarding occupation (see Section 1.6.1). To facilitate respondent retrieval, decrease respondent burden, and reduce coding errors, Schierholz et al. (2018) investigated computer-assisted coding. More specifically, they assessed the performance of matching algorithms in combination with gradient-boosted decision trees that suggest potential occupations based on the verbatim response initially provided by the respondent. Respondents then selected their occupation from the suggestions. The authors showed that the algorithm detected possible categories for $90\%$ of all respondents, of whom $80\%$ selected a job title and $15\%$ selected “different occupation,” thereby significantly reducing the resources needed for postinterview coding.
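As a rough illustration of the next-activity prediction described earlier in this section, the sketch below fits a first-order Markov chain to activity sequences and returns the most likely follow-up activities for an interviewer to probe. It is a minimal sketch under stated assumptions, not the model of Arunachalam et al. (2015): the diaries, activity labels, and function names are hypothetical, and the sketch ignores time of day and other paradata.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count first-order transitions between consecutive activities."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, current, k=3):
    """Return up to k likely next activities with estimated probabilities."""
    following = counts.get(current)
    if not following:
        return []  # activity never seen as a predecessor
    total = sum(following.values())
    return [(act, n / total) for act, n in following.most_common(k)]


# Hypothetical toy time-use diaries (illustrative only, not survey data).
diaries = [
    ["sleep", "grooming", "breakfast", "commute", "work"],
    ["sleep", "grooming", "commute", "work", "lunch"],
    ["sleep", "breakfast", "commute", "work"],
    ["sleep", "grooming", "breakfast", "work"],
]

counts = fit_transitions(diaries)
suggestions = predict_next(counts, "grooming")
for activity, prob in suggestions:
    print(f"{activity}: {prob:.2f}")
```

In a CATI setting, a ranked list like `suggestions` could populate the probing prompts shown to the interviewer; a production model would presumably condition on time of day as well.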
Other applications of machine learning algorithms, such as regularization networks, test-time feature acquisition, or natural language processing, can reduce respondent burden and data collection cost by informing adaptive questionnaire designs (e.g. for nonresponse conversion). In such designs, individuals receive a tailored number or order of questions or question modules, tailored instructions, or particular interventions in real time, depending on their responses to earlier questions and on paradata. Examples include surveys more generally (Early 2017; Morrison et al. 2017; Kelly and Doriot 2017), vignette surveys or conjoint analysis in marketing (Abernethy et al. 2007), and intelligent, dialogue-based (conversational) tutoring systems or knowledge assessments (Niraula and Rus 2014).

## Alternative Data Sources

Other areas of application of MLMs for questionnaire design include the collection or extraction and processing of data from alternative (Big) Data sources. For example, collecting data from images (e.g. expenditure data from grocery or medical receipts, Jäckle et al. 2019), from websites and apps such as Flickr, Facebook, Instagram, or Snapchat (Agarwal et al. 2011), or from sensors (e.g. smartphone sensors capturing geo-location and app use, fitness trackers, or eye trackers) may allow researchers to simplify the survey questionnaire and reduce the data collection burden for respondents by dropping some questions entirely. Processing these Big Data is, however, often impossible with standard techniques and requires the use of MLMs to extract features. Among these are deep learning for image processing (Krizhevsky, Sutskever, and Hinton 2012), e.g. of student transcripts,${}^3$ photos of meals or receipts to keep food logs in surveys about food or health,${}^4$ or aerial images to assess neighborhood safety; natural language processing, e.g. to understand spoken meal descriptions (Korpusik et al. 2016) or to code student transcripts (Shorey et al. 2018); and naïve Bayesian classifiers or density-based spatial clustering algorithms, e.g. applied to high-dimensional sensor data from smartphones to optimize the content, frequency, and timing of intervention notifications (Morrison et al. 2017), to detect home location (Vanhoof et al. 2018), or to investigate the relationship between location data and an individual’s behavior, such as exercise frequency (Eckman et al. 2019).
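To make the idea of density-based spatial clustering for home detection more concrete, the sketch below runs a small pure-Python DBSCAN over nighttime location pings and takes the centroid of the largest cluster as the home estimate. It is a minimal sketch under stated assumptions and not the procedure of Vanhoof et al. (2018): the pings are hypothetical, coordinates are assumed to be already projected to meters, and the night window, `eps`, and `min_pts` values are illustrative choices.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within eps of point i (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def home_location(pings, eps=50.0, min_pts=3, night=(22, 6)):
    """Centroid of the largest nighttime cluster, or None if none is found."""
    start, end = night
    night_pts = [(x, y) for hour, x, y in pings if hour >= start or hour < end]
    labels = dbscan(night_pts, eps, min_pts)
    clustered = [lab for lab in labels if lab != -1]
    if not clustered:
        return None
    biggest = max(set(clustered), key=clustered.count)
    members = [p for p, lab in zip(night_pts, labels) if lab == biggest]
    cx = sum(x for x, _ in members) / len(members)
    cy = sum(y for _, y in members) / len(members)
    return (cx, cy)


# Hypothetical pings: (hour of day, x in meters, y in meters).
pings = [
    (23, 0.0, 0.0), (0, 10.0, 5.0), (2, -5.0, 10.0), (4, 5.0, -10.0),   # home
    (10, 5000.0, 5000.0), (11, 5010.0, 4990.0), (14, 5005.0, 5005.0),   # work
    (23, 9000.0, 100.0),                                                # one-off
]

home = home_location(pings)
print(home)
```

Restricting the input to nighttime pings is one simple way to separate home from work; with raw latitude and longitude, the Euclidean distance used here would need to be replaced by a geodesic distance.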

