Browning, David Nichols

Published April 15, 2020

You can use IBM SPSS Statistics for various descriptive and predictive analyses of data, such as those generated by the COVID-19 pandemic. SPSS Statistics features an easy-to-use graphical user interface (GUI), but almost all of what the GUI allows you to do is performed behind the scenes by a powerful command syntax language.
The command syntax provides flexible programming of analyses, lets you save instructions so that results can be reproduced in the future, and can be adapted to new data or problems. For information regarding SPSS Statistics command syntax, see the SPSS Statistics command syntax documentation.

The data for this example is from the fulldata.csv file, which was obtained from a highly informative website. This site contains a substantial amount of useful information on data and other aspects of the COVID-19 pandemic. The data is maintained by Hannah Ritchie, and was downloaded directly from the site.

The fulldata.csv file was downloaded on April 1, 2020, and contains COVID-19 data through the end of March (based on the timelines maintained by the European Centre for Disease Prevention and Control). Refer to the notes on the site for details regarding exact dates.
The data values presented here might not exactly match data from other sources (for various reasons).

Loading the data file into SPSS Statistics

Because the fulldata.csv data file is not a native SPSS Statistics file (.sav), the file must first be imported into the application. You can import .csv files through the GUI, or use command syntax to read the data into SPSS Statistics.

1. Open a new Syntax Editor session in SPSS Statistics by selecting File > New > Syntax.
2. Copy the following syntax into the Syntax Editor dialog box.
3. Change the path on the /FILE subcommand of the GET DATA command to reference the directory where the fulldata.csv file is located on your system.
4. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).

The file structure is fairly simple.
It contains a date variable, a location variable (which indicates the country or territory), and variables that provide daily counts of new recorded cases, new recorded deaths, total recorded cases, and total recorded deaths. In addition to data for the entire world, there is data for 199 specific locations where at least one COVID-19 case has been reported.

The file contains a total of 7996 rows (referred to as records or cases). There are 92 records for the world, and a varying number of records for each specific location. Some of the locations did not have cases reported until well after the start of the time period covered by the data (which begins December 31, 2019, for some locations).

Converting the date variable

Because the date variable in the data file does not match any of the many defined SPSS Statistics date formats, you must edit the date variable information to convert it to a format that SPSS Statistics recognizes. The following commands replace dashes in the date strings with slashes, change the variable type from string to date, assign the variable an ordinal measurement level, add variable labels, and give the count variables the same format (0 decimals).

Note: The ordinal measurement level for the dates is useful because it provides flexibility when charting. It maintains the sort order but marks the variable as categorical, which some chart functions require.

1. Copy the following syntax into the Syntax Editor dialog box.
2. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).

Analyzing COVID-19 data by location

SPSS Statistics offers multiple options for analyzing data separately for each location.
If you want to perform the same analysis on each location, you can use the SPLIT FILE command to submit data to a statistical procedure (one location at a time). The command can produce completely separate output, or results for locations stacked in output tables. To focus on a particular subset of data (such as for the world or the United States), you can filter out all other locations, or create a new data set that contains only the location of interest.

The following commands create a data set that contains data solely for the United States.

1. Copy the following syntax into the Syntax Editor dialog box.
2. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).

Generating a histogram chart

The following commands generate a histogram chart of new COVID-19 cases by day. The ordinal categorical measurement level for the date variable is required to produce the histogram.

Note: The following commands invoke the SPSS Statistics Chart Builder charting engine. For more information, see the Chart Builder documentation (for the GUI), or the GGRAPH command documentation (for command syntax).

Copy the following syntax into the Syntax Editor dialog box.

    GGRAPH
      /GRAPHDATASET NAME='graphdataset' VARIABLES=date newcases MISSING=LISTWISE REPORTMISSING=NO
      /GRAPHSPEC SOURCE=INLINE.
    BEGIN GPL
      SOURCE: s=userSource(id('graphdataset'))
      DATA: date=col(source(s), name('date'), unit.category)
      DATA: newcases=col(source(s), name('newcases'))
      GUIDE: axis(dim(1), label('Date'))
      GUIDE: axis(dim(2), label('New Cases'))
      GUIDE: text.title(label('Simple Histogram of New Cases by date'))
      SCALE: linear(dim(2), include(0))
      ELEMENT: interval(position(date*newcases), shape.interior(shape.square))
    END GPL.

The following commands generate a scatterplot chart of new COVID-19 cases by day. Copy the syntax into the Syntax Editor dialog box.

    GGRAPH
      /GRAPHDATASET NAME='graphdataset' VARIABLES=date newcases MISSING=LISTWISE REPORTMISSING=NO
      /GRAPHSPEC SOURCE=INLINE
      /FITLINE TOTAL=NO.
    BEGIN GPL
      SOURCE: s=userSource(id('graphdataset'))
      DATA: date=col(source(s), name('date'), unit.category)
      DATA: newcases=col(source(s), name('newcases'))
      GUIDE: axis(dim(1), label('Date'))
      GUIDE: axis(dim(2), label('New Cases'))
      GUIDE: text.title(label('Simple Scatter of New Cases by date'))
      SCALE: linear(dim(2), include(0))
      ELEMENT: point(position(date*newcases))
    END GPL.

Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).
The resulting scatterplot chart displays in the SPSS Statistics Output Viewer.

Note: Both of these graphs were generated by using the SPSS Statistics Chart Builder GUI. The GPL (Graphics Production Language) used here is based on Leland Wilkinson’s Grammar of Graphics, and is extremely powerful and flexible.

Advanced features

If you’re new to data modeling, SPSS Statistics provides options that make it easy to get started with more complex methods, such as time series modeling and nonlinear regression.

Time Series Modeler

The TSMODEL command can also be accessed through the GUI, and includes an Expert Modeler mode that attempts to find the best model for a given series. Expert Modeler also has an automatic outlier detection capability. The following example lets Expert Modeler select a model for the new cases series and forecasts the series 30 days beyond the existing data.

1. Copy the following syntax into the Syntax Editor dialog box.
2. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).
The resulting Time Series Modeler tables and chart display in the SPSS Statistics Output Viewer.

The rather complicated ARIMA (autoregressive integrated moving average) model with seven outliers, fitted by the TSMODEL procedure’s Expert Modeler, fits the observed data almost perfectly, but eventually begins to predict a straight-line increase in new cases forever (which obviously cannot happen). The model is also difficult to interpret.

In the early stages of an epidemic or pandemic, the growth in new cases is often well-modeled as an exponential function of time. Although there’s a date variable in the data set, dates in SPSS Statistics are expressed in seconds since the beginning of the Gregorian calendar, so it’s easier to work with a metric like days for interpreting most models.

The following example computes a Day variable for each case in the data set by using the $CASENUM system variable, which indexes sequential cases.

1. Copy the following syntax into the Syntax Editor dialog box.
2. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).

Nonlinear regression

Next, we fit a nonlinear regression model using an exponential function of the number of days. This model type requires complete specification of the model’s functional form, including parameter names and starting values.
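
To make the idea of a fully specified functional form concrete, here is a minimal Python sketch (not the SPSS Statistics syntax used in this article). It assumes a two-parameter form, an intercept multiplied by a growth parameter raised to the power of the day number. The function names are hypothetical, the data is synthetic, and the fit uses ordinary least squares on the logarithm of the counts, which is a simpler technique swapped in for nonlinear regression.

```python
import math


def exponential_model(day, b0, b1):
    """Predicted new cases on a given day under the form b0 * b1 ** day."""
    return b0 * b1 ** day


def fit_log_linear(days, cases):
    """Estimate b0 and b1 by ordinary least squares on log(cases).

    Taking logs turns the model into a straight line:
    log(cases) = log(b0) + day * log(b1).
    Every value in cases must be positive.
    """
    n = len(days)
    logs = [math.log(c) for c in cases]
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), math.exp(slope)


# Synthetic, noise-free series (NOT the COVID-19 data).
# b0 = 10.0 is made up; b1 = 1.159 matches the growth estimate
# reported for the fitted model in this article.
days = list(range(1, 31))
cases = [exponential_model(d, 10.0, 1.159) for d in days]

b0_hat, b1_hat = fit_log_linear(days, cases)
print(round(b0_hat, 3), round(b1_hat, 3))
```

On real, noisy counts the log-scale fit weights errors differently from nonlinear regression and breaks down when a day has zero cases, so its estimates are best treated as rough values, for example as candidate starting values.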
The model we fit has two parameters: a b0 intercept, and a b1 growth parameter that is raised to the power of time in days.

1. Copy the following syntax into the Syntax Editor dialog box.
2. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).

This model also fits the data quite well, with an R^2 value above 0.96. The value of the b1 growth parameter is approximately 1.159. Dividing the logarithm of 2 by the logarithm of this value gives the predicted doubling time in days (approximately 4.7). According to this model, new cases are doubling, on average, approximately every 4.7 days.

Let’s see how the forecasts from the two models compare. The following command lists the days, new cases, and predictions and forecasts from the two models.

1. Copy the following syntax into the Syntax Editor dialog box.
2. Highlight the previous syntax, and click the green Run Selection icon on the toolbar (you can also select Run Selection from the menu).

Summary

Both models forecast eternal growth in the number of new cases, but the forecasts from the exponential nonlinear regression model grow explosively, predicting over two million new cases per day by the end of April!

A famous statistician has often been quoted or paraphrased as saying that all models are wrong, but some are useful. Neither of these models is correct, but the simpler exponential model is known by epidemiologists to provide a good approximation to the behavior of viral infections when left unchecked (that is, until so many people become infected that the virus runs out of new people to infect). This is why extreme, and for many of us, unprecedented, measures are being taken to battle the COVID-19 pandemic.

Your challenge in the SPSS Statistics version of the IBM Call for Code is to use SPSS Statistics, along with any fully publicly available data, to build models and model visualizations that aid in the understanding of the course of the COVID-19 pandemic and the effects of our responses to the coronavirus.
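
The doubling-time arithmetic from the nonlinear regression section can be checked with a short Python sketch. It assumes only the b1 value of approximately 1.159 reported above; the function name is hypothetical, and the 30-day growth factor is an extra illustration of why the exponential forecasts grow so quickly, not a figure from the SPSS Statistics output.

```python
import math


def doubling_time(growth_per_day):
    """Days needed to double when each day multiplies the count by growth_per_day."""
    return math.log(2) / math.log(growth_per_day)


b1 = 1.159  # growth parameter reported for the exponential model

days_to_double = doubling_time(b1)

# Factor by which daily new cases would grow over 30 days
# if the same exponential trend continued unchecked.
growth_over_30_days = b1 ** 30

print(f"doubling time: {days_to_double:.1f} days")
print(f"30-day growth factor: {growth_over_30_days:.0f}x")
```

Any base of logarithm works here, because the base cancels in the ratio.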