## Linear Regression Exercises

### Week 7 Linear Regression Exercises

**Linear Regression Exercises Due 10/13/17 by 10 pm**

**Simple Regression**

Research Question: Does the number of hours worked per week (*workweek*) predict family income (*income*)?

Using the Polit2SetA data set, run a simple regression with Family Income (*income*) as the outcome variable (Y) and Number of Hours Worked per Week (*workweek*) as the independent variable (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variable is always X and is placed on the x-axis.

Follow these steps when using SPSS:

1. Open Polit2SetA data set.

2. Click on **Analyze**, then click on **Regression**, then **Linear**.

3. Move the dependent variable (*income*) into the box labeled “Dependent” by clicking the arrow button. The dependent variable is a continuous variable.

4. Move the independent variable (*workweek*) into the box labeled “Independent.”

5. Click on the **Statistics** button (right side of box) and click on **Descriptives**, **Estimates**, **Confidence Interval** (should be 95%), and **Model Fit**, then click on **Continue**.

6. Click on **OK**.
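For readers who want to see what the Linear Regression procedure computes, here is a minimal sketch in plain Python. It is not part of the SPSS assignment. The function name `simple_regression` and the six data points are made up for illustration and are not taken from Polit2SetA. The returned quantities correspond roughly to the options selected in step 5 (descriptives, estimates, and model fit).

```python
from math import sqrt

def simple_regression(x, y):
    """Ordinary least squares fit of y = a + b*x for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # b: change in Y per one-unit change in X
    intercept = mean_y - slope * mean_x    # a: predicted Y when X = 0
    r = sxy / sqrt(sxx * syy)              # Pearson correlation

    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_regression = syy - ss_residual
    df_residual = n - 2
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "slope": slope,
        "intercept": intercept,
        "r": r,
        "r_squared": r ** 2,
        # Standard error of the estimate: typical size of a prediction error
        "std_error_estimate": sqrt(ss_residual / df_residual),
        # ANOVA F statistic with 1 and n - 2 degrees of freedom
        "f": ss_regression / (ss_residual / df_residual),
    }

# Made-up values for illustration only; NOT taken from Polit2SetA.
hours = [20, 25, 30, 35, 40, 45]
income = [900, 1000, 1250, 1300, 1600, 1650]

fit = simple_regression(hours, income)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

With a single predictor, the F statistic in the ANOVA table can also be written as R²(n − 2) / (1 − R²), so the model-fit test and the test of the correlation lead to the same conclusion.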

**Assignment:** Through analysis of the SPSS output, answer the following questions. Answer questions 1 – 10 individually, not in paragraph form.

1. What is the total sample size?

2. What is the mean income and mean number of hours worked?

3. What is the correlation coefficient between the outcome and predictor variables? Is it significant? How would you describe the strength and direction of the relationship?

4. What is the value of R squared (coefficient of determination)? Interpret the value.

5. Interpret the standard error of the estimate. What information does this value provide to the researcher?

6. The model fit is determined by the ANOVA table results (*F* statistic = 37.226 with 1 and 376 degrees of freedom; *p* value = .001). Based on these results, does the model fit the data? Briefly explain. (Hint: A significant finding indicates good model fit.)

7. Based on the coefficients, what is the value of the y-intercept (point at which the line of best fit crosses the y-axis)?

8. Based on the output, write out the regression equation for predicting family income.

9. Using the regression equation, what is the predicted monthly family income for women working 35 hours per week?

10. Using the regression equation, what is the predicted monthly family income for women working 20 hours per week?
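Questions 8 through 10 apply the regression equation, predicted Y = a + bX, where a is the y-intercept and b is the slope for *workweek*. The sketch below shows the arithmetic only. The function name `predict_income` and both coefficient values are placeholders chosen for illustration, not values from the Polit2SetA output; substitute the constant and slope from your own coefficients table.

```python
def predict_income(hours_per_week, intercept, slope):
    """Apply the simple regression equation: predicted Y = a + b * X."""
    return intercept + slope * hours_per_week

# Placeholder coefficients for illustration; NOT the values from the SPSS output.
a = 500.0   # hypothetical y-intercept (the "Constant" row)
b = 10.0    # hypothetical unstandardized slope for workweek

for hours in (35, 20):
    predicted = predict_income(hours, a, b)
    print(f"{hours} hours/week -> predicted income {predicted:.2f}")
```

With these placeholder values the predictions are 850.00 and 700.00. The difference between them is 15 hours times the slope, which is a quick check that the equation was applied consistently.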

**For this assignment, answer questions 1 through 10 individually. DO NOT ANSWER IN PARAGRAPH FORM.**

**Multiple Regression**

**Assignment:** In this assignment we are trying to predict CES-D score (depression) in women. The research question is: How well do age, educational attainment, employment, abuse, and poor health predict depression?

Using the Polit2SetC data set, run a multiple regression with CES-D Score (*cesd*) as the outcome variable (Y) and respondent’s age (*age*), educational attainment (*educatn*), currently employed (*worknow*), number of types of abuse (*nabuse*), and poor health (*poorhlth*) as the independent variables (X). In any regression analysis, the dependent (outcome) variable is always Y and is placed on the y-axis, and the independent (predictor) variables are always X and are placed on the x-axis.

Follow these steps when using SPSS:

1. Open Polit2SetC data set.

2. Click on **Analyze**, then click on **Regression**, then **Linear**.

3. Move the dependent variable, CES-D Score (*cesd*) into the box labeled “Dependent” by clicking on the arrow button. The dependent variable is a continuous variable.

4. Move the independent variables (*age*, *educatn*, *worknow*, and *poorhlth*) into the box labeled “Independent.” This is the first block of variables to be entered into the analysis (block 1 of 1). Click on the button marked “Next” (top right of the independent box); this gives you another box for the next block of independent variables (block 2 of 2). Enter *nabuse* here. **Note:** Be sure the Method box states “Enter”.

5. Click on the **Statistics** button (right side of box) and click on **Descriptives**, **Estimates**, **Confidence Interval** (should be 95%), **R square change**, and **Model Fit**, and then click on **Continue**.

6. Click on **OK**.
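Entering predictors in two blocks produces two nested models, and **R square change** reports how much R² rises when the second block is added. The following plain-Python sketch illustrates that idea and is not a substitute for the SPSS run. The helper names `solve` and `r_squared`, the reduced set of predictors, and all data values are made up for illustration and are not taken from Polit2SetC.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Normal equations: (X'X) coefs = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in rows]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_residual / ss_total

# Made-up values for illustration only; NOT taken from Polit2SetC.
age = [22, 25, 28, 31, 34, 37, 40, 43]
poor_hlth = [0, 1, 0, 1, 0, 1, 0, 1]
n_abuse = [0, 2, 1, 3, 0, 1, 2, 4]
cesd = [8, 20, 12, 25, 9, 17, 15, 30]

r2_block1 = r_squared([age, poor_hlth], cesd)           # Model 1: first block only
r2_block2 = r_squared([age, poor_hlth, n_abuse], cesd)  # Model 2: both blocks
r2_change = r2_block2 - r2_block1

# F test for the change: one predictor added; residual df for Model 2 is n - k - 1.
n, k_model2, k_added = len(cesd), 3, 1
f_change = (r2_change / k_added) / ((1 - r2_block2) / (n - k_model2 - 1))

print(f"R^2 Model 1: {r2_block1:.3f}")
print(f"R^2 Model 2: {r2_block2:.3f}")
print(f"R^2 change:  {r2_change:.3f} (F change = {f_change:.2f})")
```

Because Model 1 is nested inside Model 2, R² cannot decrease when the second block is added; the F change test asks whether the increase is larger than chance would produce.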

**Assignment:** (When answering all questions, use the data on the coefficients panel from Model 2.) Answer questions 1 – 5 individually, not in paragraph form.

1. Analyze the data from the SPSS output and write a paragraph summarizing the findings. (Use the example in the SPSS output file as a guide for your write-up.)

2. Which of the predictors were significant predictors in the model?

3. Which of the predictors was the most relevant predictor in the model?

4. Interpret the unstandardized coefficients for educational attainment and poor health.

5. If you wanted to predict a woman’s current CES-D score based on the analysis, what would the unstandardized regression equation be? Include unstandardized coefficients in the equation.
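The unstandardized equation in question 5 has the form predicted Y = a + b1X1 + b2X2 + ... , using the B column of the coefficients table. One common way to compare predictors measured on different scales (question 3) is the standardized beta, which equals b multiplied by SD(X) / SD(Y). The sketch below illustrates both. The function names and every number in it are placeholders for illustration, not values from the Polit2SetC output.

```python
def unstandardized_equation(intercept, coefs, outcome="cesd"):
    """Format predicted Y = a + b1*X1 + ... as a readable string."""
    terms = " ".join(
        f"{'+' if b >= 0 else '-'} {abs(b):.3f}*{name}" for name, b in coefs.items()
    )
    return f"predicted {outcome} = {intercept:.3f} {terms}"

def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized coefficient to a standardized beta: b * SD(X) / SD(Y)."""
    return b * sd_x / sd_y

# Placeholder coefficients for illustration; NOT the values from the SPSS output.
intercept = 10.0
coefs = {"age": -0.05, "educatn": -1.2, "worknow": -2.0, "poorhlth": 5.0, "nabuse": 1.5}

print(unstandardized_equation(intercept, coefs))

# Hypothetical standard deviations: SD(nabuse) = 2.0, SD(cesd) = 12.0
print(f"beta for nabuse: {standardized_beta(coefs['nabuse'], 2.0, 12.0):.3f}")
```

An unstandardized coefficient is read in the original units (the change in CES-D score per one-unit change in the predictor, holding the others constant), whereas a standardized beta is expressed in standard deviation units.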

**For this assignment, answer questions 1 through 5 individually. DO NOT ANSWER IN PARAGRAPH FORM.**

**Required Readings**

**Gray, J.R., Grove, S.K., & Sutherland, S. (2017). Burns and Grove’s the practice of nursing research: Appraisal, synthesis, and generation of evidence (8th ed.). St. Louis, MO: Saunders Elsevier.**

- Chapter 24, “Using Statistics to Predict”

This chapter asserts that predictive analyses are based on probability theory instead of decision theory. It also analyzes how variation plays a critical role in simple linear regression and multiple regression.

*Statistics and Data Analysis for Nursing Research*

- Chapter 9, “Correlation and Simple Regression” (pp. 208–222)

This section of Chapter 9 discusses the simple regression equation and outlines major components of regression, including errors of prediction, residuals, and ordinary least squares (OLS) regression.

- Chapter 10, “Multiple Regression”

Chapter 10 focuses on multiple regression as a statistical procedure and explains multivariate statistics and their relationship to multiple regression concepts, equations, and tests.

- Chapter 12, “Logistic Regression”

This chapter provides an overview of logistic regression, which is a form of statistical analysis frequently used in nursing research.

**Optional Resources**

**Walden University. (n.d.). Linear regression. Retrieved August 1, 2011, from http://streaming.waldenu.edu/hdp/researchtutorials/educ8106_player/educ8106_linear_regression.html**
