Data Cleaning Assignment

 calcium.sav  calcium.MTW  calcium.sas7bdat  calcium.xls  calcium.dat  cleanassignment.doc
 calciumgood.sav  calciumgood.MTW  calciumgood.sas7bdat  calciumgood.xls  calciumgood.dat  calciumcorrect.xls

The objective of the study for which you will analyze the data was to determine whether significant gender differences existed among subjects 65 years of age and older with regard to calcium, phosphorus, and alkaline phosphatase levels (Boyd et al., 1998). The researchers performed a retrospective chart review of laboratory procedures performed in 6 different physician practices. The data consisted of 178 subjects, 92 males and 86 females, all age 65 or older. The data set contains three discrete variables: sex, lab, and agegroup. The coding is as follows:

 Sex   1=Male; 2=Female
 Lab  1=Metpath; 2=Deyor; 3=St. Elizabeth's; 4=CB Rouche; 5=YOH; 6=Horizon
 Agegroup  1=65-69; 2=70-74; 3=75-79; 4=80-84; 5=85-89 years


The remaining variables are continuous: age, alkphos (alkaline phosphatase, IU/L), cammol (calcium, mmol/L), and phosmmol (inorganic phosphorus, mmol/L).


1. The first task of the assignment is to check the validity of the data. Determine whether this is a "messy" data set, that is, one containing variable values that appear incorrect. Where a value looks wrong, recover the correct value by looking it up in the actual data records. Copies of the records can be found at http://academic.csuohio.edu/holcombj/clean/bigtable.htm
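
One way to screen for suspect values is sketched below in Python. It is only an illustration, not a required method: the allowed codes come from the coding table above, but the record layout, the `obsno` identifier, the rule that ages fall in 65-89 (inferred from the agegroup codes), and the four sample records are all assumptions made up for the example and are not taken from the calcium data.

```python
# Allowed codes, copied from the coding table in this assignment.
VALID_CODES = {
    "sex": {1, 2},
    "lab": {1, 2, 3, 4, 5, 6},
    "agegroup": {1, 2, 3, 4, 5},
}
CONTINUOUS = ("alkphos", "cammol", "phosmmol")


def expected_agegroup(age):
    """Return the agegroup code implied by age, or None if age is not in 65-89."""
    if 65 <= age < 90:
        return (int(age) - 65) // 5 + 1
    return None


def find_problems(record):
    """Return a list of descriptions of suspect values in one record (a dict)."""
    problems = []
    # Discrete variables must use one of the documented codes.
    for name, allowed in VALID_CODES.items():
        if record.get(name) not in allowed:
            problems.append(f"{name}={record.get(name)!r} is not a valid code")
    # Age must be present, in range, and consistent with agegroup.
    age = record.get("age")
    implied = expected_agegroup(age) if age is not None else None
    if implied is None:
        problems.append(f"age={age!r} is missing or outside 65-89")
    elif record.get("agegroup") in VALID_CODES["agegroup"] and record["agegroup"] != implied:
        problems.append(f"agegroup={record['agegroup']} does not match age={age}")
    # Lab measurements must be present and positive (an assumed minimal check).
    for name in CONTINUOUS:
        value = record.get(name)
        if value is None or value <= 0:
            problems.append(f"{name}={value!r} is missing or not positive")
    return problems


# Made-up records for illustration only; "obsno" is a hypothetical row identifier.
records = [
    {"obsno": 1, "age": 73, "sex": 2, "lab": 4, "agegroup": 2, "alkphos": 98, "cammol": 2.4, "phosmmol": 1.1},
    {"obsno": 2, "age": 771, "sex": 1, "lab": 2, "agegroup": 3, "alkphos": 85, "cammol": 2.3, "phosmmol": 1.0},
    {"obsno": 3, "age": 68, "sex": 12, "lab": 1, "agegroup": 1, "alkphos": 77, "cammol": 2.3, "phosmmol": None},
    {"obsno": 4, "age": 72, "sex": 1, "lab": 6, "agegroup": 1, "alkphos": 90, "cammol": 2.5, "phosmmol": 1.2},
]

for record in records:
    for problem in find_problems(record):
        print(record["obsno"], problem)
```

A check like this only flags candidates. Whether a flagged value is truly wrong, and what it should be, still has to be settled against the data records.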


2. Once the data is "clean", perform a summary analysis of the three discrete variables (sex, lab, and agegroup). For the variables alkphos, cammol, and phosmmol, report the mean, median, standard deviation, minimum, and maximum broken down by sex. Then summarize the same three variables in the same way, broken down by lab.
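
If you are not working in a statistics package, the summaries in this task could be produced along the following lines. This is a minimal Python sketch using only the standard library; the function name `summarize_by`, the list-of-dicts layout, and the seven sample values are assumptions invented for illustration, not values from the calcium data.

```python
import statistics
from collections import Counter, defaultdict


def summarize_by(records, factor, variable):
    """Group `variable` by the levels of `factor` and summarize each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[factor]].append(record[variable])
    summary = {}
    for level, values in sorted(groups.items()):
        summary[level] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            # The sample standard deviation needs at least two values.
            "sd": statistics.stdev(values) if len(values) > 1 else float("nan"),
            "min": min(values),
            "max": max(values),
        }
    return summary


# Made-up values for illustration only.
records = [
    {"sex": 1, "cammol": 2.2}, {"sex": 1, "cammol": 2.4}, {"sex": 1, "cammol": 2.3},
    {"sex": 2, "cammol": 2.3}, {"sex": 2, "cammol": 2.5},
    {"sex": 2, "cammol": 2.4}, {"sex": 2, "cammol": 2.6},
]

# Frequency table for a discrete variable.
sex_counts = Counter(record["sex"] for record in records)
print(dict(sex_counts))

# Descriptive statistics for a continuous variable, broken down by sex.
by_sex = summarize_by(records, "sex", "cammol")
for level, stats in by_sex.items():
    print(level, stats)
```

Calling the same function with "lab" as the factor gives the breakdown by lab.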


3. Construct side-by-side box plots of the variables alkphos, cammol, and phosmmol with sex as the factor variable. Then construct side-by-side box plots of the same three continuous variables with lab as the factor variable.
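
To make clear what each box in such a plot represents, the sketch below computes the quantities a box plot is drawn from, one group at a time. It is an illustration in standard-library Python under stated assumptions: the 1.5 x IQR whisker rule and the "inclusive" quartile method are common conventions, but your software may use different ones, and the alkphos values shown are made up rather than taken from the calcium data.

```python
import statistics


def box_stats(values):
    """Return the quartiles, whisker ends, and outliers for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values that are still inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Made-up alkphos values keyed by sex code, for illustration only.
alkphos_by_sex = {
    1: [60, 70, 80, 90, 100, 300],
    2: [65, 75, 85, 95, 105],
}

for sex, values in alkphos_by_sex.items():
    print(sex, box_stats(values))
```

Placing the boxes for each group next to one another on a common axis gives the side-by-side display. A value reported as an outlier here is also a natural candidate to re-check against the data records.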


4. Do you believe a significant difference exists in alkphos, cammol, or phosmmol levels with respect to sex? Why or why not? Do you believe a significant difference exists in alkphos, cammol, or phosmmol levels with respect to lab? Why or why not?


5. Suppose Mr. and Mrs. Contrarian are married and Mrs. Contrarian has lower calcium than Mr. Contrarian. She refuses to believe the study's finding that men tend to have lower calcium than women, because her own calcium is lower than her husband's. Using your results from question 3, explain to Mrs. Contrarian the flaw in her thinking.

6. One of the objectives of this research was to propose a reference range of values that are to be considered “normal” for calcium, inorganic phosphorus, and alkaline phosphatase. Looking at the results for cammol alone for each of the labs, explain why a single reference range is so difficult to establish.