Analysis of Environmental Data Problem Set Conceptual Foundations: Environmental Data Answers

1. For each of the following research questions, define a relevant statistical population. In doing so, clearly define the experimental or observational units (i.e., sampling units) that might comprise the statistical population. Note, there is often more than one viable alternative.

a. What is the relationship between bark beetle abundance and three-toed woodpecker fecundity (# offspring/breeding female) in the Greater Yellowstone ecosystem over the course of a bark beetle outbreak?

There are many possible answers. For example, individual breeding female three-toed woodpeckers are one logical observational unit, since both the dependent variable (fecundity) and independent variable (bark beetle abundance) have the potential to vary among individual female woodpeckers and their associated territories. In that case, the statistical population would be the entire biological population of breeding female three-toed woodpeckers in the Greater Yellowstone ecosystem during the period of study. Alternatively, disturbance patches exhibiting distinct levels of beetle populations (or associated tree damage) could be deemed the observational units, with fecundity and beetle abundance measured at the disturbance patch scale (e.g., inclusive of several breeding females). In that case, the statistical population would be the entire collection of disturbance patches comprising the Greater Yellowstone ecosystem during the period of study. There are several other possibilities as well: watershed units, in which case the statistical population is the set of all watersheds in the study area; management compartments, in which case the statistical population is the set of all compartments in the study area; or even the entire study area, where the observational unit is the year of study and the statistical population is the set of all years for the period of concern.

b. What is the relationship between the choice of green building design and educational awareness of green building options among first-time home builders in Massachusetts?

There are several possibilities, but the most logical observational unit would be first-time home builders, since both the dependent variable (choice of green building design, however that response might be scaled) and independent variable (green building awareness, however that variable might be scaled) have the potential to vary among individual home builders. In that case, the statistical population would be all first-time home builders in Massachusetts for the period of study.

c. What is the spatial scale of heterogeneity in soil pH across the Mount Toby State Forest?

The logical observational unit would be a plot of some dimension (e.g., 1 m quadrat), since the single variable (note that there is no distinction between dependent and independent variables here, given the question as stated) has the potential to vary among plots of any dimension. In that case, the statistical population would be the collection of all plots of the specified dimension across Mount Toby State Forest. Note, given the infinite number of possible plot dimensions, there are infinitely many statistical populations that could be defined. The choice of plot size, and thus the observational unit and statistical population, would depend on environmental considerations (e.g., the scale of the ecological phenomena you are ultimately interested in) and logistical/practical considerations related to field data collection.

d. What are the factors affecting the probability of tree infestation by Asian longhorned beetle in the city of Worcester, Massachusetts?

There are many possibilities, but a logical choice of observational unit would be individual trees, since both the dependent variable (tree infestation, however that response might be scaled, e.g., infected or not, percentage, etc.) and independent variables (environmental factors, whatever they might be, e.g., tree diameter, and however they might be scaled) have the potential to vary among individual trees. In that case, the statistical population would be the collection of all trees in the city of Worcester during the period of study. Alternatively, city streets or city blocks could be deemed the observational units, with the dependent and independent variables measured at the street or block scale, respectively (e.g., proportion of street trees infected). In that case, the statistical population would be the collection of all streets or blocks in the city during the period of study.

e. What is the effect of watershed imperviousness on the base flow (annual low flow) of 3rd-order streams in southern New England?

A logical observational unit would be 3rd-order watersheds, since both the dependent variable (base flow) and independent variable (watershed imperviousness) have the potential to vary among watersheds. In that case, the statistical population would be the collection of all 3rd-order watersheds in southern New England. Note, here the study design might involve measuring base flow in each sampled watershed over a period of years and taking the average (or minimum), in which case the annual measurements of base flow are actually subsamples, since they get reduced to a single value for each observational unit (watershed).
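
The subsampling idea in the note on base flow can be made concrete with a small sketch. The watershed identifiers, years, and base-flow values below are invented for illustration (units are arbitrary), and the helper name `collapse_subsamples` is hypothetical; the point is only that several annual measurements collapse to one value per observational unit before any analysis.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annual base-flow measurements: (watershed id, year, base flow).
# All identifiers and values are invented for illustration only.
annual_records = [
    ("W1", 2001, 0.42), ("W1", 2002, 0.38), ("W1", 2003, 0.40),
    ("W2", 2001, 0.15), ("W2", 2002, 0.21), ("W2", 2003, 0.18),
]


def collapse_subsamples(records, summary=mean):
    """Reduce annual subsamples to one value per observational unit (watershed)."""
    by_watershed = defaultdict(list)
    for watershed, _year, flow in records:
        by_watershed[watershed].append(flow)
    return {watershed: summary(flows) for watershed, flows in by_watershed.items()}


# One value per watershed, using either the average or the minimum.
mean_flow = collapse_subsamples(annual_records)
min_flow = collapse_subsamples(annual_records, summary=min)

print(len(annual_records), "subsamples ->", len(mean_flow), "observations")
```

In this sketch the six annual records yield a sample size of two, not six, because the watershed is the observational unit.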

f. What is the probability of a Blanding's turtle crossing a road in relation to road width and traffic rate?

There are several possibilities, but a logical choice of observational unit would be individual turtles, since both the dependent variable (crossing success) and independent variables (road width and traffic rate) have the potential to vary among individual turtles, assuming turtles are crossing different roads at different times independently. In that case, the statistical population would be the collection of all Blanding's turtles. Alternatively, sections of street could be deemed the observational units, with the dependent and independent variables measured at the street scale (e.g., proportion of attempts successful). In that case, the statistical population would be the collection of all street sections in the study area.

g. What is the effect of forest stand thinning level (e.g., residual tree basal area) on residual tree growth?

There are several possibilities, but a logical choice of observational unit would be the forest stand, since both the dependent variable (residual tree growth) and independent variable (thinning level; i.e., residual tree basal area) have the potential to vary among forest stands. In that case, the statistical population would be the collection of all forest stands in the study area. Note, in this case tree growth might be measured on individual trees, but these would be subsamples that get averaged to produce a value for the dependent variable for each stand. Alternatively, individual trees could be deemed the observational units, with the dependent and independent variables measured at the tree scale. In that case, the statistical population would be the collection of all trees in the study area. Note, in this case, trees are samples, not subsamples, because each tree would be a separate observation in the subsequent statistical analysis.

h. What is the relationship between invertebrate community diversity (e.g., # taxa) in forested wetlands and the level of human development within a 100-m radius?

A logical observational unit would be individual forested wetlands (patches), since both the dependent variable (invertebrate community diversity) and independent variable (level of human development) have the potential to vary among wetland patches. In that case, the statistical population would be the collection of all forested wetland patches in the study area.

2. Choose a research question within your field of study (ideally, one associated with your thesis/dissertation/professional paper) and define the relevant statistical population. Clearly identify the extent and/or number and the type of experimental or observational units that comprise the statistical population.

Good luck!

3. For each of the following research questions and associated data sets, determine the type of dependent (response) data represented (i.e., continuous, count, proportion, binary, time to death/failure, time series, circular). In addition, determine which, if any, are the dependent and independent variables.

a. Does three-toed woodpecker hatching success rate vary in relation to bark beetle abundance in the Greater Yellowstone Ecosystem? Data include: #eggs hatched, #eggs laid, and an index of bark beetle abundance within the nesting territory for each of 100 nests observed over the course of the study.

Type of data: proportional, because the count of #eggs hatched is out of a total #eggs laid, making it a proportion. Note, with proportional data, there is always a trial of some size (trial size, usually greater than 1) and the trial is the observational unit, which helps to distinguish this from cross-classified categorical data (see below).
Dependent variable: hatch success (expressed as a proportion, #eggs hatched/#eggs laid).
Independent variable(s): bark beetle abundance, measured in some appropriate way.
Observational unit: the individual nest or territory. Note, the nest has a trial size equal to the number of eggs laid.
Statistical population: the collection of all nests or territories in the study area (N=??).
Sample: the 100 nests sampled (n=100).

b. Is the wind direction on top of Mount Greylock nonrandom? Data include wind direction measured for 1,000 regularly spaced hours over the course of one year at a weather station on top of Mount Greylock.

Type of data: circular, because direction is measured in degrees, which is a circular variable: the two ends of the numerical continuum are actually identical in meaning.
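
To see why the two ends of the scale being identical matters in practice, consider a minimal sketch (not part of the original answer key) that compares an ordinary arithmetic mean of compass directions with a mean computed from unit vectors. The function names and the two example directions are assumptions chosen for illustration.

```python
import math


def circular_mean_deg(directions_deg):
    """Mean direction in degrees (wrapped to the 0-360 range) from unit vectors."""
    s = sum(math.sin(math.radians(d)) for d in directions_deg)
    c = sum(math.cos(math.radians(d)) for d in directions_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def mean_resultant_length(directions_deg):
    """Length of the average unit vector: between 0 and 1."""
    n = len(directions_deg)
    s = sum(math.sin(math.radians(d)) for d in directions_deg) / n
    c = sum(math.cos(math.radians(d)) for d in directions_deg) / n
    return math.hypot(s, c)


# Two hypothetical wind readings, both nearly due north.
winds = [350.0, 10.0]

arithmetic = sum(winds) / len(winds)   # treats 350 and 10 as far apart
circular = circular_mean_deg(winds)    # treats them as neighbours across 0/360

print("arithmetic mean:", arithmetic)
print("circular mean:", circular)
print("mean resultant length:", mean_resultant_length(winds))
```

The arithmetic mean points due south even though both readings are close to north, while the vector-based mean lands at the 0/360 boundary. A mean resultant length near 1 indicates tightly clustered directions, and a value near 0 indicates no single preferred direction.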
Note, the question does not warrant treating these data as a time series, but the data could be coerced into a time series format by first transforming the wind direction (degrees) into a linear variable based on a reference direction (we'll discuss this in class) and then analyzing the pattern of variation in wind direction over time. However, this is not consistent with the original question.
Dependent/Independent variable(s): there is only one measured variable here, so there is no distinction between dependence and independence, which requires at least two variables. Note, the overall study context may suggest that wind be considered, at least conceptually, as either a dependent variable or independent variable, but this will depend entirely on the question being considered. In the context of the question as stated here, there is no distinction between dependent and independent.
Observational unit: the hour.
Statistical population: the collection of all hours in the year (N=8,760).
Sample: the 1,000 hours measured (n=1,000).

c. What is the dominant spatial scale of variability in soil pH in Mount Toby State Forest? Data include soil pH measured at 1 m intervals along a 1 km transect across the study area.

Type of data: time series, because the measured variable (pH) is repeatedly measured in a sequence, in this case a spatial sequence as opposed to a temporal sequence, and the interest is in the pattern of variation in the measured variable.
Dependent/Independent variable(s): there is only one measured variable here, so there is no distinction between dependence and independence, which requires at least two variables. Note, the overall study context may suggest that pH be considered, at least conceptually, as either a dependent variable (e.g., responding to plant cover) or independent variable (e.g., affecting plant growth), but this will depend entirely on the question being considered. In the context of the question as stated here, there is no distinction between dependent and independent.
Observational unit: the 1-m plot.
Statistical population: the collection of all 1-m plots along the 1 km transect (N=1,000).
Sample: technically none, since all 1,000 1-m plots are measured. Note, in practice, we might consider this one transect to be a sample of the Mount Toby State Forest, and thus the statistical population would be all 1-m plots in the Forest (N=??) and the sample would be the 1,000 plots measured (n=1,000).

d. Is attitude towards motorized recreation independent of education level among users of Mount Toby State Forest? Data include attitude class (favor motorized use, opposed to motorized use, neutral to motorized use) and education level (high school, BS, MS, or PhD) for each of 100 randomly surveyed visitors to Mount Toby State Forest.
Type of data: count data of the cross-classified categorical type, because the data represent counts in each category of attitude class and education class (i.e., 12 categories derived by combining each level of attitude class with each level of education class). Note, since we know the total number of visitors, it is tempting to consider the counts in each class to be proportions, and thus the data to be proportional. However, with cross-classified categorical data, there are no trials and the observational units don't correspond to trials of some size. Here, the observational unit (in this case, a person) is simply classified into one of the categories, resulting in a count for each category. Each person does not have a proportional response; he or she merely falls into one of the categories, whereas with proportional data, each observational unit has a proportional response. This is a subtle but important distinction.
Dependent variable: counts in each category.
Independent variable(s): attitude class (categorical, with 3 levels) and education level (categorical, with 4 levels).
Observational unit: the individual person or visitor.
Statistical population: the collection of all people or visitors to Mount Toby State Forest (N=??).
Sample: the 100 visitors surveyed (n=100).

e. Do street trees confer greater home energy efficiency in Amherst, Massachusetts? Data include presence/absence of street trees and a measure of home energy efficiency for each of 100 homes in Amherst, Massachusetts, controlling for home age, size and construction.

Type of data: continuous, because the dependent variable, home energy efficiency, is continuously scaled. Note, the binary presence/absence of street trees is the independent variable, not the dependent data, so it does not determine the type of dependent data.
Dependent variable: home energy efficiency (measured in some appropriate way).
Independent variable(s): presence/absence of street trees.
Observational unit: the individual home.
Statistical population: the collection of all homes in Amherst, perhaps limited to those with similar age, size and construction (N=??).
Sample: the 100 homes sampled (n=100).

f. What is the expected longevity of an artificial white pine snag (i.e., created by cutting the crown off the tree) in western Massachusetts, controlling for tree size, soil and slope position? Data include age of the snag at the time of falling for 100 created snags on Caldwell State Forest in Pelham, Massachusetts.

Type of data: time to death/failure, because the measured response is the time to failure (snag fall).
Dependent/Independent variable(s): there is only one measured variable here, time to snag fall, so there is no distinction between dependence and independence, which requires at least two variables.
Note, the overall study context may suggest that snag longevity or survival rate be considered, at least conceptually, as the dependent variable responding to the independent variables tree size, soil and slope position. However, since these independent variables are being controlled for in this study, there is no variability among the observational units (snags) with respect to these variables, and thus they are not technically independent variables in the context of this study.
Observational unit: the individual snag.
Statistical population: the collection of all snags in Caldwell State Forest meeting the requirements of size, soil and slope position (N=??).
Sample: the 100 snags sampled (n=100).

g. How is the probability of fish passage through a culvert affected by physical characteristics of the culvert and flow in the Connecticut River watershed? Data include #tagged brook trout of a standard size below the culvert, #tagged fish successfully passing through the culvert, and the culvert length, culvert diameter, substrate type, and flow velocity for each of 100 culverts distributed throughout the Connecticut River watershed.

Type of data: proportional, because the count of #tagged fish passing through the culvert is out of a total #tagged fish below the culvert, making it a proportion. Note, as with all proportional data, there is a trial of size m for each observational unit.
Dependent variable: culvert passage success (expressed as a proportion, #tagged fish passing through culvert/#tagged fish below culvert).
Independent variable(s): culvert length, culvert diameter, substrate type and flow velocity.
Observational unit: the individual culvert. Note, the culvert has a trial size equal to the number of tagged fish in the stream below it.
Statistical population: the collection of all culverts in the Connecticut River watershed (N=??).
Sample: the 100 culverts sampled (n=100).

h. Is the likelihood of choosing a green building design affected by home owner awareness of green building options in Massachusetts? Data include use of green building practices (yes or no) and an index of green building awareness (from none to high, derived from several factors) for a random sample of 100 new home builders in the Pioneer Valley, Massachusetts.

Type of data: binary, because the dependent variable (choice of building design) can take on only 1 of 2 values: green building versus conventional building. Note, here each home builder is the observational unit and either chooses green building or not, thus the trial size = 1. Viewed this way, binary data represent the special case of proportional data when trial size equals 1.
Dependent variable: binary choice of green building or conventional building.
Independent variable(s): home owner awareness of green building design options, measured in some appropriate manner.
Observational unit: the individual new home builder. Note, the home builder is the trial and it has a trial size of 1.
Statistical population: the collection of all new home builders in the Pioneer Valley (N=??). Note, there is a mismatch between the desired scope of inference in the question (Massachusetts) and the realized scope of inference from the study design (Pioneer Valley). The statistical population is based on the realized study, so here either the question needs to change in scope or the study design needs to expand in scope.
Sample: the 100 new home builders (n=100).

4. Obtain a real data set from your field of study, either one that you collected or one from your major professor, and identify the type of data represented. If more than one type of data is included, identify each type.

Good luck!
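
As a compact recap of the distinction drawn in question 3 between proportional, binary, and cross-classified count data, the following sketch shows how each type might be laid out before analysis. All records and field names are invented for illustration and are not data from the studies described above.

```python
from collections import Counter

# Proportional data: each observational unit (nest) carries its own trial,
# with a trial size (eggs laid) and a number of successes (eggs hatched).
nests = [{"hatched": 3, "laid": 4}, {"hatched": 2, "laid": 5}]
hatch_success = [nest["hatched"] / nest["laid"] for nest in nests]

# Binary data: the special case where every unit (home builder) has trial size 1.
green_choice = [1, 0, 0, 1]  # 1 = green design, 0 = conventional
as_trials = [{"successes": choice, "trial_size": 1} for choice in green_choice]

# Cross-classified counts: each unit (visitor) falls into exactly one cell of
# the attitude-by-education table; the response is the count per cell.
visitors = [
    ("favor", "BS"), ("opposed", "MS"), ("favor", "BS"), ("neutral", "PhD"),
]
cell_counts = Counter(visitors)

print(hatch_success)
print(as_trials)
print(dict(cell_counts))
```

Note that the nests and home builders each have a response of their own, whereas an individual visitor has no response beyond membership in a cell; only the table of counts does.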