Everyday Research Methods

Online resource: Writing good questions


"Talk deeply, be happy?" It seems to replicate...

Figure 8.1: Data from Mehl et al. (2010)


In Chapter 8, one of the examples features a study that found that the more "deep talk" people engage in (as measured by the EAR), the happier they reported being (Mehl et al., 2010) (see Figure 8.1). 

The same 2010 study also reported that the amount of engagement in small talk was associated with lower well-being (this result is presented in Figure 8.9).

Now a team of researchers (including many of the same authors) has published a second study with similar methodology (Milek et al., in press). The team collected new data from a larger, more heterogeneous sample of U.S. adults (the original study included only college students). The authors used Bayesian analytic techniques, including pooling the new samples with the sample from the 2010 study. You can view a preprint of the report here; it's in press at Psychological Science.

The new paper confirmed evidence for the "deep talk" effect. That is, substantive conversations were linked with greater well-being, with a moderate effect size. But the team did not find evidence for the complementary effect of small talk. That is, in the new analysis, the estimate of the small talk effect was not different from zero.

If you teach this example, it's worth updating students: The "deep talk" result in Figure 8.1 has been replicated, but one of the effects in Figure 8.9 (the small talk effect) has not.

Figure 8.9: Data from Mehl et al. (2010)

 

 

 

Does legal marijuana lower opioid prescriptions? A quasi-experiment

Legalizing marijuana is associated with lower rates of opioid prescriptions in those U.S. states. Photo: Gina Kelly/Alamy Stock Photo


Opioid addiction is a major health crisis in the United States. Deaths from overdose have increased dramatically over the last five years. Opioid addiction sometimes starts when a person in pain is prescribed legal opioid drugs by a physician. Prescription opioids can also be sold illegally. For these reasons, opioid prescription rates are an indicator of opioid abuse in a particular region.

Some public health researchers have investigated whether legalizing marijuana can reduce rates of opioid use and abuse. Marijuana is an alternative for controlling chronic pain that, according to many experts, carries a lower addiction risk. Recently, researchers published two studies, both with quasi-experimental designs, that tested whether legalized marijuana could lower the rates of opioid prescriptions. As in many quasi-experiments, the researchers took advantage of a real-world situation: some U.S. states have legalized marijuana and other states have not.

ABC News covered the research. There were two studies with similar designs, but we'll focus on the first one:

One looked at trends in opioid prescribing under Medicaid, which covers low-income adults, between 2011 and 2016. It compared the states where [medical] marijuana laws took effect versus states without such laws.... 

Results showed that laws that let people use marijuana to treat specific medical conditions were associated with about a 6 percent lower rate [over the years studied] of opioid prescribing for pain. That's about 39 fewer prescriptions per 1,000 people using Medicaid.

And when states with such a law went on to also allow recreational marijuana use by adults, there was an additional drop averaging about 6 percent.

Questions:

a) What is the "independent" variable in this quasi-experiment? What is the dependent variable? Was the independent variable independent groups or within groups? 

b) What makes this a quasi-independent variable? 

c) Of the four quasi-experimental designs, which seems to be the best fit: Non-equivalent control group posttest-only design? Non-equivalent control group pretest-posttest design? Interrupted time-series design? Or non-equivalent control group interrupted time-series design?

d) How might you graph the results described above? 

e) To what extent can these data support the causal claim that "legalizing marijuana, either for medical use or recreational use, can lower the rates of opioid prescriptions in the Medicaid system"?

 

The two studies are available open-access through JAMA Internal Medicine. Here's the study using Medicare Part D prescription rates, and here's the study using Medicaid prescription rates. 

 

Suggested answers

a) The independent variable was whether a state had legalized marijuana or not. It was independent-groups (states either had, or had not, legalized the drug). The dependent variable was the rate of opioid prescriptions through Medicaid. Another variable, somewhat difficult to discern from the journalist's description, was year of study (from 2011 to 2016).

b) This IV was not manipulated/controlled by the experimenter. The researcher did not decide which states could legalize marijuana or not.

c) This is probably best characterized as a non-equivalent control group, pretest-posttest design. There were two types of states (legalized and not) and one main outcome variable: opioid prescriptions. The prescription rate was compared over time (from 2011 to 2016), making it pretest-posttest.

d) Your y-axis should show the rate of opioid prescriptions, and the x-axis should include the years 2011 to 2016. You could then plot "States with legalization" and "States without legalization" as two different colored lines. (A plotting sketch with made-up numbers appears after these answers.)

e) The results of the study show covariance (states with legalized marijuana had lower opioid prescriptions). The fact that they compared opioid prescriptions over time (2011 to 2016) suggests that the design is able to establish temporal precedence. Presumably (although this is not clear from the articles), 2011 represents a year before many of the marijuana laws took effect, and the 2016 data were collected after the laws had been active.
As for internal validity, it's possible that states that legalize are different in systematic ways from states that do not. For example, states that legalize marijuana are more likely to be in the North and West, have lower poverty rates, and so on. However, the pretest-posttest design, in which they studied the "drop in opioid prescriptions over time" rather than the "overall rate of opioid prescriptions," helps minimize some of these concerns. As with most quasi-experiments, causation is not a slam-dunk, because the experimenter does not have full control over the independent variable.
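
If you'd rather plot answer (d) than hand-sketch it, here is a minimal matplotlib sketch. The prescription numbers are invented placeholders chosen only to show the expected pattern (a steeper decline in legalizing states); they are not the study's actual values.

    import matplotlib.pyplot as plt

    years = [2011, 2012, 2013, 2014, 2015, 2016]
    # Hypothetical prescription rates (per 1,000 Medicaid enrollees) -- placeholders only
    legal_states = [640, 630, 615, 600, 585, 570]
    nonlegal_states = [645, 642, 640, 638, 637, 636]

    plt.plot(years, legal_states, marker="o", label="States with legalization")
    plt.plot(years, nonlegal_states, marker="o", label="States without legalization")
    plt.xlabel("Year")
    plt.ylabel("Opioid prescriptions per 1,000 Medicaid enrollees")
    plt.legend()
    plt.title("Non-equivalent control group pretest-posttest pattern (hypothetical data)")
    plt.show()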

Does smoking marijuana cause car fatalities?

The study found an estimated 12% higher rate of fatal accidents after 4:20pm on April 20. Credit: Lars Hagberg/AFP/Getty Images


Here's a study that took advantage of "4-20," an unofficial holiday that people celebrate by holding pot-smoking parties starting at 4:20 p.m. Here's how the quasi-experiment was described in a New York Times story:

Researchers used 25 years of data on car crashes in the United States in which at least one person died. They compared the number of fatal accidents between 4:20 p.m. and midnight on April 20 each year with accidents during the same hours one week before and one week after that date.

a) What are the "independent" and dependent variables in this study? (And why did I put independent variable in quotes?)

Here's how the journalist described the results:

Before 4:20 p.m. there was no difference between the number of fatalities on April 20 and the number on the nearby dates. But from 4:20 p.m. to midnight, there was a 12 percent increased risk of a fatal car crash on April 20 compared with the control dates. 

b) Of the four quasi-experimental designs, which seems to be the best fit: Non-equivalent control group posttest-only design? Non-equivalent control group pretest-posttest design? Interrupted time-series design? Or non-equivalent control group interrupted time-series design?

c) Sketch a graph of the results described. 

d) The Times reported that "The increased risk was particularly large in drivers 20 and younger." Why might the researchers have included this detail?

e) The Times's headline read, "Marijuana Use Tied to Fatal Car Crashes". What kind of claim is this? (Frequency, Association, or Cause?)

f) To what extent can these results support a causal claim about marijuana causing crashes? Apply the three causal criteria to this design and results. 

 

 

 

Claim: Standing up at your desk could make you smarter

I have a standing desk in my office. Am I getting smarter as a result? An editorial summarizes some of the evidence. Photo: Smith Collection/Gado/Getty Images

I'm standing at my desk as I compose this post....could that make my writing go better? Yes, according to an editorial entitled, "Standing up at your desk could make you smarter." The editorial leads with a strong causal claim and then describes three studies, each with a different design. Here's one of the studies:  

A study published last week...showed that sedentary behavior is associated with reduced thickness of the medial temporal lobe, which contains the hippocampus, a brain region that is critical to learning and memory.

The researchers asked a group of 35 healthy people, ages 45 to 70, about their activity levels and the average number of hours each day spent sitting and then scanned their brains with M.R.I. They found that the thickness of their medial temporal lobe was inversely correlated with how sedentary they were; the subjects who reported sitting for longer periods had the thinnest medial temporal lobes.

a) What were the two variables in this study? Were they manipulated or measured? Was this a correlational or experimental study? 

b) The author writes that the study "showed that sedentary behavior is associated with reduced thickness of the medial temporal lobe." Did he use the correct verb? Why or why not?

Here's a second study described in the editorial:

Intriguingly, you don’t even have to move much to enhance cognition; just standing will do the trick. For example, two groups of subjects were asked to complete a test while either sitting or standing [randomly assigned]. The test — called Stroop — measures selective attention. Participants are presented with conflicting stimuli, like the word “green” printed in blue ink, and asked to name the color. Subjects thinking on their feet beat those who sat by a 32-millisecond margin.

c) What are the two variables in this study? Were they manipulated or measured? Was this a correlational or experimental study? 

d) Does this study support the author's claim that "you don't have to move much to enhance cognition; just standing will do the trick"? Why or why not?

e) Bonus: What kind of experiment was being described here? (Posttest only, pretest/posttest, repeated measures, or concurrent measures?) Comment, as well, on the effect size.

Here's the third study described in the editorial:

It’s also yet another good argument for getting rid of sitting desks in favor of standing desks for most people. For example, one study assigned a group of 34 high school freshmen to a standing desk for 27 weeks. The researchers found significant improvement in executive function and working memory by the end of the study.

f) What are the variables in this study? Were they manipulated or measured? 

g) Do you think this study can support a causal claim about standing desks improving executive function and working memory? 

The author added the following statement to the third study on high school freshmen:

True, there was no control group of students using a seated desk, but it’s unlikely that this change was a result of brain maturation, given the short study period.

h) What threat to internal validity has the author identified in this statement?

i)  What do you think of his evaluation of this threat? 

j) Of the three studies presented, which provides the strongest evidence for the claim that "standing up at your desk could make you smarter"? What do you think? On the basis of this evidence, should I keep standing here? 

 

Question wording matters


Collecting accurate data in a poll is difficult business. Many of us focus on the sampling strategies of polling organizations, and rightly so: External validity depends on whether we include "cell phone only" numbers in our samples, how we account for different rates of responding across certain groups, and so on. However, this post reminds us that question wording also matters. Construct validity is just as important for polls.

In this report, Pew Research Center describes how people's opinions change depending on how a question is posed. The poll asked Americans whether we should increase the size of the House of Representatives. People in the poll were randomly assigned to hear either the original question, or the same question with additional context. Notice how people's support changes when additional context is added: 

“The Electoral College, Congress, and representation,” Pew Research Center, Washington, D.C. (April 26, 2018). https://pewrsr.ch/2rLd0OI, accessed on May 16, 2018

 

a) Even though this is a polling example, Pew conducted an experiment. What are its independent and dependent variables? 

b) What causal claim could this polling result support? 

c) In your opinion, which polling question--the original one or the one with added context--provides the most accurate measure of people's opinion? 

d) Find another poll question on the Pew Research web site. Speculate how some additional information might change people's responses. 

 

 

 

 

 

 

 

Below is another presentation of the same results. Notice how the two  different question wordings seem to affect people differently, depending on their political affiliation.

“The Electoral College, Congress, and representation,” Pew Research Center, Washington, D.C. (April 26, 2018). https://pewrsr.ch/2rLd0OI, accessed on May 16, 2018

 

e) Why might Democrats be more influenced by the contextual information provided? 

f) This could be viewed as a 2 x 2, IV x PV factorial design, in which the IV is "question wording" (original or added context) and the PV is "political affiliation" (Democrat or Republican). If you've studied Chapter 12, perhaps you can identify whether there are main effects and interactions in this pattern of data. 

The same Pew report contains additional examples of how question wording alters responses to a question about the Senate, and shows how different subgroups respond to questions about the Electoral College. 

Claim: Cultural identity impacts the food you like

Under what conditions did Southerners find this dish especially tasty? Credit: Andre Baranowski/Getty Images

I was skeptical at first of the causal claim in the headline, From Collards To Maple Syrup, How Your Identity Impacts The Food You Like. After all, in order to support a causal claim, you need to manipulate a variable, and how can we manipulate cultural identity? 

Before reading on, think about:

a) What word in the headline makes this a causal claim?

b) What foods might be associated with your own cultural identity (or identities?)

Here are some elements of the journalist's story. NPR reported about...

...a recent study in the Journal of Experimental Social Psychology, authored by Jay Van Bavel, social psychologist at New York University and his colleagues. The researchers found that the stronger your sense of social identity, the more you are likely to enjoy the food associated with that identity. The subjects of this study were Southerners and Canadians, two groups with proud food traditions.

c) In the study above, what are the two variables? Do they seem to be manipulated or measured?

d) Given your answer to question c), is this study really an "experiment"?

e) Can this study (above) support the causal claim that "identity impacts the food you like"? What are some alternative explanations? Hint: Think about temporal precedence and third variable explanations. 

Here's the description of a second study:

In a second experiment, containing 151 people, researchers also found that when Southerners were reminded of their Southernness — primed, in psychology speak — their perception of the tastiness of Southern food was even higher. That is, the more Southern a person was feeling at that moment, the better the food tasted [compared to a group who was not primed].

f) What are the two variables in the study above? Were the variables manipulated or measured?

g) Given your answer to question f), is this study really an "experiment"?

h) Can this study support the claim that "identity impacts the food you like"?

They found a similar result when taste-testing with Canadians, finding that Canadian test subjects only preferred the taste of maple syrup over honey in trials when they were first reminded of their Canadian identity.

i) You know the drill: For the study above, what kind of study was it? What are its variables?

j) Challenge question: Can you tell whether the independent variable in the Canadian study was manipulated as between-groups or within-groups?

In sum, it appears that two out of the three studies reviewed by this NPR article were experimental, so they're more likely to support the causal claim about "identity impacting the food you like." The journalist calls attention to this manipulation of identity in this description:

The relationship between identity and food preference is not new. However, the use of priming to induce identity makes this study different from its predecessors.

"Priming is like opening a filing drawer and bringing to your attention all the things that are in the drawer," says Paul Rozin, food psychologist at University of Pennsylvania, who was not involved in the study. "You can't really change peoples' identities in a 15-minute setting, but you can make one of their identities more salient, and that's what they've done in this study."

k) What other ways might you manipulate cultural identity in an experimental design?

Good news! The empirical journal article is open-access here. When you read it, you'll see that the journalist simplified the design of the studies for her article in NPR. 

Does loud music make you eat bad food?

That sounds delicious, thank you! Photo credit: Nathan Motoyama/EyeEm/Getty Images

June is food month on the ERM blog! This story's about the headline, Loud music makes you eat bad food?!?!? Whaaaaaaatever!

QRock 100.7 was one of several news outlets that had fun describing this study for its readers. 

Let's find out what kind of study was conducted to test the claim. The Q100.7 journalist wrote:  

A new study found loud music makes us more likely to order unhealthy food when we’re dining out.  A new study in Sweden found loud music in restaurants makes us more likely to choose unhealthy menu options.  And we’re more likely to go with something healthy like a salad when the music ISN’T so loud.

Researchers went to a café and played music at different decibel levels to see how it affected what people ordered.  Either 55 decibels, which is like background chatter or the hum from a refrigerator . . . or 70 decibels, which is closer to a vacuum cleaner.

And when they cranked it up to 70, people were 20% more likely to order something unhealthy, like a burger and fries.

They did it over the course of several days and kept getting the same results.  So the study seems pretty legit.

a) OK, go: What seems to be the independent variable in this study? What were its levels? How was it operationalized?

b)  What seems to be the dependent variable? How was it operationalized? Think specifically about how they might have operationalized the concept "unhealthy."

c) Do you think this study counts as an experiment or a quasi-experiment? Explain your answer. 

d) This study can be called a "field study" or perhaps a "field experiment". Why?

e) To what extent can this study support the claim that loud music makes you eat bad food? Apply covariance, temporal precedence, and internal validity to your response.

f) If you were manipulating the loudness of the music for a study like this, how might you do so in order to ensure that it was the music, and not other restaurant factors, that was responsible for the increase in ordering "unhealthy" food?

g) The Q100.7 journalist argues that the study seems "pretty legit." What do you think the journalist meant by this phrase? 

h) The study on food and music volume is summarized in an open-access conference abstract, published here. You might be surprised to read, contrary to the journalist's report, that the field study was conducted on only two days--with one day at 50 dB and the other at 70 dB. How does this change your thoughts about the study?

i) Conference presentations are not quite the same as peer-reviewed journal publications. Take a moment (and use your PsycINFO skills) to decide whether the authors, Biswas, Lund, and Szocs, have published this work in a peer-reviewed journal yet. Why might journalists choose to cover a story that has only been presented at a conference instead of peer-reviewed? Is this a good practice in general?

 

 


Delay of gratification (marshmallow study replication)

Her school achievement later in life can be predicted from her ability to wait for a treat (or by her family's SES). Photo: Manley099/Getty Images

There's a new replication study about the famous "marshmallow study",  and it's all over the popular press. You've probably heard of the original research: Kids are asked to sit alone in a room with a single marshmallow (or some other treat they like, such as pretzels). If the child can wait for up to 15 minutes until the experimenter comes back, they receive two marshmallows. But if they eat the first one early, they don't. As part of the original study, kids were tracked over several years. One of the key findings was that the longer children were able to wait at age 4, the better they were doing in school as teenagers. Psychologists have often used this study as an illustration of how self-control is related to important life outcomes. 

The press coverage of this year's replication study illustrates at least two things. First, it's a nice example of multiple regression. Second, it's an example of how different media outlets assign catchy--but sometimes erroneous--headlines to the same study.

First, let's talk about the multiple regression piece. Regression analyses often try to understand a core bivariate relationship more fully. In this case, the core relationship they start with is between the two variables, "length of time kids waited at age 4" and "test performance at age 15." Here's how it was described by Payne and Sheeran in the online magazine Behavioral Scientist:

The result? Kids who resisted temptation longer on the marshmallow test had higher achievement later in life. The correlation was in the same direction as in Mischel’s early study. It was statistically significant, like the original study. The correlation was somewhat smaller, and this smaller association is probably the more accurate estimate, because the sample size in the new study was larger than the original. Still, this finding says that observing a child for seven minutes with candy can tell you something remarkable about how well the child is likely to do in high school. 

a) Sketch a well-labelled scatterplot of the relationship described above. What direction will the dots slope? Will they be fairly tight to a straight line, or spread out? (A simulated example appears just after question b.)

b) The writers (Payne and Sheeran) suggest that a larger sample size leads to a more accurate estimate of a correlation. Can you explain why a large sample size might give a more accurate statistical estimate? (Hint: Chapter 8 talks about outliers and sample size--see Figures 8.10 and 8.11.)
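
Before moving on, here is a minimal Python sketch of what such a scatterplot might look like, using simulated data. The correlation value of .3 is an illustrative placeholder, not the replication study's actual estimate.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Simulated pair of variables with a modest positive correlation (r = .3 is a placeholder)
    r = 0.3
    data = rng.multivariate_normal(mean=[0, 0], cov=[[1, r], [r, 1]], size=500)

    plt.scatter(data[:, 0], data[:, 1], alpha=0.4)
    plt.xlabel("Seconds waited at age 4 (standardized)")
    plt.ylabel("Achievement at age 15 (standardized)")
    plt.title("A positive but spread-out cloud of points")
    plt.show()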

Now here's more about the study: 

The researchers next added a series of “control variables” using regression analysis. This statistical technique removes whatever factors the control variables and the marshmallow test have in common. These controls included measures of the child’s socioeconomic status, intelligence, personality, and behavior problems. As more and more factors were controlled for, the association between marshmallow waiting and academic achievement as a teenager became nonsignificant. 

c) What's proposed above is that social class is a third variable ("C") that might be associated with both waiting time ("A") and school achievement ("B"). Using Figure 8.15, draw this proposal. Think about it, too: Why does it make sense that lower SES might go with lower waiting time (A)? Why might lower SES also go with lower school achievement (B)?

d) Now create a mockup regression table that might fit the pattern of results being described above. Put the DV at the top (what is the DV?), then list the predictor variables underneath, starting with Waiting time at Age 4, and including things like Child's Socioeconomic Status and Intelligence. Which betas should be significant? Which should not?

Basically, here we have a core bivariate relationship (between wait time and later achievement), and then a critic suggests a possible third variable (SES). They used regression to see if the core relationship was still there when the third variable was controlled for. The core relationship went away, suggesting that SES was a third variable that can help explain why kids who wait longer do better in school later on. 
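
Here is a minimal Python sketch of that logic, using simulated (invented) data rather than the actual replication dataset: when a third variable such as SES drives both waiting time and achievement, the beta for waiting time shrinks toward zero once SES is added to the regression.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 900

    # Simulated data in which SES ("C") influences both waiting time ("A") and achievement ("B")
    ses = rng.normal(size=n)
    wait_time = 0.5 * ses + rng.normal(size=n)    # A depends partly on C
    achievement = 0.6 * ses + rng.normal(size=n)  # B depends on C, not directly on A

    # Bivariate regression: achievement on waiting time alone
    m1 = sm.OLS(achievement, sm.add_constant(wait_time)).fit()

    # Multiple regression: waiting time plus SES as a control variable
    m2 = sm.OLS(achievement, sm.add_constant(np.column_stack([wait_time, ses]))).fit()

    print("beta for wait time, alone:          ", round(m1.params[1], 2))  # clearly positive
    print("beta for wait time, controlling SES:", round(m2.params[1], 2))  # near zero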

Next let's talk about some of the hype around this replication study. The Behavioral Scientist piece (quoted above) is one of the more balanced descriptions. Its headline was, Try to Resist Misinterpreting the Marshmallow Test. It emphasized that the core relationship was replicated. It also explains in some detail why SES is related to self-control, and how the two probably cannot be meaningfully separated--it's a nuanced report. But other press coverage had a doomsday feel:

Vox's subtitle read:
The famous psychology test gets roasted in the new era of replication.

The Guardian blared:
Famed impulse control 'marshmallow test' fails in new research

And the Quartz website said:
We learned the wrong lesson about self-control from the famous marshmallow test

One person on Twitter even wrote,
"The marshmallow/delayed gratification study always felt "wrong" to me - this year it was reported to be hopelessly flawed"

Are these headlines and comments fair? Probably not. As Payne and Sheeran write in Behavioral Scientist:

The problem is that scholars have known for decades that affluence and poverty shape the ability to delay gratification. Writing in 1974, Mischel observed that waiting for the larger reward was not only a trait of the individual but also depended on people’s expectancies and experience. If researchers were unreliable in their promise to return with two marshmallows, anyone would soon learn to seize the moment and eat the treat. He illustrated this with an example of lower-class black residents in Trinidad who fared poorly on the test when it was administered by white people, who had a history of breaking their promises. Following this logic, multiple studies over the years have confirmed that people living in poverty or who experience chaotic futures tend to prefer the sure thing now over waiting for a larger reward that might never come. But if this has been known for years, where is the replication crisis?

e) Give at least 2 reasons why online sources might use incorrect headlines when they cover psychological science. 

f) Check out at least two different press stories about the marshmallow replication effect, and see how similar they are to each other and to the summary you analyzed above, by Payne and Sheeran. 

You can read the published paper for yourself! The full replication paper is in Psychological Science and is open access

Social media effect sizes

Is social media use responsible for depressed mood? Photo: Ian Allenden/Alamy stock

Do smartphones harm teenagers? If so, how much? In this blog, I've written before about the quasi-experimental and correlational designs used in research on screen time and well-being in teenagers. In that post you can practice identifying the different designs we can use to study this question. 

Today's topic is more about the size of the effect in studies that have been published. A recent Wired story tried to put the effect size in perspective. 

One side of the argument, as presented by Robbie Gonzalez in Wired, scares us into seeing social media as dangerous.

For example, first

...there were the books. Well-publicized. Scary-sounding. Several, really, but two in particular. The first, Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked, by NYU psychologist Adam Alter, was released March 2, 2017. The second, iGen: Why Today's Super-Connected Kids are Growing Up Less Rebellious, More Tolerant, Less Happy – and Completely Unprepared for Adulthood – and What That Means for the Rest of Us, by San Diego State University psychologist Jean Twenge, hit stores five months later.

In addition,

...Former employees and executives from companies like Facebook worried openly to the media about the monsters they helped create. 

But is worry over phone use warranted? Here's what Gonzalez wrote after talking to more researchers:

When Twenge and her colleagues analyzed data from two nationally representative surveys of hundreds of thousands of kids, they calculated that social media exposure could explain 0.36 percent of the covariance for depressive symptoms in girls.

But those results didn’t hold for the boys in the dataset. What's more, that 0.36 percent means that 99.64 percent of the group’s depressive symptoms had nothing to do with social media use. Przybylski puts it another way: "I have the data set they used open in front of me, and I submit to you that, based on that same data set, eating potatoes has the exact same negative effect on depression. That the negative impact of listening to music is 13 times larger than the effect of social media."

In datasets as large as these, it's easy for weak correlational signals to emerge from the noise. And a correlation tells us nothing about whether new-media screen time actually causes sadness or depression. 

There are several things to notice in the extended quote above. First, let's unpack what it means to "explain 0.36 percent of the covariance." Sometimes researchers will square the correlation coefficient r to create the value R². The R² tells you the percentage of variance in one variable that is explained by the other (incidentally, they usually say "percent of the variance" rather than "percent of covariance"). In this case, it tells you how much of the variance in depressive symptoms is explained by social media time (and, by elimination, what percentage is attributable to something else). We can take the square root of 0.0036 (that's 0.36% expressed as a proportion) to get the original r between depressive symptoms and social media use. It's r = .06.
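
If you want to check that arithmetic yourself, here is a minimal Python sketch; the only input is the 0.36% figure quoted above.

    import math

    # 0.36 percent of the variance explained, expressed as a proportion
    r_squared = 0.36 / 100            # = 0.0036

    # r is the square root of R-squared (its sign comes from the direction of the relationship)
    r = math.sqrt(r_squared)
    print(round(r, 2))                # 0.06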

Questions

a) Based on the guidelines you learned in Chapter 8, is an r of .06 small, medium, or large? 

b) Przybylski claims that the effect of social media use on depression is the same size as eating potatoes. On what data might he be basing this claim? Illustrate your answer with two well-labelled scatterplots, one for social media and the other for potatoes. Now add a third scatterplot, showing listening to music. 

c) When Przybylski states that the correlation held for the girls, but not the boys, what kind of model is that? (Here are your choices: moderation, mediation, or a third variable problem?)

d) Finally, Przybylski notes that in large data sets, it's easy for weak correlation signals to appear from the noise. What statistical concepts are being applied here? 

e) Chapter 8 presents another example of a large data set that found a weak (but statistically significant) correlation. What is it? 

f) The discussion above between Gonzalez and Przybylski concerns which of the four big validities? 

g) Finally, Przybylski mentions that "a correlation tells us nothing about whether new-media screen time actually causes sadness or depression".  Why not? 

 

Suggested answers:

a) An r of .06 is probably going to be characterized as "small" or "very small" or even "trivial."  That's what the "potatoes" point is trying to illustrate, in a more concrete way. 

b) One scatterplot should be labeled with "potato eating" on the x axis and "depression symptoms" on the y axis. The second scatterplot should be labeled with "social media use" on the x axis and "depression symptoms" on the y axis. These first two plots should show a positive slope of points with the points very spread out--to indicate the weakness of the association. The spread of the first two scatterplots should be almost the same, to represent the claim the two relationships are equal in magnitude. The third scatterplot should be labeled with "listening to music" on the x axis and "depression symptoms" on the y axis, and this plot should show a much stronger, positive correlation (a tighter cloud of points).

c) It is a moderator. Gender moderates (changes) the relationship between screen use and depression. 

d) Very large data sets have a lot of statistical power. Therefore, large data sets can show statistical significance for even very, very small correlations--even correlations that are not of much practical interest. A researcher might report a "statistically significant" correlation, but it's essential to also ask about the effect size and its practical value (the potatoes argument). A quick calculation illustrating this point appears at the end of this post. Note: you can see the r = .06 value in the original empirical article here, on p. 9.

e) The example in Chapter 8 is the one about meeting one's spouse online and having a happier marriage--that was a statistically significant relationship, but r was only .03. That didn't stop the media from hyping it up, however.

f) Statistical validity

g) The research on smartphones and depressive symptoms is correlational, making causal claims (and causal language) inappropriate. That means that we can't be sure if social media is leading to the (slight) increase in depressive symptoms, or if people who have more depressive symptoms end up using more social media, or if there's some third variable responsible for both social media use and depressive symptoms. As the Wired article states, 

...research on the link between technology and wellbeing, attention, and addiction finds itself in need of similar initiatives. They need randomized controlled trials, to establish stronger correlations between the architecture of our interfaces and their impacts; and funding for long-term, rigorously performed research.

Finally, the Wired article quotes (the seemingly skeptical) Przybylski as saying,

"Don't get me wrong, I'm concerned about the effects of technology. That's why I spend so much of my time trying to do the science well," Przybylski says.

Good science is the best way to answer our questions. 
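
As promised in answer (d), here is a minimal Python sketch of why sample size matters so much. It uses the standard t-test for a Pearson correlation; r = .06 comes from the article above, and the sample sizes are invented for illustration.

    import math
    from scipy import stats

    def p_value_for_r(r, n):
        """Two-sided p-value for a Pearson correlation of size r in a sample of n people."""
        t = r * math.sqrt((n - 2) / (1 - r ** 2))
        return 2 * stats.t.sf(abs(t), df=n - 2)

    r = 0.06
    for n in (100, 1_000, 500_000):   # hypothetical sample sizes
        print(n, round(p_value_for_r(r, n), 4))
    # With n = 100 the tiny correlation is nowhere near significant (p around .55);
    # with n = 1,000 it is borderline; with n in the hundreds of thousands, p is essentially zero.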

 

Can greening up a vacant lot ease depression in a community?

The study randomly assigned lots like this one to be cleaned up, turned into green space, or left alone. Photo: 1000 Words/Shutterstock

When I first saw the headline, "Replacing Vacant Lots With Green Spaces Can Ease Depression In Urban Communities" I thought it was just another journalist putting a causal claim on a correlational study. So I was surprised to read this statement about the design:

[Researcher Eugenia South] and her colleagues wanted to see if the simple task of cleaning and greening these empty lots could have an impact on residents' mental health and well-being. So, they randomly selected 541 vacant lots [in the city of Philadelphia] and divided them into three groups.

They collaborated with the Pennsylvania Horticultural Society for the cleanup work.

The lots in one group were left untouched — this was the control group. The Pennsylvania Horticultural Society cleaned up the lots in a second group, removing the trash. And for a third group, they cleaned up the trash and existing vegetation, and planted new grass and trees. The researchers called this third set the "vacant lot greening" intervention.

Here's more:

The team surveyed residents living near the lots before and after their trial to assess their mental health and wellbeing. "We used a psychological distress scale that asked people how often they felt nervous, hopeless, depressed, restless, worthless and that everything was an effort," explains South.

The scale alone doesn't diagnose people with mental illness, but a score of 13 or higher suggests a higher prevalence of mental illness in the community, she says.

People living near the newly greened lots felt better. "We found a significant reduction in the amount of people who were feeling depressed," says South.

As one commentator noted:

Previous research has shown that green spaces are associated with better mental health, but this study is "innovative," says Rachel Morello-Frosch, a professor at the department of environmental science, policy and management at the University of California, Berkeley, who wasn't involved in the research.

"To my knowledge, this is the first intervention to test — like you would in a drug trial — by randomly allocating a treatment to see what you see," adds Morello-Frosch. 

Questions

a) How do we know that this is an experiment and not a correlational study?

b) What were the Independent and Dependent variables? 

c) What was the design: Posttest only? Pretest-posttest? Repeated measures? Or concurrent measures?

d) Sketch a graph of the results described above. 

e) Ask at least one question each about this study's construct, internal, external, and statistical validities.

f) Because this article was published in the open-access journal JAMA Network Open anyone can read the paper. Take a look carefully at the tables in the paper (especially Table 2). How strong do the results seem to you? Are the differences between the conditions large? Do you see improvements on all the measured variables, or just on a few of them? 

g) Why does the design of this study help support the causal claim that "Replacing vacant lots with green spaces can ease depression...."? (Apply the three causal criteria.) 

 

Do kids do better in private schools?

What evidence would it take to convince us that it's the schools, not other factors, that are responsible for the outcomes of private school students? Photo: Image Source / Alamy Stock Photo

A large study has compared the outcomes of children who've attended private schools to those who've attended public schools.  A journalist summarized the report in the Washington Post. The study provides a nice example of how multivariate regression can be used to test third variable hypotheses. 

When we look simply at the educational achievement of students in private schools vs. public schools, private school students have higher achievement scores. However, all such studies are correlational, because the two variables--Type of School and Level of Achievement--are measured.

Therefore, such studies show covariance, because the results depict a relationship. The study may even show temporal precedence, because attending school presumably precedes the measure of achievement. However, such studies are weak on internal validity. We can think of several alternative explanations for why children in private schools are scoring higher.

One major alternative explanation is socioeconomic status. Children from wealthier families are more likely to afford private schools. And in general, children from wealthier families tend to score higher on achievement tests. 

The Washington Post journalist quoted one of the study's authors, Robert Pianta, who summed up the study's results this way:

“You only need to control for family income and there’s no advantage,” Pianta said in an interview. “So when you first look, without controlling for anything, the kids who go to private schools are far and away outperforming the public school kids. And as soon as you control for family income and parents’ education level, that difference is eliminated completely.”

Questions

a) Draw little diagrams similar to those in Figure 8.15 (in the 3rd ed.) to depict the arguments being made in this study. What would A be? What about B? In the quote from Pianta, above, what would the C variable(s) be? 

b) The researchers used type of school (private vs. public), which is a categorical variable. But in some analyses, the researchers also used "number of years in private school" as an alternative version of this variable. Is "number of years in private school" categorical, ordinal, interval, or ratio data? 

c) Sketch a mock-up regression table with the criterion variable  at the top and predictors below (Use Table 9.1 as a model). Which variable do you think the researchers selected as the criterion (dependent) variable in their analyses? Which variable(s) would have been the predictors? 

d) Now that you know what the results were, think about how the beta associated with "number of years in private school"  would change when parental SES is added and removed from the regression analyses. 

The original journal article seems to be open access. Give it a try!

Suggested answers

a) A and B would be Type of School and Level of Achievement. It doesn't really matter which one is called A and which one is called B. C would be Family Income and/or Parental Education.

b) Ratio data (zero is meaningful in this scale because you could attend zero years of private school)

c) The criterion variable would be Achievement, and the predictors would be Number of Years of Private School, Family Income, and Parental Education.

d) When the Number of Years of Private School is in the table (in the analysis) by itself, its beta is likely to be positive and significant (more years of private school goes with higher achievement). When Family Income and Parental Education are added to the table, the beta for Number of Years of Private School should drop to zero. This pattern of results is consistent with the argument that Family Income and Parental Education are the alternative explanation for the original relationship.

Replication Update: When do people cheat?

You're not as smart as you think

College graduates were more likely than those who'd not been to college to report  they are "smarter than average." Is their perception overconfident, or not? Photo: PeopleImages/Getty Images


It seems to be conventional wisdom that people are overconfident in their own abilities. People tend to think they are nicer, smarter, and better looking than most other people. But what's the evidence? The scientist-authors of this Wall Street Journal summary  explain, 

The claim that "most people think they are smarter than average" is a cliche of popular psychology, but the scientific evidence for it is surprisingly thin. Most research in this area has been conducted using small samples of individuals or only with high school or college students. The most recent study that polled a representative sample of American adults on the topic was published way back in 1965.

The authors, Patrick Heck and Christopher Chabris, worked with a third colleague. 

..[W]e conducted two surveys: one using traditional telephone-polling methods, the other using internet research volunteers. Altogether we asked a combined representative sample of 2,821 Americans whether they agreed or disagreed with the simple statement "I am more intelligent than the average person." 

Here are some of the results:

We found that more than 50% of every subgroup of people -- young and old, white and nonwhite, male and female -- agreed that they are smarter than average. Perhaps unsurprisingly, more men exhibited overconfidence (71% said they were smarter than average) than women (only 59% agreed).

Perhaps "overconfidence" is really accuracy? Consider this pattern of results: 

In our study, confidence increased with education: 73% of people with a graduate degree agreed that they are smarter than average, compared with 71% of college graduates, 62% of people with "some college" experience and just 52% of people who never attended college.

The accessible Wall Street Journal summary is paywalled, but the original empirical publication is open-access in PLOS One.

Questions

a) What kind of study was this? Survey/poll? Correlational? Experimental? What are its key variables? 

b) The authors found that more than 50% of every subgroup of people considered themselves smarter than average. Why is this result a sign of overconfidence? 

c) The authors of this piece state that their combined sample was "representative". Re-read the section on how they got their sample and then make your own assessment--is the sample representative? (i.e., how is its external validity?). What population of interest do they intend to represent? 

d) Sketch a graph of this result (a plotting sketch appears after these questions):

73% of people with a graduate degree agreed that they are smarter than average, compared with 71% of college graduates, 62% of people with "some college" experience and just 52% of people who never attended college.

e) In concluding their article, the authors wrote, "Our study shows that many people think they are smarter than they really are, but they may not be stupid to think so." What do you think? To what extent does this study's results support this conclusion? 

f) Ask a question about this study's construct, internal, external, and statistical validity.
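
For question (d), here is a minimal matplotlib sketch of one way to draw the bar graph; the four percentages come straight from the quote above, and everything else is plotting boilerplate.

    import matplotlib.pyplot as plt

    groups = ["No college", "Some college", "College degree", "Graduate degree"]
    percent_agree = [52, 62, 71, 73]   # percentages reported in the article

    plt.bar(groups, percent_agree)
    plt.ylabel('Percent agreeing "I am more intelligent than the average person"')
    plt.ylim(0, 100)
    plt.show()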

Concentration blunts your sense of smell

He is concentrating so much that he might not notice the coffee smell.... Photo: baranq/Shutterstock


At times, we've all been so engrossed in a task that we've lost awareness of our surroundings. Maybe you didn't hear someone calling your name when you were finishing your paper, or maybe you missed the oven timer when you were reading that mystery book.  Now researchers Sophie Forester and Charles Spence have reported that concentration impacts our sense of smell. Here's how the research was described on the APS website:

They set up a room to be distinctively aromatic, hiding three small containers of coffee beans around the room overnight. Over the course of two experiments, they led 40 college students into the room one at a time to perform a tough visual-search task on a computer, finding the letter “X” or “N” in a circle of similar-looking letters (“W,” “M,” “K,” “H,” “Z,” and “V”). 40 other students completed an easier version of the same task; searching for the letter “X” or “N” among a circle of lowercase “o”s. [Students had been randomly assigned to either the difficult or easy task.]

The experimenters then took the students into another room and asked them some follow-up questions that grew increasingly leading:

  1. “Describe the room you just completed the task in. Try to describe it using all of your senses.”
  2. “Did you notice any odors in the room, if so what?”
  3. “Could you smell coffee in the room?”

Students assigned to the difficult search task were far less likely to report having picked up the aroma (25% of participants said they noticed a coffee smell) compared to the participants assigned to the easy task (60%-70% of participants). When the experimenters led the students back into the test room, all of them said they could smell it. Some of them even commented that the room smelled like a cafe.

Questions

  1. What kind of study was this--experimental or correlational? How do you know? 
  2. What was the independent variable? What was the dependent variable? 
  3. Think about construct validity: What do you think of the way they measured their dependent variable? Is this a good measure?
  4. Now think about statistical validity: How large does this effect seem to be? Take another look at the results and make a comment on the practical effect size.
  5. What about external validity? To whom might these results generalize? Do you think the pattern for coffee and letter detection might generalize to other smells? To other tasks?
  6. Now consider internal validity. The authors claim that it was concentration that caused people to not notice the smell. Can you think of any confounds in this design? 

 


A test of dog empathy

Which of the two dependent variables showed that dogs were affected by their owners' sadness? Photo: adogslifephoto/istock/Getty Images Plus

Here's a short video with footage from an experiment that tested dogs' empathy. Watch the video with these questions in mind.

a) The study was an experiment. What was the independent variable (IV)? 

b) Was the IV manipulated as between-groups (independent groups) or within-groups? What keywords in the video description helped you answer this question? 

c) The dependent variable (DV) was operationalized in two ways. What were they? 

d) One of the DVs did not support the hypothesis, but the other DV did. Explain the results that they found. (You can sketch two little bar graphs, too.)

e) What do you think--does "opening a door to release a crying owner" indicate "empathy?" (P.S., That's a construct validity question.)

f) Does the study support the claim that "hearing their owners ask for help while crying causes dogs to help their owners faster"?  Apply the three causal criteria to support your answer. 

 

Suggested answers:

a) The IV was whether the owners were saying "help" while pretending to cry, or saying "help" in a neutral tone, while humming Twinkle Twinkle Little Star.

b) This was independent-groups--the reporter used keywords such as "some owners" and "other owners". This was a posttest-only design. 

c) One operationalization of the DV was whether the dogs opened the door, or not. The other operationalization of the DV was how long each dog took to open the door.

d) Only the "time taken" DV showed the predicted effect.

e) Answers will vary

f)  The results show covariance because dogs whose owners were crying opened the door three times as fast as dogs whose owners were calm. Temporal precedence is ensured by the methodology--by randomly assigning owners to cry (vs. sing), they ensured that this condition came before opening the door. The study would have good internal validity if they randomly assigned owner/dog pairs to the two conditions--this would take care of selection threats such as having more "already helpful" dogs in the crying condition, or having owners who are better at acting in one condition or the other. As far as design confounds, we might ask about whether the owners in the two conditions acted exactly the same in all ways except their assigned conditions. 

Does loud music make you eat bad food?

$
0
0
GettyImages-912175184 (1)
That sounds delicious, thank you! Photo credit: Nathan Motoyama/EyeEm/Getty Images

June is food month on the ERM blog! This story's about the headline, Loud music makes you eat bad food?!?!?Whaaaaaaatever!

QRock 100.7 was one of several news outlets that had fun describing this study for its readers. 

Let's find out what kind of study was conducted to test the claim. The Q100.7 journalist wrote:  

A new study found loud music makes us more likely to order unhealthy food when we’re dining out.  A new study in Sweden found loud music in restaurants makes us more likely to choose unhealthy menu options.  And we’re more likely to go with something healthy like a salad when the music ISN’T so loud.

Researchers went to a café and played music at different decibel levels to see how it affected what people ordered.  Either 55 decibels, which is like background chatter or the hum from a refrigerator . . . or 70 decibels, which is closer to a vacuum cleaner.

And when they cranked it to up 70, people were 20% more likely to order something unhealthy, like a burger and fries.

They did it over the course of several days and kept getting the same results.  So the study seems pretty legit.

a) OK, go: What seems to be the independent variable in this study? What were its levels? How was it operationalized?

b)  What seems to be the dependent variable? How was it operationalized? Think specifically about how they might have operationalized the concept "unhealthy."

c) Do you think this study counts as an experiment or a quasi-experiment? Explain your answer. 

d) This study can be called a "field study" or perhaps a "field experiment". Why?

e) To what extent can this study support the claim that loud music makes you eat bad food? Apply covariance, temporal precedence, and internal validity to your response.

f) If you were manipulating the loudness of the music for a study like this, how might you do so in order to ensure it was the music, and not other restaurant factors, were responsible for the increase in ordering "unhealthy" food?

g) The Q100.7 journalist argues that the study seems "pretty legit." What do you think the journalist meant by this phrase? 

h) The study on food and music volume are summarized in an open-access conference abstract, published here. You might be surprised to read, contrary to the journalist's report, that the field study was conducted on only two days--with one day at 50db and the other at 70 db. How does this change your thoughts about the study? 

g) Conference presentations are not quite the same as peer-reviewed journal publications. Take a moment (and use your PSYCinfo skills) to decide if the authors, Biswas, Lund, and Szocs, have published this work yet in a peer reviewed journal.  Why might journalists choose to cover a story that has only been presented at a conference instead of peer-reviewed? Is this a good practice in general? 

 

 

Delay of gratification (marshmallow study replication)

$
0
0
GettyImages-108272760
Her school achievement later in life can be predicted from her ability to wait for a treat (or by her family's SES). Photo: Manley099/Getty Images

There's a new replication study about the famous "marshmallow study",  and it's all over the popular press. You've probably heard of the original research: Kids are asked to sit alone in a room with a single marshmallow (or some other treat they like, such as pretzels). If the child can wait for up to 15 minutes until the experimenter comes back, they receive two marshmallows. But if they eat the first one early, they don't. As part of the original study, kids were tracked over several years. One of the key findings was that the longer children were able to wait at age 4, the better they were doing in school as teenagers. Psychologists have often used this study as an illustration of how self-control is related to important life outcomes. 

The press coverage of this year's replication study illustrates at least two things. First, it's a nice example of multiple regression.  Second, it's an example of how different media outlets assign catchy--but sometimes erroneous--headlines on the same study.

First, let's talk about the multiple regression piece. Regression analyses often try to understand a core bivariate relationship more fully. In this case, the core relationship they start with is between the two variables, "length of time kids waited at age 4" and "test performance at age 15." Here's how it was described by Payne and Sheeran in the online magazine Behavioral Scientist:

The result? Kids who resisted temptation longer on the marshmallow test had higher achievement later in life. The correlation was in the same direction as in Mischel’s early study. It was statistically significant, like the original study. The correlation was somewhat smaller, and this smaller association is probably the more accurate estimate, because the sample size in the new study was larger than the original. Still, this finding says that observing a child for seven minutes with candy can tell you something remarkable about how well the child is likely to do in high school. 

a) Sketch a well-labelled scatterplot of the relationship described above. What direction will the dots slope? Will they be fairly tight to a straight line, or spread out?

b) The writers (Payne and Sheeran) suggest that a larger sample size leads to a more accurate estimate of a correlation. Can you explain why a large sample size might give a more accurate statistical estimate? (Hint: Chapter 8 talks about outliers and sample size--see Figures 8.10 and 8.11.)

Now here's more about the study: 

The researchers next added a series of “control variables” using regression analysis. This statistical technique removes whatever factors the control variables and the marshmallow test have in common. These controls included measures of the child’s socioeconomic status, intelligence, personality, and behavior problems. As more and more factors were controlled for, the association between marshmallow waiting and academic achievement as a teenager became nonsignificant. 

c) What's proposed above is that social class is a third variable ("C") that might be associated with both waiting time ("A") and school achievement ("B"). Using Figure 8.15, draw this proposal. Think about it, too: Why does it make sense that lower SES might go with shorter waiting times (A)? Why might lower SES also go with lower school achievement (B)? 

d) Now create a mockup regression table that might fit the pattern of results being described above. Put the DV at the top (what is the DV?), then list the predictor variables underneath, starting with Waiting time at Age 4, and including things like Child's Socioeconomic Status and Intelligence. Which betas should be significant? Which should not?

Basically, here we have a core bivariate relationship (between wait time and later achievement), and a critic suggests a possible third variable (SES). The researchers used regression to see if the core relationship was still there when the third variable was controlled for. The core relationship went away, suggesting that SES is a third variable that can help explain why kids who wait longer do better in school later on. 
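To make that regression logic concrete, here is a minimal sketch in Python using simulated data (the variable names and numbers are made up; this is not the researchers' dataset or analysis). SES is built in as a cause of both waiting time and achievement, so the coefficient for waiting time shrinks once SES is added as a control:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 900   # invented sample size

# Simulate SES as a third variable that nudges both waiting time and later achievement.
ses = rng.normal(size=n)
wait = 0.5 * ses + rng.normal(size=n)                     # waiting time at age 4
achieve = 0.5 * ses + 0.05 * wait + rng.normal(size=n)    # achievement at age 15

df = pd.DataFrame({"ses": ses, "wait": wait, "achieve": achieve})

# Core bivariate relationship: wait time predicts achievement on its own...
print(smf.ols("achieve ~ wait", data=df).fit().params["wait"])

# ...but the wait-time coefficient shrinks toward zero once SES is controlled.
print(smf.ols("achieve ~ wait + ses", data=df).fit().params["wait"])
```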

Next let's talk about some of the hype around this replication study. The Behavioral Scientist piece (quoted above) is one of the more balanced descriptions. Its headline was "Try to Resist Misinterpreting the Marshmallow Test." It emphasizes that the core relationship was replicated. It also explains in some detail why SES is related to self-control, and how the two probably cannot be meaningfully separated--it's a nuanced report. But other press coverage had a doomsday feel:

Vox's subtitle read:
The famous psychology test gets roasted in the new era of replication.

The Guardian blared:
Famed impulse control 'marshmallow test' fails in new research

And the Quartz website said:
We learned the wrong lesson about self-control from the famous marshmallow test

One person on Twitter even wrote,
"The marshmallow/delayed gratification study always felt "wrong" to me - this year it was reported to be hopelessly flawed"

Are these headlines and comments fair? Probably not. As Payne and Sheeran write in Behavioral Scientist:

The problem is that scholars have known for decades that affluence and poverty shape the ability to delay gratification. Writing in 1974, Mischel observed that waiting for the larger reward was not only a trait of the individual but also depended on people’s expectancies and experience. If researchers were unreliable in their promise to return with two marshmallows, anyone would soon learn to seize the moment and eat the treat. He illustrated this with an example of lower-class black residents in Trinidad who fared poorly on the test when it was administered by white people, who had a history of breaking their promises. Following this logic, multiple studies over the years have confirmed that people living in poverty or who experience chaotic futures tend to prefer the sure thing now over waiting for a larger reward that might never come. But if this has been known for years, where is the replication crisis?

e) Give at least 2 reasons why online sources might use incorrect headlines when they cover psychological science. 

f) Check out at least two different press stories about the marshmallow replication effect, and see how similar they are to each other and to the summary you analyzed above, by Payne and Sheeran. 

You can read the published paper for yourself! The full replication paper is in Psychological Science and is open access.

Social media effect sizes

Is social media use responsible for depressed mood? Photo: Ian Allenden/Alamy Stock Photo

Do smartphones harm teenagers? If so, how much? In this blog, I've written before about the quasi-experimental and correlational designs used in research on screen time and well-being in teenagers. In that post you can practice identifying the different designs we can use to study this question. 

Today's topic is more about the size of the effect in studies that have been published. A recent Wired story tried to put the effect size in perspective. 

One side of the argument, as presented by Robbie Gonzalez in Wired, scares us into seeing social media as dangerous.

For example, first

...there were the books. Well-publicized. Scary-sounding. Several, really, but two in particular. The first, Irresistible: The Rise of Addictive Technology and the Business of Keeping Us Hooked, by NYU psychologist Adam Alter, was released March 2, 2017. The second, iGen: Why Today's Super-Connected Kids are Growing Up Less Rebellious, More Tolerant, Less Happy – and Completely Unprepared for Adulthood – and What That Means for the Rest of Us, by San Diego State University psychologist Jean Twenge, hit stores five months later.

In addition,

...Former employees and executives from companies like Facebook worried openly to the media about the monsters they helped create. 

But is worry over phone use warranted? Here's what Gonzalez wrote after talking to more researchers:

When Twenge and her colleagues analyzed data from two nationally representative surveys of hundreds of thousands of kids, they calculated that social media exposure could explain 0.36 percent of the covariance for depressive symptoms in girls.

But those results didn’t hold for the boys in the dataset. What's more, that 0.36 percent means that 99.64 percent of the group’s depressive symptoms had nothing to do with social media use. Przybylski puts it another way: "I have the data set they used open in front of me, and I submit to you that, based on that same data set, eating potatoes has the exact same negative effect on depression. That the negative impact of listening to music is 13 times larger than the effect of social media."

In datasets as large as these, it's easy for weak correlational signals to emerge from the noise. And a correlation tells us nothing about whether new-media screen time actually causes sadness or depression. 

There are several things to notice in the extended quote above. First, let's unpack what it means to "explain 0.36 percent of the covariance." Sometimes researchers square the correlation coefficient r to create the value R². The R² tells you the percentage of variance in one variable that is explained by the other (incidentally, researchers usually say "percent of the variance" rather than "percent of covariance"). In this case, it tells you how much of the variance in depressive symptoms is explained by social media time (and, by elimination, what percentage is attributable to something else). We can take the square root of 0.0036 (that's 0.36% written as a proportion) to get the original r between depressive symptoms and social media use. It's r = .06. 
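If you want to check that conversion yourself, here's a quick sketch (the only input is the 0.36 percent figure reported in the article):

```python
import math

r_squared = 0.0036          # 0.36% written as a proportion
r = math.sqrt(r_squared)    # the correlation coefficient is the square root of R-squared
print(round(r, 2))          # 0.06
```

Taking the square root of a proportion of variance gives you back the size of the correlation; the sign (direction) has to come from the original analysis.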

Questions

a) Based on the guidelines you learned in Chapter 8, is an r of .06 small, medium, or large? 

b) Przybylski claims that the effect of social media use on depression is the same size as the effect of eating potatoes. On what data might he be basing this claim? Illustrate your answer with two well-labelled scatterplots, one for social media and the other for potatoes. Now add a third scatterplot, showing listening to music. 

c) When the article states that the correlation held for the girls but not for the boys, what kind of model is that? (Your choices: moderation, mediation, or a third-variable problem.)

d) The article also notes that in large data sets, it's easy for weak correlational signals to emerge from the noise. What statistical concepts are being applied here? 

e) Chapter 8 presents another example of a large data set that found a weak (but statistically significant) correlation. What is it? 

f) The discussion above between Gonzalez and Przybylski concerns which of the four big validities? 

g) Finally, the article states that "a correlation tells us nothing about whether new-media screen time actually causes sadness or depression." Why not? 

 

Suggested answers:

a) An r of .06 is probably going to be characterized as "small" or "very small" or even "trivial."  That's what the "potatoes" point is trying to illustrate, in a more concrete way. 

b) One scatterplot should be labeled with "potato eating" on the x axis and "depression symptoms" on the y axis. The second scatterplot should be labeled with "social media use" on the x axis and "depression symptoms" on the y axis. These first two plots should show a positive slope of points with the points very spread out--to indicate the weakness of the association. The spread of the first two scatterplots should be almost the same, to represent the claim the two relationships are equal in magnitude. The third scatterplot should be labeled with "listening to music" on the x axis and "depression symptoms" on the y axis, and this plot should show a much stronger, positive correlation (a tighter cloud of points).
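If you'd rather see the three plots than sketch them by hand, here is a rough Python/matplotlib sketch. The correlation values are invented purely to convey "two equally weak associations and one noticeably stronger one"; they are not estimates from any dataset:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)

def simulate(r, n=300):
    """Draw n points from a bivariate normal with the chosen correlation (illustrative only)."""
    x, y = rng.multivariate_normal([0, 0], [[1, r], [r, 1]], size=n).T
    return x, y

panels = [("Potato eating", 0.06), ("Social media use", 0.06), ("Listening to music", 0.25)]

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
for ax, (label, r) in zip(axes, panels):
    x, y = simulate(r)
    ax.scatter(x, y, s=10, alpha=0.5)
    ax.set_xlabel(label)
    ax.set_title(f"r = {r} (illustrative)")
axes[0].set_ylabel("Depressive symptoms")
plt.tight_layout()
plt.show()
```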

c) It is a moderator. Gender moderates (changes) the relationship between screen use and depression. 

d) Very large data sets have a lot of statistical power. Therefore, large data sets can show statistical significance for even very, very small correlations--even correlations that are not of much practical interest. A researcher might report a "statistically significant" correlation, but it's essential to also ask about the effect size and its practical value (the potatoes argument). Note: you can see the r = .06 value in the original empirical article here, on p. 9. 
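A small simulation can make the power point vivid. Here is a sketch in Python with invented data: a true correlation of .06 will usually fail to reach p < .05 in a sample of 100, but will almost always be "statistically significant" in a sample of 100,000.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def simulate_p(true_r, n):
    """Simulate one sample of size n with population correlation true_r; return the p-value for r."""
    x, y = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n).T
    r, p = stats.pearsonr(x, y)
    return p

for n in (100, 100_000):
    p_values = [simulate_p(0.06, n) for _ in range(200)]
    share_sig = np.mean([p < .05 for p in p_values])
    print(f"n = {n:6d}: {share_sig:.0%} of simulated samples reach p < .05")
```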

e) The example in Chapter 8 is the one about meeting one's spouse online and having a happier marriage--that was a statistically significant relationship, but r was only .03. That didn't stop the media from hyping it up, however.

f) Statistical validity

g) The research on smartphones and depressive symptoms is correlational, making causal claims (and causal language) inappropriate. That means that we can't be sure if social media is leading to the (slight) increase in depressive symptoms, or if people who have more depressive symptoms end up using more social media, or if there's some third variable responsible for both social media use and depressive symptoms. As the Wired article states, 

...research on the link between technology and wellbeing, attention, and addiction finds itself in need of similar initiatives. They need randomized controlled trials, to establish stronger correlations between the architecture of our interfaces and their impacts; and funding for long-term, rigorously performed research.

Finally, the Wired article quotes (the seemingly skeptical) Przybylski as saying, 

"Don't get me wrong, I'm concerned about the effects of technology. That's why I spend so much of my time trying to do the science well," Przybylski says.

Good science is the best way to answer our questions. 

 

Can greening up a vacant lot ease depression in a community?

The study randomly assigned lots like this one to be cleaned up, turned into green space, or left alone. Photo: 1000 Words/Shutterstock

When I first saw the headline, "Replacing Vacant Lots With Green Spaces Can Ease Depression In Urban Communities" I thought it was just another journalist putting a causal claim on a correlational study. So I was surprised to read this statement about the design:

[Researcher Eugenia South] and her colleagues wanted to see if the simple task of cleaning and greening these empty lots could have an impact on residents' mental health and well-being. So, they randomly selected 541 vacant lots [in the city of Philadelphia] and divided them into three groups.

They collaborated with the Pennsylvania Horticultural Society for the cleanup work.

The lots in one group were left untouched — this was the control group. The Pennsylvania Horticultural Society cleaned up the lots in a second group, removing the trash. And for a third group, they cleaned up the trash and existing vegetation, and planted new grass and trees. The researchers called this third set the "vacant lot greening" intervention.

Here's more:

The team surveyed residents living near the lots before and after their trial to assess their mental health and wellbeing. "We used a psychological distress scale that asked people how often they felt nervous, hopeless, depressed, restless, worthless and that everything was an effort," explains South.

The scale alone doesn't diagnose people with mental illness, but a score of 13 or higher suggests a higher prevalence of mental illness in the community, she says.

People living near the newly greened lots felt better. "We found a significant reduction in the amount of people who were feeling depressed," says South.

As one commentator noted:

Previous research has shown that green spaces are associated with better mental health, but this study is "innovative," says Rachel Morello-Frosch, a professor at the department of environmental science, policy and management at the University of California, Berkeley, who wasn't involved in the research.

"To my knowledge, this is the first intervention to test — like you would in a drug trial — by randomly allocating a treatment to see what you see," adds Morello-Frosch. 

Questions

a) How do we know that this is an experiment and not a correlational study?

b) What were the independent and dependent variables? 

c) What was the design: Posttest only? Pretest-posttest? Repeated measures? Or concurrent measures?

d) Sketch a graph of the results described above. 

e) Ask at least one question each about this study's construct, internal, external, and statistical validities.

f) Because this article was published in the open-access journal JAMA Network Open, anyone can read the paper. Take a careful look at the tables in the paper (especially Table 2). How strong do the results seem to you? Are the differences between the conditions large? Do you see improvements on all the measured variables, or just on a few of them? 

g) Why does the design of this study help support the causal claim that "Replacing vacant lots with green spaces can ease depression...."? (Apply the three causal criteria.) 

 
