Data Visualization: Creating graphs for your data

Univariate Graph

Graphed variable H1SE4: teens' perceived intelligence compared to peers

The graph is unimodal and not symmetrical, with the most responses for "average" (2502). It skews slightly to the left.

Respondents seem to be pretty confident in their relative intelligence. 
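The shape described above can also be checked numerically. Here is a minimal sketch, assuming the responses are stored as numeric codes 1 to 6. The `responses` values are made-up illustration data, not the AddHealth counts.

```python
import pandas

# Hypothetical responses coded 1 (moderately below average) to 6 (extremely above average)
responses = pandas.Series([1]*2 + [2]*5 + [3]*20 + [4]*12 + [5]*8 + [6]*3)

counts = responses.value_counts().sort_index()  # frequency of each code
mode = responses.mode()[0]                      # most common response
skewness = responses.skew()                     # > 0: longer tail to the right, < 0: longer tail to the left

print(counts)
print("mode:", mode)
print("skewness:", round(skewness, 2))
```

On the real data, `responses` would be `data['H1SE4'].dropna()` taken before the categories are renamed.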



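import seaborn
import matplotlib.pyplot as plt

# assumes data is loaded and the H1SE4 missing codes (96, 98) are already NaN,
# as in the Data Management post; rename_categories needs a categorical column
data['H1SE4']=data['H1SE4'].astype('category')
data['H1SE4']=data['H1SE4'].cat.rename_categories(["Mod. Below Avg", "Slightly Below Avg", "Avg", "Slightly Above Avg", "Mod. Above Avg", "Extremely Above Avg"])

#Univariate bar chart
seaborn.countplot(x="H1SE4", data=data)
plt.xlabel('Self-Perceived Intelligence')
plt.title('Intelligence Perception Compared to Peers')
plt.show()

#Univariate histogram (histplot replaces the deprecated distplot)
seaborn.histplot(data["H1SE4"].dropna())
plt.xlabel('Self-Perceived Intelligence')
plt.title('Intelligence Perception Compared to Peers')
plt.show()

print("counts of self-perceived intelligence - missing nan")
desc1 = data['H1SE4'].describe()
print(desc1)

#Bivariate graph
#catplot is the current name for factorplot; kind="bar" plots the mean of y per category
#errorbar=None (ci=None in older seaborn) hides the error bars
seaborn.catplot(x="H1SE4", y="H1CO1", data=data, kind="bar", errorbar=None)
plt.xlabel('Self-Perceived Intelligence')
#the bars are a proportion only if H1CO1 is coded 0/1 with its missing codes set to NaN
plt.ylabel('Proportion Sexually Active')
plt.title('Sexual Activity v. Perceived Intelligence')
plt.show()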

Bivariate Graph  

Graphed variables H1SE4 (teens' perceived intelligence compared to peers) vs. H1CO1 (teens who are sexually active)


H1SE4-v-H1C01 copy.png

This was a really exciting graph to see, because it supports my hypothesis that self-perception/confidence and sexual activity are related. While this isn't exactly what I thought it would look like, it's pretty close.

I believed that individuals with lower self-confidence would be more likely to be sexually active than others; this graph supports that because it skews right.
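The bar chart plots the mean of H1CO1 within each H1SE4 category, so the same numbers can be tabulated directly. This is a rough sketch on made-up rows. It assumes H1CO1 is coded 0 = no and 1 = yes, and that codes such as 6, 8 and 9 mark non-answers, which should be checked against the codebook.

```python
import numpy
import pandas

# Made-up rows for illustration only (not AddHealth data)
toy = pandas.DataFrame({
    "H1SE4": [1, 1, 2, 2, 3, 3, 3, 6, 6, 3],
    "H1CO1": [1, 1, 1, 0, 0, 1, 0, 1, 0, 8],
})

# Assumed non-answer codes for H1CO1; left in place they would inflate the mean
toy["H1CO1"] = toy["H1CO1"].replace([6, 8, 9], numpy.nan)

# The mean of a 0/1 variable is the proportion answering "yes"
prop_active = toy.groupby("H1SE4")["H1CO1"].mean()
print(prop_active)
```

Reading the proportions as a table makes it easier to tell whether a bump in one category is large or just a few respondents.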

However, I am slightly surprised to see the slight uptick in sexual activity among the "Extremely Above Average" category. The curve swings back upwards, when I would have thought that it would have continued to decrease.

I would be interested to break this graph down by gender as well, to see how the two groups compare.
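One way the gender comparison might be set up is a table with one column per gender. This sketch uses made-up rows, and `BIO_SEX` (1 = male, 2 = female) is an assumed column name and coding rather than something confirmed in this post.

```python
import pandas

# Made-up rows; "BIO_SEX" is an assumed column name (1 = male, 2 = female)
toy = pandas.DataFrame({
    "H1SE4":   [3, 3, 3, 3, 4, 4, 4, 4],
    "BIO_SEX": [1, 1, 2, 2, 1, 1, 2, 2],
    "H1CO1":   [1, 0, 0, 0, 1, 1, 1, 0],
})

# Rows: intelligence category, columns: gender, cells: proportion sexually active
by_gender = toy.pivot_table(index="H1SE4", columns="BIO_SEX", values="H1CO1", aggfunc="mean")
print(by_gender)
```

In seaborn, the same split could probably be drawn by passing the gender column as `hue` to the bar plot.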



Data Management and Visualization: Data Management


# Import data set libraries
import pandas
import numpy

# Import my data set
data = pandas.read_csv('addhealth_pds.csv', low_memory=False)

# bug fix for display formats to avoid run time errors
pandas.set_option('display.float_format', lambda x:'%f'%x)

# convert to numeric; convert_objects() is no longer available in current pandas
data['H1SE2'] = pandas.to_numeric(data['H1SE2'], errors='coerce')

print("counts of plan ahead for birth control use H1SE2 1=very sure")
c5 = data['H1SE2'].value_counts(sort=False)
print (c5)

print("percentage plan ahead for birth control use H1SE2 1=very sure")
p5 = data['H1SE2'].value_counts(sort=False, normalize=True)
print (p5)

# recode missing values to python missing (NaN)
data['H1SE2']=data['H1SE2'].replace(96, numpy.nan)
data['H1SE2']=data['H1SE2'].replace(97, numpy.nan)
data['H1SE2']=data['H1SE2'].replace(99, numpy.nan)

print("counts of plan ahead for birth control use - missing nan")
p5 = data['H1SE2'].value_counts(sort=False, dropna=False)
print (p5)


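# convert H1SE3 and H1SE4 to numeric as well
data['H1SE3'] = pandas.to_numeric(data['H1SE3'], errors='coerce')
data['H1SE4'] = pandas.to_numeric(data['H1SE4'], errors='coerce')

print("counts of resist sex if partner doesn't want bc H1SE3 1=very sure")
c6 = data["H1SE3"].value_counts(sort=False)
print (c6)

print("percentage resist sex if partner doesn't want bc H1SE3 1=very sure")
p6 = data["H1SE3"].value_counts(sort=False, normalize=True)
print (p6)

# recode missing values to python missing (NaN)
# H1SE3: refused (96), legitimate skip (97), not applicable (99)
data['H1SE3']=data['H1SE3'].replace([96, 97, 99], numpy.nan)
# H1SE4: refused (96), don't know (98)
data['H1SE4']=data['H1SE4'].replace([96, 98], numpy.nan)

print("counts of resist sex if partner doesn't want bc - missing nan")
p6 = data['H1SE3'].value_counts(sort=False, dropna=False)
print (p6)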

This week I focused on coding out missing data.

I faced some roadblocks trying to do some of the more complicated data management code snippets, so I'd like to revisit this lesson and try again. 
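For revisiting the recoding, one pattern that may help is to keep each variable's missing codes in a single dictionary and apply them in a loop. This is a sketch only: `recode_missing` is a hypothetical helper name, and the toy rows are made up.

```python
import numpy
import pandas

# Missing-value codes per variable, as described in this post
missing_codes = {
    "H1SE2": [96, 97, 99],
    "H1SE3": [96, 97, 99],
    "H1SE4": [96, 98],
}

def recode_missing(frame, codes):
    """Return a copy of frame with each variable's missing codes set to NaN."""
    out = frame.copy()
    for column, values in codes.items():
        out[column] = pandas.to_numeric(out[column], errors="coerce").replace(values, numpy.nan)
    return out

# Made-up rows for illustration only
toy = pandas.DataFrame({
    "H1SE2": [1, 96, 97, 3],
    "H1SE3": [2, 99, 5, 97],
    "H1SE4": [3, 4, 98, 96],
})
cleaned = recode_missing(toy, missing_codes)
print(cleaned.isna().sum())  # number of NaN values per variable
```

Working on a copy leaves the raw codes available, which is useful if the reasons for non-response need to be counted later.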

The nan values of each variable: 

Name: H1SE2, counts of plan ahead for birth control use
nan          2060

Name: H1SE3, counts of resist sex if partner doesn't want birth control
nan          2055

Name: H1SE4, counts of self-perceived intelligence
nan             5


The first two variables deal with specific questions about the respondents' safe sex practices, or what they think they could do. Both had about the same number of non-answers: refused (96), legitimate skip (97), or not applicable (99).

What's interesting is that the third variable asks subjects to rate their general intelligence compared to their peers, and only 30 people opted out of answering (refused [96] or don't know [98]).

Conclusion: Teens are much more willing to rate their overall intelligence, but when it comes to specific topical questions, they are less confident in answering.

Note: the last variable did not include the Legitimate Skip answer response. I am not quite sure what qualifies as a legitimate skip for these answers (I assume it applies to respondents under 15 years old). I might need to redo this comparison, eliminating those legitimate skip responses from the first two variables.
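Redoing the comparison without legitimate skips would mean counting each non-answer reason separately, before the codes are replaced with NaN. Here is a minimal sketch on made-up responses, using the code meanings given above.

```python
import pandas

# Made-up raw responses (before recoding to NaN); 1-5 are real answers
raw = pandas.Series([1, 2, 96, 97, 97, 97, 99, 5, 96, 3], name="H1SE2")

reason_labels = {96: "refused", 97: "legitimate skip", 99: "not applicable"}

# Keep only the non-answer codes and count each reason separately
reasons = raw[raw.isin(list(reason_labels))].map(reason_labels).value_counts()
print(reasons)

# Opt-outs that are not legitimate skips
opted_out = int(reasons.drop("legitimate skip").sum())
print("opted out (excluding legitimate skips):", opted_out)
```

Running this on each of the three variables would give opt-out counts that are more directly comparable.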

Data Management and Visualization: Research Question

This post is part of an online course through Wesleyan University called Data Management and Visualization. I have decided to enroll in this course to learn more about data and data visualization, with the hope of incorporating data analysis into my future career.

After looking through the AddHealth Codebook, I want to focus on safe sex among teenagers. Having done previous qualitative research in this area of sex education in the U.S., I want to expand upon it with more quantitative data.

Looking through the questionnaire, I was a bit overwhelmed by the possible directions I could follow in exploring this topic. Ultimately, I have decided to investigate the correlations between the frequency of (safe) sex among teenagers and their self-perception – the teens’ confidence in themselves and their knowledge of safe sex practices.

Some of my questions surrounding this topic are:

  1. Do they feel confident and capable of having safe sex?
  2. Do they feel confident in their knowledge of safe sex practices? 
  3. Do they feel pressure to have unsafe sex from their own self, friends/family, or the media?


My research revealed some previous studies around this topic, some of which were conflicting.

A Kaiser Family Foundation study (1) found that actual peer pressure has very little to do with kids having sex for the first time; it is often what kids think their friends are doing that is more persuasive.

“50% of young people who had sex at age 17 or younger, state that they felt pressure to become sexually active because of other people’s rumored sexual activity” - 2014, The National Campaign to Prevent Teen and Unplanned Pregnancy (2). 

However, a 2002 Indiana University School of Medicine study (3) found that self-esteem played the most important role in dictating when and by whom sexual intercourse is initiated by kids. They found that boys and girls reacted inversely to their confidence: girls with higher self-esteem would wait to have sex, whereas boys of the same age would engage in sex earlier. This is possibly related to preconceived gender roles in relation to sex in Western society. I also read one study that said, "sexual activity or virginity was not related to self-esteem in either males or females" (4). There is obviously a lot to explore here.

Also, the Guttmacher Institute (5) has done quite a bit of research into the correlation between social perception and a teenager's first sexual experience. I would also find it interesting to know how partner-to-partner communication (6) impacts safe sex practices for teens.

Hypothesis: I believe that teen sex and safe sex practices have a direct relationship to teens’ confidence, both in themselves and any knowledge they have. Teens that have higher confidence in themselves and in safe sex practices will be more likely to use protection when having sex. 



(1) Allen, Colin. "Peer Pressure and Teen Sex." Psychology Today. Psychology Today, 1 May 2003. Web. 27 Dec. 2015.

(3) "Early Intercourse and Self-esteem Linked in Adolescent Behavior." EurekAlert! Indiana University, 30 Apr. 2002. Web. 26 Dec. 2015.

(2) MI Science Department Staff, and Freda Bush, Medica. "Virginity Revisited | Medical Institute for Sexual Health." Medical Institute for Sexual Health RSS. Medical Institute for Sexual Health, 11 June 2015. Web. 29 Dec. 2015.

(4) Robinson, RB, and DI Frank. "The Relation between Self-esteem, Sexual Activity, and Pregnancy." National Center for Biotechnology Information. Florida A&M University, 1994. Web. 25 Dec. 2015.

(5) Sieving, Renee E., Marla E. Eisenberg, Sandra Pettingell, and Carol Skay. "Friends' Influence on Adolescents' First Sexual Intercourse." Perspectives on Sexual and Reproductive Health 38.1 (2006): n. pag. Guttmacher Institute. Web. 25 Dec. 2015.

(6) Whitaker, Daniel J., Kim S. Miller, David C. May, and Martin L. Levin. "Teenage Partners' Communication About Sexual Risk and Condom Use: The Importance of Parent-Teenager Discussions." Family Planning Perspectives 31.3 (1999): 117. Web.