Best source for US given name popularities? Project help needed
Hi All,

I'm doing a statistics project for school, and I'll be comparing the popularity of given names over a full year in the nation, my home state, and my region. The regional data I'm getting from birth announcements in the local paper, and I was thinking of using the US Social Security data for the national and state information. This seems to be reliable, even if it's only based on 1% of the total births. What I don't like about it is that it separates different spellings of the same name (Lily, Lilly, etc.) into different names and ranks them accordingly, so there's an error involved based on spelling. I could apply the same guidelines to the local data just to keep everything equal, even though I'd rather not. It looks like the most recent data I'll be able to use is from 2006, since they don't release the yearly popularity info until Mother's Day in May. I couldn't find much on the US Census page to help out. Any other ideas, or do you think this is the best solution?
Tempestgirl

Replies

The SSA website gives you the option of seeing the actual number of births for each name, so you can combine different spellings of the name yourself if you wish. 6,671 Lily + 2,239 Lilly + 635 Lillie = 9,545.
If you combine all the spellings of Aiden, Abigail, etc., you can then recompute the rankings.
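The combining and re-ranking step described in this reply can be sketched in Python. This is a minimal sketch under my own assumptions, not anything provided by the SSA: `SPELLING_GROUPS`, `combine_spellings`, and `rank` are hypothetical names, and which spellings count as "the same name" is a judgment call for your project. Tied counts share a rank (1, 2, 2, 4), which is only one of several common tie conventions.

```python
from collections import Counter

# Hypothetical spelling groups: each variant maps to the name it is merged into.
# Which spellings count as "the same name" is a judgment call for your project.
SPELLING_GROUPS = {
    "Lilly": "Lily",
    "Lillie": "Lily",
}


def combine_spellings(counts, groups):
    """Sum birth counts of spelling variants under one canonical name."""
    combined = Counter()
    for name, births in counts.items():
        # Names not listed in `groups` are kept under their own spelling.
        combined[groups.get(name, name)] += births
    return dict(combined)


def rank(counts):
    """Return (rank, name, births) tuples, most popular first.

    Equal counts share a rank (1, 2, 2, 4); ties are listed alphabetically.
    """
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranked = []
    current_rank, previous = 0, None
    for position, (name, births) in enumerate(ordered, start=1):
        if births != previous:
            current_rank, previous = position, births
        ranked.append((current_rank, name, births))
    return ranked


# Counts quoted in the reply above.
counts = {"Lily": 6671, "Lilly": 2239, "Lillie": 635}
print(rank(counts))                                      # three separate ranks
print(rank(combine_spellings(counts, SPELLING_GROUPS)))  # [(1, 'Lily', 9545)]
```

Because the grouping table is kept separate from the counts, the same table could be applied to the local birth-announcement counts, which keeps the national, state, and regional comparisons on the same footing.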
This site takes the SSA data and groups the names by spelling. Few spelling variants are actually mistakes, though, so whether to group them depends on your purposes: http://namenerds.com/uucn/pop.html
I think you are wrong about the SSA data being based on "1% of total births". The "Background Information" page on their site now says:

"All names are from Social Security card applications for births that occurred in the United States after 1879. Names are restricted to cases where the year of birth, sex, State of birth (50 States and District of Columbia) are on record, and where the given name is at least 2 characters long. Many people born before 1937 never applied for a Social Security card, so their names are not included in our data. For others who did apply, our records may not show the place of birth, and again their names are not included in our data.

"All data are from a 100% sample of our records on Social Security card applications as of the end of February 2007."
I had read the same statement you posted, but then came across "Actuarial Note #139, Name Distributions in the Social Security Area, August 1997", found at the bottom of http://www.socialsecurity.gov/OACT/babynames/index.html#forms, which states:

"Between 1954 and 1984 the Social Security Administration occasionally published a listing of surnames and their popularity in the Report of Distribution of Surnames in the Social Security Number File. This note expands on that project by presenting the most popular given names. The source file is a one percent sample of Social Security Number card applications. ... This file is not limited to persons born in the United States but is representative of all Social Security Number card holders. For purposes of this document, names have not been edited or grouped together according to spelling variations of the same name. People quoting from this document are urged to explicitly acknowledge this qualification."
This was published in the summer of 1998, and I wasn't sure whether it related to that year or to the survey as a whole. Having both criteria listed is misleading, and I haven't been able to reach anyone about it. Obviously popularity research is not their first priority, as they mention, but it'd be nice to have a clear answer. How would you take this information? I'm now inclined to believe that the data do indeed use 100% of SS card applications, since the website was updated in May of last year and I'd hope the information there is up to date and correct.

What are your thoughts? Thanks for your assistance,
Tempestgirl
Sorry I didn't see this until today.

I believe that back when the SSA site did not have year-by-year lists before 1990, and only had decade-by-decade lists for the 1880s, 1890s, 1900s, 1910s, etc., those lists were based on a 1% sample of all people with birthdates in those decades. But I believe that when the SSA expanded its site to include lists for every individual year from 1880 on, it created those lists from the full data set. If they were still using just a 1% sample, I think there would have to be more ties in the numbers they report for the names on their year-by-year lists than there actually are. :)
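The tie argument in this reply can be illustrated with a small Python sketch. It uses made-up counts, not SSA data, and it stands in for a 1% sample with the expected sampled count (births // 100) rather than real random sampling; `tie_fraction` and `expected_one_percent_sample` are hypothetical helpers. The point it shows: shrinking counts to a 1% sample leaves far fewer distinct values, so many more names end up tied.

```python
from collections import Counter


def tie_fraction(counts):
    """Fraction of names whose count is shared by at least one other name."""
    frequency = Counter(counts)
    tied = sum(n for n in frequency.values() if n > 1)
    return tied / len(counts) if counts else 0.0


def expected_one_percent_sample(counts):
    """Crude stand-in for a 1% sample: expected sampled births, rounded down."""
    return [births // 100 for births in counts]


# Synthetic, made-up counts (NOT SSA data): 1,000 names with 1,000..1,999 births.
full_counts = list(range(1000, 2000))
sample_counts = expected_one_percent_sample(full_counts)

print(tie_fraction(full_counts))    # 0.0: every count is distinct
print(tie_fraction(sample_counts))  # 1.0: only ten distinct values (10..19) remain
```

Real sampling would add random noise, but it would not undo the basic effect: small counts can take only a few distinct values.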
Thanks!