New system solves the 'who is J. Smith' puzzle

Thursday, December 14, 2006

University Park, Pa. -- Penn State researchers have developed an automated system that can determine which "J. Smith" is authoring papers on computer science -- the one who teaches at Penn State or the one who teaches at M.I.T. -- as well as whether "J. Smith" is John Smith, Jane Smith, Joanna L. Smith or James H. Smith.

The system, which retrieves classes of authors with similar names, considers not just names in making its determination but also other information such as co-authors, dates of publications, citations and keywords.
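As a rough illustration of how such metadata could be combined, the sketch below scores two paper records by the overlap of their co-authors and keywords and by how close their publication dates are. The record fields, the weights and the ten-year window are assumptions made for this example; the article does not describe the researchers' actual features or how they are weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperRecord:
    """Hypothetical record for one paper credited to an ambiguous name."""
    author: str
    coauthors: frozenset
    keywords: frozenset
    year: int


def jaccard(a, b):
    """Share of items common to both sets; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(p, q, weights=(0.5, 0.4, 0.1), year_window=10):
    """Weighted score in [0, 1]; the weights are illustrative, not learned."""
    coauthor_sim = jaccard(p.coauthors, q.coauthors)
    keyword_sim = jaccard(p.keywords, q.keywords)
    # Papers published more than `year_window` years apart add nothing.
    year_sim = max(0.0, 1.0 - abs(p.year - q.year) / year_window)
    w_co, w_kw, w_yr = weights
    return w_co * coauthor_sim + w_kw * keyword_sim + w_yr * year_sim
```

A higher score suggests that two papers credited to "J. Smith" are more likely to share an author. A system of the kind described would presumably learn how much each signal matters rather than fix the weights by hand.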

When tested with 3,355 academic papers written by 490 authors, the system correctly identified authors 90.6 percent of the time.

"It works very similarly to how humans would figure out authors' identity -- by looking at affiliations, topics, publications," said C. Lee Giles, the David Reese Professor of Information Sciences and Technology and principal researcher.

"The system works by using machine-learning methods to cluster together names that the system believes to be similar. If you think there's another parameter that's relevant, you can change the algorithm and include it," Giles said.

The system is explained in a paper, "Efficient Name Disambiguation for Large-Scale Databases," presented at the recent 17th European Conference on Machine Learning and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases in Berlin. Co-authors were Jian Huang, a doctoral student in the College of Information Sciences and Technology, and Seyda Ertekin, a doctoral student in the Department of Computer Science and Engineering.

Even in academic publications, figuring out an author's identity can be difficult because publications vary in how they present individuals' names. Some opt for just a first initial and last name, as in "J. Smith." Others include the full name -- C. Lee Giles, for instance. But if the surname is common, as with "Smith" or "Chen," even a first name may not suffice to identify the author accurately.
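The ambiguity can be made concrete with a small sketch that checks whether two printed names could belong to the same person. The functions `parse_name` and `names_compatible` are hypothetical helpers written for this example, not part of the researchers' system, and they assume Western-style names with the surname printed last.

```python
def parse_name(name):
    """Return (first token, surname) in lowercase, ignoring periods."""
    tokens = [t.strip(".") for t in name.split()]
    tokens = [t for t in tokens if t]
    return tokens[0].lower(), tokens[-1].lower()


def names_compatible(a, b):
    """True if the two printed names could refer to the same person."""
    first_a, last_a = parse_name(a)
    first_b, last_b = parse_name(b)
    if last_a != last_b:
        return False
    # A bare initial matches any first name that starts with that letter.
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0] == first_b[0]
    return first_a == first_b
```

Under this check "J. Smith" is compatible with both "John Smith" and "Jane Smith", which is why names alone cannot settle the question and other evidence is needed.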

Confusion can also arise from how entities are listed, with publications variously using Penn State, The Pennsylvania State University or PSU. The researchers' algorithm can clear up ambiguities surrounding entities, whether they are institutions, businesses, funding agencies or organizations.

"This method will work on many entity disambiguation problems," Giles said.

The algorithm uses a clustering method to train computers to group information by similar properties. Each successive round of clustering yields a smaller grouping.
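The general idea of clustering by similarity can be sketched in a few lines. The example below links any two records whose similarity meets a threshold and returns the connected groups. This simple single-link approach stands in for the researchers' method, which the article does not detail; the toy co-author sets and the 0.5 threshold are invented for illustration.

```python
def cluster_by_threshold(items, similarity, threshold):
    """Group items so that any pair scoring >= threshold shares a cluster."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if similarity(items[i], items[j]) >= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


def jaccard(a, b):
    """Share of items common to both sets; 0.0 when both sets are empty."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


# Toy data: the co-author sets of three papers credited to "J. Smith".
papers = [
    frozenset({"A. Lee", "B. Kim"}),
    frozenset({"A. Lee", "B. Kim", "C. Wu"}),
    frozenset({"X. Diaz", "Y. Park"}),
]
clusters = cluster_by_threshold(papers, jaccard, threshold=0.5)
```

Here the first two papers share most of their co-authors and fall into one cluster, while the third stands alone. Raising the threshold demands stronger evidence before two papers are linked, so the groupings get smaller.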

The algorithm will be part of the next-generation CiteSeer, the largest academic search engine for computer and information-science literature. Giles was co-creator of CiteSeer when he was at NEC.

The research was supported by the National Science Foundation and Microsoft.
