Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records

Article code	Journal code	Publication year	English article	Full-text version
10482875	934302	2015	30-page PDF	Free download

Title of the ISI article

Seeing the non-stars: (Some) sources of bias in past disambiguation approaches and a new public tool leveraging labeled records

Keywords

Patents; Disambiguation; Random forests; Supervised learning; Record linkage

Related topics

Social Sciences and Humanities; Business, Management and Accounting; Business and International Management

Abstract

To date, methods used to disambiguate inventors in the United States Patent and Trademark Office (USPTO) database have been rule- and threshold-based (requiring and leveraging expert knowledge) or semi-supervised algorithms trained on statistically generated artificial labels. Using a large, hand-disambiguated set of 98,762 labeled USPTO inventor records from the field of optoelectronics consisting of four sub-samples of inventors with varying characteristics (Akinsanmi et al., 2014) and a second large, hand-disambiguated set of 53,378 labeled inventor records corresponding to a subset of academics in the life sciences (Azoulay et al., 2012), we provide the first supervised learning approach for USPTO inventor disambiguation. Using these two sets of inventor records, we also provide extensive evaluations of both our algorithm and three examples of prior approaches to USPTO disambiguation arguably representative of the range of approaches used to-date. We show that the three past disambiguation algorithms we evaluate demonstrate biases depending on the feature distribution of the target disambiguation population. Both the rule- and threshold-based methods and the semi-supervised approach perform poorly (10-22% false negative error rates) on a random sample of optoelectronics inventors - arguably the closest of our sub-samples to what might be expected of the majority of inventors in the USPTO (based on disambiguation-relevant metrics). The supervised learning approach, using random forests and trained on our labeled optoelectronics dataset, consistently maintains error rates below 3% across all of our available samples. We make public both our labeled optoelectronics inventor records and our code to build supervised learning models and disambiguate inventors (see http://www.cmu.edu/epp/disambiguation). Our code also allows users to implement supervised learning approaches with their own representative labeled training data.
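The false negative rates quoted in the abstract are easier to interpret with a concrete calculation. The sketch below is not the authors' code. It assumes the common pairwise definition, in which every pair of records is a true match (same hand-labeled inventor) or a true non-match, and an algorithm's output is scored by how many matching pairs it splits apart and how many non-matching pairs it lumps together. The paper's exact definition may differ. The function name `pairwise_error_rates` and the toy labels are hypothetical, chosen for illustration only.

```python
from itertools import combinations


def pairwise_error_rates(true_ids, predicted_ids):
    """Return (false_negative_rate, false_positive_rate) over all record pairs.

    true_ids[i] is the hand-labeled inventor ID of record i;
    predicted_ids[i] is the ID assigned by a disambiguation algorithm.
    Assumes the pairwise definition of errors described in the lead-in.
    """
    if len(true_ids) != len(predicted_ids):
        raise ValueError("label lists must have the same length")

    tp = fp = tn = fn = 0
    for i, j in combinations(range(len(true_ids)), 2):
        same_true = true_ids[i] == true_ids[j]
        same_pred = predicted_ids[i] == predicted_ids[j]
        if same_true and same_pred:
            tp += 1
        elif same_true:
            fn += 1  # true match that the algorithm split apart
        elif same_pred:
            fp += 1  # distinct inventors that the algorithm lumped together
        else:
            tn += 1

    # Guard against empty denominators (no true matches or no true non-matches).
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fnr, fpr


# Toy data invented for illustration: five records, three true inventors.
true_ids = ["A", "A", "A", "B", "C"]
predicted_ids = [1, 1, 2, 2, 3]
fnr, fpr = pairwise_error_rates(true_ids, predicted_ids)
print(f"false negative rate: {fnr:.3f}, false positive rate: {fpr:.3f}")
```

In this toy case the algorithm splits one inventor's records across two clusters, so two of the three true matching pairs are missed, and it lumps one pair of records from different inventors together. Looping over all pairs is quadratic in the number of records, so a labeled set the size of those described in the abstract would need a more economical approach, such as comparing records only within candidate groups.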

Publisher

Database: Elsevier - ScienceDirect
Journal: Research Policy - Volume 44, Issue 9, November 2015, Pages 1672-1701

Authors

Samuel L. Ventura, Rebecca Nugent, Erica R.H. Fuchs
