Answered By: Bobray Bordelon
Last Updated: Apr 12, 2017     Views: 68

If your dataset has fewer than about 1,000,000 rows, you can open it in Excel 2007 or later (the limit is 1,048,576 rows per worksheet).

Here are the steps:

  1. Sort by the PERMNO field (or whichever field you want to deduplicate), then select that column.
  2. On the Data tab, in the Sort & Filter group, click Advanced (in Excel 2003 and earlier: Data -> Filter -> Advanced Filter).
  3. In the Advanced Filter dialog box, select the "Filter the list, in-place" option, check the "Unique records only" checkbox, and click OK.

If your dataset has more rows than Excel can open (more than 1,048,576 in Excel 2007 or later, or more than 65,536 in earlier versions), you can open it in Stata.

Here are the steps to deduplicate PERMNO (steps 2 through 6 are Stata commands, entered one at a time):

  1. Save the .csv file to a drive; this example assumes the H: drive.
  2. cd H:
  3. insheet using mydata.csv
  4. sort permno
  5. duplicates drop permno, force
  6. outsheet using deduplicatedpermno.csv, comma

Here, mydata.csv is your input file and deduplicatedpermno.csv is the output file, containing one row per unique PERMNO.
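
The Stata steps can also be collected into a single do-file. The following is a minimal sketch, not a definitive script: it assumes the same H: drive and file names as the steps, and it adds three things that are not in the steps (the clear and replace options, and a duplicates report check). Note that duplicates drop permno, force keeps only the first row for each PERMNO, so rows with the same PERMNO but different values in other columns are discarded.

```stata
* Sketch of the steps as a do-file; the paths and file names are
* the same assumptions used in the steps (H: drive, mydata.csv).
cd H:

* Read the comma-separated file; "clear" (an addition) discards any
* dataset already in memory so the import does not fail.
insheet using mydata.csv, clear

* Optional check (an addition): summarize how many PERMNO values are repeated.
duplicates report permno

* Keep one row per PERMNO. "force" is required because rows sharing a
* PERMNO may still differ in other variables.
sort permno
duplicates drop permno, force

* Write the result as a comma-separated file; "replace" (an addition)
* overwrites an existing output file of the same name.
outsheet using deduplicatedpermno.csv, comma replace
```

If you only need the list of PERMNOs rather than the full rows, one option is to run keep permno before the duplicates drop line, so that the output file contains just that column.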
