Using Data Extract
Updated: Mar 5, 2012



What Data Extract is:
Data Extract is a tool that extracts over 50 bibliographic fields, as well as the full text, of patent office data and delivers it all to your desktop in file formats suitable for use in many popular applications, including Microsoft's Excel®, ISI ResearchSoft's EndNote®, Reference Manager®, ProCite® and most standard text editors.




What to use Data Extract for:
Use Data Extract to export patent records from your Work Files or Result Sets for use in other applications.




Overview of the Data Extract control panel:
The following is a high-level overview of the Data Extract control panel:

[Figure: screen capture of the Data Extract control panel]




How to create a Data Extract:
Start a Data Extract from any query Result Set or any Work File:
  1. In the Select format box, indicate the format you want for your extract. The formats are described later in this document.

  2. Choose to Extract either Selected items on this page or All items (up to 20,000).

  3. If you chose to extract Selected items on this page, select those items from your Result Set.

  4. If you have chosen one of the CSV or tagged file formats, the Chosen fields box shows a default set of fields based on the fields you have chosen to display on your Result Set. You can remove these fields if you choose.

    In the Available fields box, highlight any fields you want added to your extract and click the blue arrow that points to the right to add them to the Chosen fields box.

    To remove a field from the extract parameters, highlight it in the Chosen fields box and click the blue arrow that points left.

  5. If you have chosen RIS format for EndNote, RefMan or ProCite, the required fields are shown in the Chosen fields box. You cannot remove any of the required fields, and no additional fields can be selected.

  6. In the Available fields box, highlight the fields you want added to your extract and click the blue arrow that points to the right to add them to the Chosen fields box.

    To remove an optional field from the extract parameters, highlight it in the Chosen fields box and click the blue arrow that points left.

  7. If you have chosen one of the XML options: in the Available fields box, choose either Bibliographic or Full text with bibliographic.

  8. For CSV and tagged downloads, optionally choose to zip your files for a faster download. Both XML options automatically download zipped.

    For RIS downloads, you can zip your files or choose to export directly into EndNote, Reference Manager or ProCite. To export directly, choose Open when you are prompted to either Open or Save your file. You will then be asked which program you want to open the data with. Select the program (EndNote, Reference Manager or ProCite) and the integrated Direct Export function completes the process.

  9. Click DOWNLOAD to create the Data Extract.

  10. Wizard screens will prompt you to open your Data Extract file directly or save it to a disk.



Data Extract formats available with file samples:
Format Description File Type       
CSV
Comma-separated values, with multiple occurrences of a field shown all in one field.

In most PC operating systems, .csv files open by default in Microsoft Excel or another spreadsheet program. If your spreadsheet program does not open automatically, open the spreadsheet program first and then open the .csv file. These files can also be opened in a regular text editor.

The fields you requested for your extract will be shown in the order you requested with the data items separated by commas. The field names will be shown once at the top of each column.

A list of all field codes and their tags appears later in this document.
File type: .csv

Sample file

Graphical example
CSV Multi-Fields/Codes
Comma-separated values with field codes, with multiple occurrences of a field shown as separate fields.

In most PC operating systems, .csv files open by default in Microsoft Excel or another spreadsheet program. If your spreadsheet program does not open automatically, open the spreadsheet program first and then open the .csv file. These files can also be opened in a regular text editor.

The fields you requested for your extract will be shown in the order you requested, with the data items separated by commas. Each field will begin with a field code that represents the patent data shown in that field. Multiple occurrences of a field will be shown as separate fields, and the field code will be suffixed with a sequence number (for example, PA1 will be the first Patent Assignee listed, PA2 will be the second Patent Assignee listed, and so on).

If there are not multiple occurrences in a field that could potentially have multiple occurrences, then the sequence number will still be used but there will only be one occurrence of it (i.e., there will be a PA1 without a PA2).

This format is not recommended for use with the following fields:

  • Abstract
  • First or Exemplary Claim   
  • Independent Claims
  • Other References
  • AssigneeInventor
  • Inventor City/State
  • Attorney Name
  • Field of Search
  • Examiner: Primary
  • Examiner: Assistant

    A list of all field codes and their tags appears later in this document.
    File type: .csv

    Sample file

    Graphical example
    Tagged
    Tagged entries suitable for import into a tagged file application, with multiple occurrences of a field shown all in one field.

    The fields you requested for your extract will be shown in the order you requested. Each field will begin with a tag that identifies the patent data shown in that field.

    A list of all field codes and their tags appears later in this document.
    File type: .tag

    Sample file

    Graphical example
    Tagged Multi-Fields
    Tagged entries suitable for import into a tagged file application, with multiple occurrences of a field shown as separate fields.

    The fields you requested for your extract will be shown in the order you requested. Each field will begin with a tag that identifies the patent data shown in that field. Multiple occurrences of a field will be shown as separate fields, and the tag will be suffixed with a sequence number (for example, PA1 will be the first Patent Assignee listed, PA2 will be the second Patent Assignee listed, and so on).

    If there are not multiple occurrences in a field that could potentially have multiple occurrences, then the sequence number will still be used but there will only be one occurrence of it (i.e., there will be a PA1 without a PA2).

    A list of all field codes and their tags appears later in this document.
    File type: .tag

    Sample file

    Graphical example
    EndNote, RefMan, ProCite (RIS)
    This file was designed primarily for use in ISI ResearchSoft's EndNote®, Reference Manager® and ProCite®.

    This extract contains a fixed set of 10 fields, listed below. No additional or optional fields can be chosen.
  • Publication Number   
  • Title
  • Abstract
  • Inventor
  • Assignee
  • Application Date
  • IPC Codes
  • National Class
  • Domestic Citations
  • Image Information

    A list of all field codes and their tags appears later in this document.
    File type: .ris

    Sample file

    Graphical example
    XML: All patents in one file

    OR

    XML: One file per patent
    XML is a flexible, commonly used markup language. XML files downloaded from Data Extract can be used in a variety of applications and programs.

    Field codes and tags are not used for XML extracts. The XML file is constructed according to the DTD (Document Type Definition). Use this link to download the DTD.

    When choosing XML, you have the option of extracting full-text fields (which include the bibliographic fields) or just the bibliographic fields. If you choose bibliographic fields only, the following fields will not be included in your extract:

  • federalResearchStatement
  • backgroundOfInvention
  • briefDescriptionOfDrawings   
  • generalDescription
  • embodiments
  • claims

    File type: .xml inside a .zip

    Sample file

    Graphical example

    OR

    File type: one .xml per patent inside a .zip

    Sample file

    Graphical example
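
As a minimal sketch of processing a CSV extract outside a spreadsheet program, the Python snippet below reads data whose first row holds the field names, as described for the CSV format above. The column names and the sample record are invented for illustration; a real extract has whatever columns you placed in the Chosen fields box, and its exact header text may differ.

```python
import csv
import io


def read_extract(csv_text):
    """Return one dict per patent record, keyed by the header row."""
    # csv.DictReader copes with quoted values that contain commas.
    return list(csv.DictReader(io.StringIO(csv_text)))


# Hypothetical extract content: one header row, then one row per patent.
sample = (
    "Publication Number,Title,Assignee\r\n"
    'US0000001,"Widget, improved",Example Corp\r\n'
)

records = read_extract(sample)
for record in records:
    print(record["Publication Number"], "|", record["Title"])
```

For a file saved to disk, the same reader can be given a file object opened with `open(path, newline="")` instead of the in-memory string.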




    Field code tags used in selected Data Extract formats:
    Field Code/Tag Field Name
    AC Application Country
    AD Application Date
    AN Application Number
    PAx Assignee
    ASx Assignee City/State
    CAx Assignee Country
    AX Assistant Examiner
    AG Attorney
    CUx US Citations
    OCx Other Citations
    CC Company Code
    ICx IPC Codes
    NCx UPC Codes
    ECx ECLA Codes
    DN Designated States National
    DR Designated States Regional
    FIx Family Information
    FS Field of Search
    FCx Forward References
    FRx Foreign References
    INx Inventor
    ISx Inventor City/State
    CIx Inventor Country
    OUx Original UPC
    PD Patent Date
    PC Patent Country
    PN Patent Number
    PX Primary Examiner
    CP Priority Country
    DP Priority Date
    PR Priority Number
    TI Title

    NOTE: "x" is a sequence number added when you choose to download multiple occurrences of the field as separate fields. If you selected that option and a patent has only one occurrence of the field, the sequence number is still used, but there will be only one occurrence of it (for example, there will be a PA1 without a PA2). If you choose to download the data as one field, the sequence number is not included in the field code.
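
To illustrate how sequence-numbered codes relate to their base field codes, here is a small Python sketch that regroups (code, value) pairs such as PA1 and PA2 under the base code PA. It assumes every base code is two uppercase letters, as in the table above, and that the pairs have already been read from an extract; how they are laid out in a real file is not shown here, and the sample values are invented.

```python
import re
from collections import defaultdict

# Two uppercase letters followed by a sequence number, e.g. "PA1" or "IC12".
SEQUENCED_CODE = re.compile(r"^([A-Z]{2})(\d+)$")


def group_occurrences(pairs):
    """Group values by base field code, ordered by their sequence number."""
    grouped = defaultdict(list)
    for code, value in pairs:
        match = SEQUENCED_CODE.match(code)
        if match:
            base, sequence = match.group(1), int(match.group(2))
        else:
            # Codes without a sequence number (e.g. "PN") occur once.
            base, sequence = code, 1
        grouped[base].append((sequence, value))
    return {
        base: [value for _, value in sorted(items, key=lambda item: item[0])]
        for base, items in grouped.items()
    }


# Hypothetical pairs from a multi-field extract of a single patent.
pairs = [
    ("PN", "US0000001"),
    ("PA2", "Beta LLC"),
    ("PA1", "Alpha Inc"),
    ("IN1", "Doe, Jane"),
]
fields = group_occurrences(pairs)
print(fields["PA"])
```

A patent with a single assignee would simply yield a one-item list under PA, matching the "PA1 without a PA2" case described in the note.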




    DTD (Document Type Definition) for XML format:
    Use this link to download the DTD that defines the format used for files extracted in XML.

    DTD
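
Because both XML options download as a .zip, a short script can confirm whether an extract holds full text or only bibliographic data. The Python sketch below is illustrative: it assumes the full-text field names listed earlier appear as XML element names, which should be checked against the DTD, and the element names `patent` and `title` in the stand-in data are made up.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Full-text field names listed in this document; treating them as XML
# element names is an assumption to verify against the DTD.
FULL_TEXT_FIELDS = {
    "federalResearchStatement",
    "backgroundOfInvention",
    "briefDescriptionOfDrawings",
    "generalDescription",
    "embodiments",
    "claims",
}


def full_text_fields_by_file(zip_bytes):
    """Map each .xml member of the archive to the full-text fields it contains."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            root = ET.fromstring(archive.read(name))
            tags = {element.tag for element in root.iter()}
            found[name] = sorted(tags & FULL_TEXT_FIELDS)
    return found


# Build a tiny stand-in archive in memory; "patent" and "title" are
# made-up element names, not taken from the real DTD.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bib_only.xml", "<patent><title>Example</title></patent>")
    archive.writestr(
        "full_text.xml",
        "<patent><title>Example</title><claims>1. A widget.</claims></patent>",
    )

result = full_text_fields_by_file(buffer.getvalue())
print(result)
```

If the real files declare an XML namespace, the element tags will carry a namespace prefix and the comparison would need adjusting.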





    Pricing for Data Extracts containing only patent office fields:
    These prices apply to Data Extracts that contain only patent office fields.

    For Delphion Unlimited subscribers, Data Extract is free for extracts that do not exceed 500 records. For extracts exceeding 500 records, the following charges apply:
    • $10 per additional set of 500 non-XML records that includes abstract, claims or both in the extract (NOTE: abstract is automatically included in extracts for ISI ResearchSoft's EndNote®, Reference Manager®, and ProCite®).
    • $15 per additional set of 500 records if XML is the download format (either bibliographic fields or full text).
    • $5 per additional set of 500 records for all other fields in standard (non-XML) formats.
    For Delphion Premier subscribers, each set of 500 records is charged as follows:
    • $10 per set of 500 non-XML records that includes abstract, claims or both in the extract (NOTE: abstract is automatically included in extracts for ISI ResearchSoft's EndNote®, Reference Manager®, and ProCite®).
    • $15 per set of 500 records if XML is the download format (either bibliographic fields or full text).
    • $5 per set of 500 records for all other fields in standard (non-XML) formats.
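
The pricing rules above can be sketched as a small calculation. This Python function is an illustration, not an official price calculator: it assumes that a partial set of 500 records is charged as a full set, which the text does not state, and the plan and category names are labels invented for this example.

```python
import math

# Dollars per set of 500 records, taken from the lists above.
RATES = {
    "xml": 15,                 # XML download, bibliographic or full text
    "abstract_or_claims": 10,  # non-XML extract with abstract, claims or both
    "other": 5,                # all other fields in non-XML formats
}
SET_SIZE = 500


def extract_charge(records, plan, category):
    """Estimate the charge in dollars for one Data Extract."""
    # Assumption: a partial set is rounded up to a whole set.
    sets = math.ceil(records / SET_SIZE)
    if plan == "unlimited":
        # The first 500 records are free for Delphion Unlimited subscribers.
        sets = max(sets - 1, 0)
    elif plan != "premier":
        raise ValueError(f"unknown plan: {plan!r}")
    return sets * RATES[category]


print(extract_charge(1200, "unlimited", "xml"))
print(extract_charge(1200, "premier", "other"))
```

Under the rounding assumption, 1,200 records count as three sets, so the Unlimited example is charged for two sets and the Premier example for three.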


    Thomson Reuters Copyright © 1997-2013 Thomson Reuters 