|
Title: |
US5937422:
Automatically generating a topic description for text and searching and sorting text by topic using the same
[ Derwent Title ]

|
Country: |
US United States of America

|
Inventor: |
Nelson, Douglas J.; Columbia, MD
Schone, Patrick John; Elkridge, MD
Bates, Richard Michael; Greenbelt, MD

|
Assignee: |
The United States of America as represented by the National Security Agency, Washington, DC

|
Published / Filed: |
1999-08-10 / 1997-04-15

|
Application Number: |
US1997000834263

|
IPC Code: |
Advanced:
G06F 17/30;
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30T4C;

|
U.S. Class: |
Current:
715/206;
707/004;
707/E17.058;
715/205;
715/234;
Original:
707/531;
707/004;
707/532;
707/535;
707/512;

|
Field of Search: |
704/010
707/512,532,535,531,3-5,7

|
Priority Number: |
1997-04-15, US1997000834263

|
Abstract: |
A method of automatically generating a topical description of text by receiving the text containing input words; stemming each input word to its root form; assigning a user-definable part-of-speech score to each input word; assigning a language salience score to each input word; assigning an input-word score to each input word; creating a tree structure under each input word, where each tree structure contains the definition of the corresponding input word; assigning a definition-word score to each definition word; collapsing each tree structure to a corresponding tree-word list; assigning a tree-word-list score to each entry in each tree-word list; combining the tree-word lists into a final word list; assigning each word in the final word list a final-word-list score; and choosing the top N scoring words in the final word list as the topic description of the input text. Document searching and sorting may be accomplished by performing the method described above on each document in a database and then comparing the similarity of the resulting topical descriptions.
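
The abstract says documents can be searched and sorted by comparing the similarity of their topical descriptions, but it does not name a similarity measure. A minimal sketch, assuming each topic description is kept as a mapping from topic word to final score and using cosine similarity as a swapped-in measure (not one taken from the patent), could rank a corpus against a query description like this. The names `cosine_similarity`, `rank_documents`, and `TopicDescription` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: a topic description maps each of its
# top-N words to that word's final-word-list score.
TopicDescription = Dict[str, float]


def cosine_similarity(a: TopicDescription, b: TopicDescription) -> float:
    """Cosine similarity of two sparse word-score vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty description matches nothing
    return dot / (norm_a * norm_b)


def rank_documents(query: TopicDescription,
                   corpus: Dict[str, TopicDescription]) -> List[Tuple[str, float]]:
    """Sort document IDs by descending similarity to the query description."""
    scored = [(doc_id, cosine_similarity(query, desc))
              for doc_id, desc in corpus.items()]
    # Ties are broken by document ID so the ordering is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored
```

Cosine similarity ignores the overall magnitude of the scores, so a long document with large raw scores is not automatically favored over a short one; if only the top-N words are stored without scores, a set-overlap measure such as Jaccard similarity would be a natural alternative.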

|
Attorney, Agent or Firm: |
Morelli, Robert D.

|
Primary / Asst. Examiners: |
Amsbury, Wayne; Channavajjala, Srirama

|
Family: |
None

|
First Claim (1 of 31 claims): |
What is claimed is:
1. A method of automatically generating a topical description of text, comprising the steps of:
- a) receiving the text, where the text consists of one or more input words;
- b) stemming each input word to its root form;
- c) assigning a user-definable part-of-speech score $\beta_i$ to each input word;
- d) assigning a language salience score $S_i$ to each input word;
- e) assigning an input-word score to each input word that is a function of the corresponding input word's part-of-speech score $\beta_i$, language salience score $S_i$, and the number of times the corresponding input word appears in the text;
- f) creating a tree structure under each input word, where each tree structure contains the definition of the corresponding input word, where each definition word may be further defined to a user-definable number of levels;
- g) assigning a definition-word score $A_{i,t}[j]$ to each definition word in each tree structure based on the definition word's part-of-speech score $\beta_j$, the language salience score of the word the definition word defines, a relational salience score $R_{k,j}$, and a user-definable factor $W$;
- h) collapsing each tree structure to a corresponding tree-word list, where each tree-word list contains the unique words contained in the corresponding tree structure;
- i) assigning a tree-word-list score to each word in each tree-word list, where each tree-word-list score is a function of the scores of the corresponding word that existed in the corresponding uncollapsed tree structure;
- j) combining the tree-word lists into a final word list, where the final word list contains the unique words contained in the tree-word lists;
- k) assigning a final-word-list score $A_{fi}[j]$ to each word in the final word list, where $A_{fi}[j]$ is a function of the corresponding word's dictionary salience and tree-word-list scores; and
- l) choosing the top N scoring words in the final word list as the topic description of the input text, where the value N may be defined by the user.
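
Claim 1 names the inputs to each score but not the formulas that combine them. The following is a sketch of steps a) through l) under loudly stated assumptions, not the patented implementation: a crude suffix-stripping stemmer, caller-supplied part-of-speech lexicon and salience tables, multiplicative scores, and summation when trees are collapsed and lists are combined. The names `topic_description`, `stem`, `beta`, `POS_SCORE`, and all table contents are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical beta table; the claim only says part-of-speech scores are user-definable.
POS_SCORE = {"noun": 1.0, "verb": 0.5, "other": 0.0}


def stem(word: str) -> str:
    """Crude suffix stripper standing in for a real stemmer (assumption)."""
    word = word.lower()
    for suffix in ("ing", "s"):
        # Only strip if at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def beta(word: str, lexicon: Dict[str, str]) -> float:
    """Part-of-speech score; words missing from the lexicon count as 'other'."""
    return POS_SCORE[lexicon.get(word, "other")]


def topic_description(
    text: str,
    lexicon: Dict[str, str],                   # stemmed word -> part of speech
    salience: Dict[str, float],                # language salience S
    dictionary: Dict[str, List[str]],          # stemmed word -> stemmed definition words
    dict_salience: Dict[str, float],           # dictionary salience (default 1.0)
    relational: Dict[Tuple[str, str], float],  # relational salience R[k, j] (default 1.0)
    levels: int = 2,                           # tree depth (user-definable)
    w_factor: float = 0.5,                     # factor W (user-definable)
    n: int = 3,                                # size of the topic description
) -> List[str]:
    # Steps a-b: tokenize and stem, counting occurrences.
    counts = Counter(stem(tok) for tok in re.findall(r"[A-Za-z]+", text))

    final: Dict[str, float] = defaultdict(float)  # step j: union of tree-word lists
    for word, count in counts.items():
        # Steps c-e: assumed input-word score = beta_i * S_i * count.
        root_score = beta(word, lexicon) * salience.get(word, 0.0) * count

        # Steps f-g: expand definitions level by level. Assumed score for a
        # definition word j of word k at this depth: W**depth * beta_j * S_k * R[k, j].
        nodes = [(word, root_score)]
        frontier = [word]
        for depth in range(1, levels + 1):
            next_frontier = []
            for parent in frontier:
                for child in dictionary.get(parent, []):
                    score = ((w_factor ** depth) * beta(child, lexicon)
                             * salience.get(parent, 0.0)
                             * relational.get((parent, child), 1.0))
                    nodes.append((child, score))
                    next_frontier.append(child)
            frontier = next_frontier

        # Steps h-i: collapse to unique words; assumed score = sum over tree nodes.
        tree_list: Dict[str, float] = defaultdict(float)
        for node_word, score in nodes:
            tree_list[node_word] += score

        for node_word, score in tree_list.items():
            final[node_word] += score

    # Step k: assumed final score = dictionary salience * summed tree-word-list scores.
    scored = {wd: dict_salience.get(wd, 1.0) * s for wd, s in final.items()}
    # Step l: keep the top N words; ties are broken alphabetically.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return [wd for wd, _ in ranked[:n]]
```

With a toy dictionary in which "bank", "loan", and "lend" all have "money" in their definitions, the text "The banks lend loans" can rank "money" first even though it never appears in the input, which shows how the definition trees let the description reach beyond the literal vocabulary. The `levels` bound also keeps cyclic dictionary entries from expanding forever.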

|
Forward References: |
57 U.S. patents reference this one.

|