Title: |
US6862586:
Searching databases that identifying group documents forming high-dimensional torus geometric k-means clustering, ranking, summarizing based on vector triplets
[ Derwent Title ]

|
Country: |
US United States of America

|
Inventor: |
Kreulen, Jeffrey Thomas; San Jose, CA, United States of America
Krishna, Vikas; San Jose, CA, United States of America
Modha, Dharmendra Shantilal; San Jose, CA, United States of America
Spangler, William Scott; San Martin, CA, United States of America
Strong, Jr., Hovey Raymond; San Jose, CA, United States of America

|
Assignee: |
International Business Machines Corporation, Armonk, NY, United States of America

|
Published / Filed: |
2005-03-01 / 2000-02-11

|
Application Number: |
US2000000502452

|
IPC Code: |
Advanced:
G06F 17/30;
IPC-7:
G06F 17/30;

|
ECLA Code: |
G06F17/30G4;

|
U.S. Class: |
707/003;
707/007;
707/100;
707/102;

|
Field of Search: |
707/001-10,100-104.1,200-205,500.1-501.1,512-515,529-532,900-902,907-908
382/224-225,228,230,156-160,305-308
358/403
706/015,47-50
345/440
704/009-10

|
Priority Number: |
2000-02-11 US2000000502452

|
Abstract: |
A method and structure for performing a database search includes searching a database using a query (searching producing result items), and ranking the result items based on one or more of a frequency of an occurrence of in-links and out-links in each of the result items.
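The ranking step in the abstract can be illustrated with a minimal sketch. It assumes the hyperlinks are available as (source, target) pairs and that the score is simply the sum of in-link and out-link counts. The function name `rank_by_link_frequency` and that scoring rule are illustrative assumptions, not the patented method.

```python
from collections import Counter


def rank_by_link_frequency(results, links):
    """Order result items by how often they occur as link sources or targets.

    results: list of document ids returned by the query.
    links:   list of (source, target) pairs (assumed input format).
    The score (in-links + out-links) is an assumption made for this sketch.
    """
    result_set = set(results)
    out_counts = Counter(src for src, _ in links if src in result_set)
    in_counts = Counter(dst for _, dst in links if dst in result_set)
    # sorted() is stable, so ties keep their original query order.
    return sorted(results, key=lambda d: in_counts[d] + out_counts[d], reverse=True)


if __name__ == "__main__":
    hits = ["a", "b", "c"]
    edges = [("a", "b"), ("c", "b"), ("b", "x"), ("a", "c")]
    print(rank_by_link_frequency(hits, edges))
```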

|
Attorney, Agent or Firm: |
McSwain, Esq., Marc D.;
McGinn & Gibb, PLLC;

|
Primary / Asst. Examiners: |
Channavajjala, Srirama;

|
Maintenance Status: |
E1 Expired

|
Family: |
None

|
First Claim (of 8): |
1. A method of performing a database search comprising:
searching a database using a query, said searching identifying a group of hyperlinked documents;
forming a high-dimensional torus geometric representation of said hyperlinked documents, wherein each hyperlinked document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;
clustering said result items into clusters based on said high-dimensional torus geometric representation;
ranking items within each cluster of said clusters based on said high-dimensional torus geometric representation;
summarizing contents of said clusters based on said high-dimensional torus geometric representation, wherein said clustering of the said vector triplets on said high-dimensional torus geometric representation is performed using a toric k-means clustering process that uses a cosine-type similarity measure between document vector triplets, thereby producing clusters of vector triplets and producing a concept triplet for each of the clusters; and
summarizing said clusters of vector triplets based on nuggets of information including:
identifying a closeness of said vector triplets in a cluster to said concept triplet for said cluster on said high-dimensional torus geometric representation;
identifying said words with a highest normalized word frequency in said concept triplet for said cluster as the most frequent key-words for each of said clusters;
identifying said out-links with a highest normalized out-link frequency in the concept triplet for the cluster as most frequent key out-links for each of said clusters;
identifying said in-links with a highest normalized in-link frequency in the concept triplet for the cluster as most frequent important in-links for each cluster;
identifying hypertext items relevant to the user's query by using a weighting of terms used in said query;
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type textual content similarity measure between document vector triplets; and
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type out-link similarity measure between document vector triplets; and
identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type in-link similarity measure between document vector triplets.
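
The vector-triplet clustering recited in the claim can be sketched roughly as follows. This is a hedged illustration, not the patented procedure. It assumes each component (words, out-links, in-links) is normalized to unit Euclidean length, that the cosine-type similarity is a weighted sum of the three component dot products with equal weights, and that initial concept triplets are supplied by the caller. All function names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a frequency vector to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def make_triplet(words, out_links, in_links):
    """Build a (word, out-link, in-link) vector triplet from raw frequency counts."""
    return (normalize(words), normalize(out_links), normalize(in_links))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triplet_similarity(x, y, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of per-component cosines; equal weights are an assumption."""
    return sum(w * dot(a, b) for w, a, b in zip(weights, x, y))


def concept_triplet(members):
    """Component-wise sum of the member triplets, each component renormalized."""
    return tuple(
        normalize([sum(col) for col in zip(*(m[i] for m in members))])
        for i in range(3)
    )


def toric_kmeans(triplets, initial_concepts, iterations=10):
    """Alternate between assigning triplets to concepts and recomputing concepts."""
    concepts = list(initial_concepts)
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(len(concepts)), key=lambda j: triplet_similarity(t, concepts[j]))
            for t in triplets
        ]
        for j in range(len(concepts)):
            members = [t for t, lab in zip(triplets, labels) if lab == j]
            if members:  # keep the old concept if a cluster becomes empty
                concepts[j] = concept_triplet(members)
    return labels, concepts


def most_typical(triplets, labels, concepts, cluster):
    """Index of the document closest to the cluster's concept triplet."""
    idx = [i for i, lab in enumerate(labels) if lab == cluster]
    return max(idx, key=lambda i: triplet_similarity(triplets[i], concepts[cluster]))


def top_features(component, names, n=3):
    """Names with the highest weight in one component of a concept triplet."""
    ranked = sorted(zip(component, names), key=lambda p: p[0], reverse=True)
    return [name for _, name in ranked[:n]]
```

Under these assumptions, `top_features(concepts[j][0], vocabulary)` would list candidate key-words for cluster `j`, and the same call on components 1 and 2 would list key out-links and in-links. Passing weights such as `(1, 0, 0)` to `triplet_similarity` restricts the comparison to textual content only, which corresponds loosely to the separate text, out-link, and in-link similarity measures in the claim.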

|
Forward References: |
7 U.S. patents reference this one

|
Foreign References: |
None

|
Other References: |
Structuring and Visualising the WWW by Generalised Similarity Analysis, Chaomei Chen, In proceedings of Hypertext '97 (Southampton, England, Apr. 1997), pp. 177-186.
Interactive Clustering for Navigating in Hypermedia Systems, Sougata Mukherjea, James D. Foley, Scott E. Hudson, ACM Press, 1994.
From Latent Semantics to Spatial Hypertext An Integrated Approach, Chaomei Chen, Mary Czerwinski, In Proceedings of Hypertext '98, Pittsburgh, PA, USA, 1998. pp. 77-86.
HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering, Ron Weiss, Bienvenido Velez, Mark A. Sheldon, Chanathip Namprempre, Peter Szilagyi, Andrzej Duda, David K. Gifford, In Proceedings of Hypertext '96, Washington, D.C., USA, pp. 180-193.
Information Retrieval Data Structures & Algorithms, William B. Frakes, Ricardo Baeza-Yates, Prentice Hall PTR, Upper Saddle River, New Jersey, 1992, Chapter 16, pp. 419-442.
