 The Delphion Integrated View

Title: US6862586: Searching databases that identifying group documents forming high-dimensional torus geometric k-means clustering, ranking, summarizing based on vector triplets


Country: US United States of America

Pages: 11
 
Inventor: Kreulen, Jeffrey Thomas; San Jose, CA, United States of America
Krishna, Vikas; San Jose, CA, United States of America
Modha, Dharmendra Shantilal; San Jose, CA, United States of America
Spangler, William Scott; San Martin, CA, United States of America
Strong, Jr., Hovey Raymond; San Jose, CA, United States of America

Assignee: International Business Machines Corporation, Armonk, NY, United States of America

Published / Filed: 2005-03-01 / 2000-02-11

Application Number: US2000000502452

IPC Code: Advanced: G06F 17/30;
IPC-7: G06F 17/30;

ECLA Code: G06F17/30G4;

U.S. Class: 707/003; 707/007; 707/100; 707/102;

Field of Search: 707/001-10,100-104.1,200-205,500.1-501.1,512-515,529-532,900-902,907-908 382/224-225,228,230,156-160,305-308 358/403 706/015,47-50 345/440 704/009-10

Priority Number:
2000-02-11  US2000000502452

Abstract:     A method and structure for performing a database search includes searching a database using a query (searching producing result items), and ranking the result items based on one or more of a frequency of an occurrence of in-links and out-links in each of the result items.

Attorney, Agent or Firm: McSwain, Esq., Marc D.; McGinn & Gibb, PLLC

Primary / Asst. Examiners: Channavajjala, Srirama;

Maintenance Status: E1 Expired

Family: None

First Claim (of 8 claims):
    1. A method of performing a database search comprising:

searching a database using a query, said searching identifying a group of hyperlinked documents;

forming a high-dimensional torus geometric representation of said hyperlinked documents, wherein each hyperlinked document is represented by a vector triplet comprising a normalized word frequency, a normalized out-link frequency and a normalized in-link frequency;

clustering said result items into clusters based on said high-dimensional torus geometric representation;

ranking items within each cluster of said clusters based on said high-dimensional torus geometric representation;

summarizing contents of said clusters based on said high-dimensional torus geometric representation, wherein said clustering of the said vector triplets on said high-dimensional torus geometric representation is performed using a toric k-means clustering process that uses a cosine-type similarity measure between document vector triplets, thereby producing clusters of vector triplets and producing a concept triplet for each of the clusters; and

summarizing said clusters of vector triplets based on nuggets of information including:

identifying a closeness of said vector triplets in a cluster to said concept triplet for said cluster on said high-dimensional torus geometric representation;

identifying said words with a highest normalized word frequency in said concept triplet for said cluster as the most frequent key-words for each of said clusters;

identifying said out-links with a highest normalized out-link frequency in the concept triplet for the cluster as most frequent key out-links for each of said clusters;

identifying said in-links with a highest normalized in-link frequency in the concept triplet for the cluster as most frequent important in-links for each cluster;

identifying hypertext items relevant to the user's query by using a weighting of terms used in said query;

identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type textual content similarity measure between document vector triplets; and

identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type out-link similarity measure between document vector triplets; and

identifying documents closest to said concept triplet as most typical documents in a cluster, using a cosine-type in-link similarity measure between document vector triplets.
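Illustrative sketch (not part of the patent record): the claim above describes representing each document as a vector triplet, clustering the triplets with a toric k-means process under a cosine-type similarity, and reading key-words off each cluster's concept triplet. The Python below is a minimal sketch of one way this could look. Everything in it is an assumption made for illustration: the function names, the equal component weights, the seeding with the first k documents, the fixed iteration count, and the toy documents. The claim itself does not specify these details.

```python
import math
from collections import Counter

def normalize(counts):
    """Scale a sparse count vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {key: v / norm for key, v in counts.items()} if norm else {}

def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(v * b.get(key, 0.0) for key, v in a.items())

def make_triplet(words, out_links, in_links):
    """Vector triplet: normalized word, out-link and in-link frequencies."""
    return tuple(normalize(Counter(x)) for x in (words, out_links, in_links))

def similarity(t1, t2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Cosine-type similarity: weighted sum of the three component cosines.
    Equal weights are an assumption of this sketch."""
    return sum(w * dot(a, b) for w, a, b in zip(weights, t1, t2))

def concept_triplet(members):
    """Sum each component over the cluster members, then renormalize it."""
    parts = []
    for i in range(3):
        total = Counter()
        for triplet in members:
            for key, v in triplet[i].items():
                total[key] += v
        parts.append(normalize(total))
    return tuple(parts)

def toric_kmeans(triplets, k, iterations=10):
    """Alternate between assigning triplets to the most similar concept
    triplet and recomputing the concept triplets. Seeding with the first
    k documents is an assumption chosen to keep the sketch deterministic."""
    concepts = list(triplets[:k])
    assignment = [0] * len(triplets)
    for _ in range(iterations):
        assignment = [
            max(range(k), key=lambda j: similarity(t, concepts[j]))
            for t in triplets
        ]
        for j in range(k):
            members = [t for t, a in zip(triplets, assignment) if a == j]
            if members:
                concepts[j] = concept_triplet(members)
    return assignment, concepts

def top_keys(component, n=3):
    """Entries with the highest normalized frequency in one component."""
    ranked = sorted(component.items(), key=lambda kv: -kv[1])
    return [key for key, _ in ranked[:n]]

# Hypothetical toy documents: (words, out-links, in-links).
docs = [
    make_triplet(["cat", "cat", "pet"], ["a.example"], ["hub.example"]),
    make_triplet(["stock", "market"], ["b.example"], ["fin.example"]),
    make_triplet(["cat", "pet"], ["a.example"], ["hub.example"]),
    make_triplet(["stock", "stock", "market"], ["b.example"], ["fin.example"]),
]
assignment, concepts = toric_kmeans(docs, k=2)
print(assignment)
print(top_keys(concepts[0][0], 1), top_keys(concepts[1][1], 1))
```

In this sketch, ranking documents within a cluster would amount to sorting the cluster's members by `similarity` to the cluster's concept triplet, and the key out-links and in-links come from applying `top_keys` to the second and third components of the concept triplet.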



U.S. References: 15 backward references (listed below); 7 forward references

Patent | Pub.Date | Inventor | Assignee | Title | Pages
US5787420 | 1998-07 | Tukey et al. | Xerox Corporation | Method of ordering document clusters without requiring knowledge of user interests | 11
US5787421 | 1998-07 | Nomiyama | International Business Machines Corporation | System and method for information retrieval by using keywords associated with a given set of data elements and the frequency of each keyword as determined by the number of data elements attached to each keyword | 11
US5819258 | 1998-10 | Vaithyanathan et al. | Digital Equipment Corporation | Method and apparatus for automatically generating hierarchical categories from large document collections | 15
US5835905 | 1998-11 | Pirolli et al. | Xerox Corporation | System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents | 19
US5857179 | 1999-01 | Vaithyanathan et al. | Digital Equipment Corporation | Computer method and apparatus for clustering documents and automatic generation of cluster keywords | 18
US5864845 | 1999-01 | Voorhees et al. | Siemens Corporate Research, Inc. | Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy | 8
US5895470 | 1999-04 | Pirolli et al. | Xerox Corporation | System for categorizing documents in a linked collection of documents | 19
US5920859 | 1999-07 | Li | IDD Enterprises, L.P. | Hypertext document retrieval system and method | 14
US6012058 | 2000-01 | Fayyad et al. | Microsoft Corporation | Scalable system for K-means clustering of large databases | 28
US6038574 | 2000-03 | Pitkow et al. | Xerox Corporation | Method and apparatus for clustering a collection of linked documents using co-citation analysis | 14
US6115708 | 2000-09 | Fayyad et al. | Microsoft Corporation | Method for refining the initial conditions for clustering with applications to small and large database clustering | 14
US6256648 | 2001-07 | Hill et al. | AT&T Corp. | System and method for selecting and displaying hyperlinked information resources | 16
US6298174 | 2001-10 | Lantrip et al. | Battelle Memorial Institute | Three-dimensional display of document set | 10
US6363379 | 2002-03 | Jacobson et al. | AT&T Corp. | Method of clustering electronic documents in response to a search query | 7
US6460036 | 2002-10 | Herz | Pinpoint Incorporated | System and method for providing customized electronic newspapers and target advertisements | 57
       
Foreign References: None

Other References:
  • Structuring and Visualising the WWW by Generalised Similarity Analysis, Chaomei Chen, In proceedings of Hypertext '97 (Southampton, England, Apr. 1997), pp. 177-186.
  • Interactive Clustering for Navigating in Hypermedia Systems, Sougata Mukherjea, James D. Foley, Scott E. Hudson, ACM Press, 1994.
  • From Latent Semantics to Spatial Hypertext An Integrated Approach, Chaomei Chen, Mary Czerwinski, In Proceedings of Hypertext '98, Pittsburgh, PA, USA, 1998. pp. 77-86.
  • HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering, Ron Weiss, Bienvenido Velez, Mark A. Sheldon, Chanathip Namprempre, Peter Szilagyi, Andrzej Duda, David K. Gifford, In Proceedings of Hypertext '96, Washington, D.C., USA, pp. 180-193.
  • Information Retrieval Data Structures & Algorithms, William B. Frakes, Ricardo Baeza-Yates, Prentice Hall PTR, Upper Saddle River, New Jersey, 1992, Chapter 16, pp. 419-442.

