 The Delphion Integrated View

Title: US6684205: Clustering hypertext with applications to web searching


Country: US United States of America

Length: 19 pages
Inventor: Modha, Dharmendra Shantilal; San Jose, CA
Spangler, William Scott; San Martin, CA

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2004-01-27 / 2000-10-18

Application Number: US2000000690854

IPC Code: Advanced: G06F 17/30;
IPC-7: G06F 17/30;

ECLA Code: G06F17/30W1;

U.S. Class: Current: 707/003; 707/010; 707/E17.108; 715/234;
Original: 707/003; 707/010; 715/501.1; 715/513;

Field of Search: 707/003,532,513,501.1,10,1 715/501.1,513

Priority Number:
2000-10-18  US2000000690854

Abstract: A method and structure of searching a database containing hypertext documents comprising searching the database using a query to produce a set of hypertext documents; and geometrically clustering the set of hypertext documents into various clusters using a toric k-means similarity measure such that documents within each cluster are similar to each other, wherein the clustering has a linear-time complexity in producing the set of hypertext documents, wherein the similarity measure comprises a weighted sum of maximized individual components of the set of hypertext documents, and wherein the clustering is based upon words contained in each hypertext document, out-links from each hypertext document, and in-links to each hypertext document.
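The abstract names a toric k-means clustering whose similarity measure is a weighted sum over three components of each document: its words, its out-links, and its in-links. The sketch below illustrates that idea under stated assumptions that do not come from the patent: each component is a sparse vector scaled to unit length, the similarity is a weighted sum of per-component cosines with weights summing to one, and each cluster's "concept" is the re-normalized per-component sum of its members. The function names, the equal default weights, and the random initialization are hypothetical choices for illustration only.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch; names, equal default weights and random init are assumptions.
SparseVec = Dict[str, float]                   # feature -> weight
Doc = Tuple[SparseVec, SparseVec, SparseVec]   # (words, out-links, in-links)


def normalize(v: SparseVec) -> SparseVec:
    """Scale a sparse vector to unit L2 norm; an all-zero vector becomes empty."""
    norm = math.sqrt(sum(x * x for x in v.values()))
    return {f: x / norm for f, x in v.items()} if norm > 0 else {}


def make_doc(words: SparseVec, out_links: SparseVec, in_links: SparseVec) -> Doc:
    """Normalize each component separately so each one lies on its own unit sphere."""
    return (normalize(words), normalize(out_links), normalize(in_links))


def cosine(u: SparseVec, v: SparseVec) -> float:
    """Dot product of two unit vectors, iterating over the sparser one."""
    if len(u) > len(v):
        u, v = v, u
    return sum(x * v.get(f, 0.0) for f, x in u.items())


def similarity(doc: Doc, concept: Doc, alphas: Sequence[float]) -> float:
    """Weighted sum of per-component cosine similarities (weights assumed to sum to 1)."""
    return sum(a * cosine(d, c) for a, d, c in zip(alphas, doc, concept))


def concept_of(members: List[Doc]) -> Doc:
    """Per-component sum of the members, re-normalized (same direction as the mean)."""
    parts = []
    for i in range(3):
        acc: SparseVec = {}
        for doc in members:
            for f, x in doc[i].items():
                acc[f] = acc.get(f, 0.0) + x
        parts.append(normalize(acc))
    return (parts[0], parts[1], parts[2])


def toric_kmeans(docs: List[Doc], k: int,
                 alphas: Sequence[float] = (1 / 3, 1 / 3, 1 / 3),
                 max_iters: int = 20, seed: int = 0) -> List[int]:
    """k-means over the product of three unit spheres, using the weighted similarity.

    For fixed k, each pass touches every nonzero entry a constant number of times,
    so one iteration is linear in the size of the input.
    """
    rng = random.Random(seed)
    concepts = [docs[i] for i in rng.sample(range(len(docs)), k)]
    labels = [-1] * len(docs)
    for _ in range(max_iters):
        # Assignment step: each document joins the concept it is most similar to.
        new_labels = [max(range(k), key=lambda j: similarity(d, concepts[j], alphas))
                      for d in docs]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute each concept; keep the old one if a cluster empties.
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:
                concepts[j] = concept_of(members)
    return labels


docs = [
    make_doc({"jaguar": 1, "car": 2}, {"a.com": 1}, {"x.com": 1}),
    make_doc({"jaguar": 1, "car": 1, "engine": 1}, {"a.com": 1}, {"x.com": 1}),
    make_doc({"jaguar": 1, "cat": 2}, {"b.org": 1}, {"y.org": 1}),
    make_doc({"jaguar": 1, "cat": 1, "jungle": 1}, {"b.org": 1}, {"y.org": 1}),
]
labels = toric_kmeans(docs, k=2)
print(labels)  # the two "car" pages share one label, the two "cat" pages the other
```

Because every component has unit length and the weights sum to one, the similarity of nonnegative feature vectors lies between 0 and 1, reaching 1 only when all three components match. Raising the word weight relative to the link weights makes the clustering lean on page content rather than link structure.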

Attorney, Agent or Firm: McGinn & Gibb, PLLC; McSwain, Esq., Marc D.

Primary / Asst. Examiners: Vu, Kim; Liang, Gwen


Family: 3 known family members

First Claim (1 of 24 claims):
What is claimed is:
1. A method of searching a database containing hypertext documents, said method comprising:
  • searching said database using a query to produce a set of hypertext documents;
  • geometrically clustering said set of hypertext documents into various clusters using a similarity measure such that documents within each cluster are similar to each other,
  • wherein said clustering has a linear-time complexity in producing said set of hypertext documents,
  • wherein said similarity measure comprises a weighted sum of maximized individual components of said set of hypertext documents,
  • wherein said clustering is based upon words contained in each hypertext document, out-links from each hypertext document, and in-links to each hypertext document; and
  • eliminating said hypertext documents if said words contained in said hypertext documents appear in fewer than two documents, said out-links contained in said hypertext documents are pointed to by fewer than two documents, and said in-links contained in said hypertext documents are pointed to by fewer than two documents.
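The final step of claim 1 is worded as eliminating documents, but the thresholds it lists are per-feature document counts (words, out-links, and in-links appearing in fewer than two documents). The sketch below takes one possible reading, stated here as an assumption: it first removes any word, out-link, or in-link that occurs in fewer than two documents, then drops documents left with no features at all. The function name `prune_rare_features`, the `min_df` parameter, and the (words, out-links, in-links) triple of sparse dictionaries are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed reading of claim 1's final step; names and data layout are hypothetical.
SparseVec = Dict[str, float]                   # feature -> weight
Doc = Tuple[SparseVec, SparseVec, SparseVec]   # (words, out-links, in-links)


def prune_rare_features(docs: List[Doc], min_df: int = 2) -> List[Doc]:
    """Remove words/out-links/in-links that occur in fewer than `min_df` documents.

    Document frequency is counted separately for each component. A document whose
    three components are all empty after pruning is dropped from the result.
    """
    df = [Counter(), Counter(), Counter()]
    for doc in docs:
        for i in range(3):
            df[i].update(doc[i].keys())  # each feature counted once per document

    kept: List[Doc] = []
    for doc in docs:
        words, outs, ins = (
            {f: w for f, w in doc[i].items() if df[i][f] >= min_df} for i in range(3)
        )
        if words or outs or ins:
            kept.append((words, outs, ins))
    return kept


docs = [
    ({"web": 1, "cluster": 1}, {"a.com": 1}, {}),
    ({"web": 2, "search": 1}, {"a.com": 1}, {}),
    ({"zebra": 1}, {"z.net": 1}, {}),
]
pruned = prune_rare_features(docs)
print(pruned)  # -> [({'web': 1}, {'a.com': 1}, {}), ({'web': 2}, {'a.com': 1}, {})]
```

Counting document frequency takes one pass over all nonzero entries and filtering takes another, so this preprocessing stays linear in the input size.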



Forward References: 9 U.S. patents reference this one.

       
U.S. References (backward, 18):

Patent | Pub. Date | Inventor | Assignee | Title | Pages
US5787420 | 1998-07 | Tukey et al. | Xerox Corporation | Method of ordering document clusters without requiring knowledge of user interests | 11
US5787421 | 1998-07 | Nomiyama | International Business Machines Corporation | System and method for information retrieval by using keywords associated with a given set of data elements and the frequency of each keyword as determined by the number of data elements attached to each keyword | 11
US5819258 | 1998-10 | Vaithyanathan et al. | Digital Equipment Corporation | Method and apparatus for automatically generating hierarchical categories from large document collections | 15
US5835905 | 1998-11 | Pirolli et al. | Xerox Corporation | System for predicting documents relevant to focus documents by spreading activation through network representations of a linked collection of documents | 19
US5857179 | 1999-01 | Vaithyanathan et al. | Digital Equipment Corporation | Computer method and apparatus for clustering documents and automatic generation of cluster keywords | 18
US5864845 | 1999-01 | Voorhees et al. | Siemens Corporate Research, Inc. | Facilitating world wide web searches utilizing a multiple search engine query clustering fusion strategy | 8
US5895470 | 1999-04 | Pirolli et al. | Xerox Corporation | System for categorizing documents in a linked collection of documents | 19
US5920859 | 1999-07 | Li | IDD Enterprises, L.P. | Hypertext document retrieval system and method | 14
US6012058 | 2000-01 | Fayyad et al. | Microsoft Corporation | Scalable system for K-means clustering of large databases | 28
US6038574 | 2000-03 | Pitkow et al. | Xerox Corporation | Method and apparatus for clustering a collection of linked documents using co-citation analysis | 14
US6115708 | 2000-09 | Fayyad et al. | Microsoft Corporation | Method for refining the initial conditions for clustering with applications to small and large database clustering | 14
US6122647 | 2000-09 | Horowitz et al. | Perspecta, Inc. | Dynamic generation of contextual links in hypertext documents | 18
US6256648 | 2001-07 | Hill et al. | AT&T Corp. | System and method for selecting and displaying hyperlinked information resources | 16
US6298174 | 2001-10 | Lantrip et al. | Battelle Memorial Institute | Three-dimensional display of document set | 10
US6363379 | 2002-03 | Jacobson et al. | AT&T Corp. | Method of clustering electronic documents in response to a search query | 7
US6389436 | 2002-05 | Chakrabarti et al. | International Business Machines Corporation | Enhanced hypertext categorization using hyperlinks | 32
US6460036 | 2002-10 | Herz | Pinpoint Incorporated | System and method for providing customized electronic newspapers and target advertisements | 57
US6556983 | 2003-04 | Altschuler et al. | Microsoft Corporation | Methods and apparatus for finding semantic information, such as usage logs, similar to a query using a pattern lattice data space | 90
Foreign References: None

Other Abstract Info: DERABS C2004-153988

Other References:
  • Weiss et al., "HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering", In Proceedings of Hypertext 1996, Washington, DC, USA, pp. 180-193.*
  • Chen, C., "Structuring and Visualizing the WWW by Generalised Similarity Analysis", In Proceedings of Hypertext 1997, Southampton, England, Apr. 1997, pp. 177-186.
  • Mukherjea, S., Foley, J.D., Hudson, S.E., "Interactive Clustering for Navigating in Hypermedia Systems", ACM Press, 1994.
  • Chen, C., Czerwinski, M., "From Latent Semantics to Spatial Hypertext: An Integrated Approach", In Proceedings of Hypertext 1998, Pittsburgh, PA, USA, 1998.
  • Weiss, R., Velez, B., Sheldon, M.A., Namprempre, C., Szilagyi, P., Duda, A., Gifford, D.K., "HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering", In Proceedings of Hypertext 1996, Washington, DC, USA, pp. 180-193.
  • Frakes, W.B., Baeza-Yates, R., "Information Retrieval: Data Structures & Algorithms", Prentice Hall PTR, Upper Saddle River, New Jersey, 1992.
  • Dhillon, I.S., Modha, D.S., "Concept Decompositions for Large Sparse Text Data Using Clustering", Jul. 8, 1999, pp. 1-32.
  • Silverstein, C., Henzinger, M., Marais, H., Moricz, M., "Analysis of a Very Large Alta Vista Query Log", SRC Technical Note 1998-014, Oct. 26, 1998, pp. 1-17.
  • Chakrabarti, S., Dom, B., Indyk, P., "Enhanced Hypertext Categorization Using Hyperlinks", ACM SIGMOD 1998, Seattle, Washington, pp. 1-12.
  • Kleinberg, J.M., "Authoritative Sources in a Hyperlinked Environment", Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998; IBM Research Report RJ 10076, May 1997, pp. 1-33.
  • Lawrence, S., Giles, C.L., "Searching the World Wide Web", Science, vol. 280, Apr. 3, 1998, pp. 98-100 (cited by 18 patents).
  • Larson, R.R., "Bibliometrics of the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace", Proceedings of the 1996 American Society for Information Science Annual Meeting, pp. 1-13.
  • Chakrabarti, S., Dom, B., Raghavan, P., Rajagopalan, S., Gibson, D., Kleinberg, J., "Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text", WWW7, 1998, pp. 1-14.
  • Bradley, P.S., Fayyad, U.M., "Refining Initial Points for K-Means Clustering", ICML, 1998, pp. 91-99.
  • Chakrabarti, S., Dom, B.E., Kumar, S.R., Raghavan, P., Rajagopalan, S., Tomkins, A., Kleinberg, J.M., Gibson, D., "Hypersearching the Web", Scientific American, Jun. 1999, pp. 1-8.
  • Weiss, R., Velez, B., Sheldon, M.A., Namprempre, C., Szilagyi, P., Duda, A., Gifford, D.K., "HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering", ACM Hypertext, 1996, pp. 180-193.
  • Mukherjea, S., Foley, J.D., Hudson, S.E., "Interactive Clustering for Navigating in Hypermedia Systems", ACM Hypertext, Sep. 1994, pp. 136-145.
  • Chen, C., "Structuring and Visualising the Web by Generalised Similarity Analysis", ACM Hypertext, 1997.
  • Pirolli, P., Pitkow, J., Rao, R., "Silk from a Sow's Ear: Extracting Usable Structures from the Web", ACM SIGCHI Conference on Human Factors in Computing Systems, 1996.
  • Chen, C., Czerwinski, M., "From Latent Semantics to Spatial Hypertext: An Integrated Approach", ACM Hypertext, 1998, pp. 77-86.
  • Botafogo, R.A., "Cluster Analysis for Hypertext Systems", ACM SIGIR, Jun. 1993, pp. 116-125.
  • Rasmussen, E., "Clustering Algorithms", in Information Retrieval: Data Structures & Algorithms, W.B. Frakes and R. Baeza-Yates, Eds., Prentice Hall, Englewood Cliffs, NJ, 1992, pp. 419-442, in paper #2.
  • Hartigan, J.A., Clustering Algorithms, Wiley, 1975, Chapter 4, pp. 84-107.
  • Willett, P., "Recent Trends in Hierarchic Document Clustering: A Critical Review", Information Processing & Management, 1988, pp. 577-597.


  • Continuity Data:
    Application Number Filed Notes

    US2000000690854 2000-10-18  is related to the prior publication
         US20040049503A1 issued 2004-03-11  Clustering hypertext with applications to WEB searching

    US2004000950835 2004-09-27  is a continuation of
    US2000000690854  2000-10-18   (granted)
         US6684205 issued 2004-01-27   Clustering hypertext with applications to web searching

    US2003000660242 2003-09-11  is a division of
    US2000000690854  2000-10-18   (pending) [presumed granted]
         US6684205 issued 2004-01-27   Clustering hypertext with applications to web searching



    Thomson Reuters Copyright © 1997-2010 Thomson Reuters 