Title: US6233575: Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values


Country: US United States of America

24 pages

 
Inventor: Agrawal, Rakesh; San Jose, CA
Chakrabarti, Soumen; San Jose, CA
Dom, Byron Edward; Los Gatos, CA
Raghavan, Prabhakar; Saratoga, CA

Assignee: International Business Machines Corporation, Armonk, NY

Published / Filed: 2001-05-15 / 1998-06-23

Application Number: US1998000102861

IPC Code: Advanced: G06F 17/30;
IPC-7: G06F 17/30;

ECLA Code: G06F17/30T4M;

U.S. Class: Current: 707/006; 706/012; 707/002; 707/E17.091;
Original: 707/006; 707/002; 706/012;

Field of Search: 707/001-10,100-104,200-206,500-503,511-516,531-536,907 706/012-21,25-28,45-55,60-61,934 382/156-157

Priority Number:
1998-06-23  US1998000102861
1997-06-24  US1997000050611P

Abstract: A system, process, and article of manufacture for organizing a large text database into a hierarchy of topics and for maintaining this organization as documents are added and deleted and as the topic hierarchy changes. Given sample documents belonging to various nodes in the topic hierarchy, the tokens (terms, phrases, dates, or other usable features in the document) that are most useful at each internal decision node for the purpose of routing new documents to the children of that node are automatically detected. Using feature terms, statistical models are constructed for each topic node. The models are used in an estimation technique to assign topic paths to new unlabeled documents. The hierarchical technique, in which feature terms can be very different at different nodes, leads to an efficient context-sensitive classification technique. The hierarchical technique can handle millions of documents and tens of thousands of topics. A resulting taxonomy and path enhanced retrieval system (TAPER) is used to generate context-dependent document indexing terms. The topic paths are used, in addition to keywords, for better focused searching and browsing of the text database.
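
The routing idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which statistical model or estimation technique is used, so everything here is an assumption made for illustration: the `TopicNode` class, the `train`, `log_score`, and `classify` helpers, the Laplace-smoothed multinomial model, and the greedy top-down descent are all hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


class TopicNode:
    """One node of a topic hierarchy (hypothetical structure for illustration)."""

    def __init__(self, name, features=(), children=()):
        self.name = name
        self.features = set(features)   # terms used to route among this node's children
        self.children = list(children)
        self.term_counts = Counter()    # term counts over training documents under this node
        self.doc_count = 0


def train(path, tokens):
    """Credit a labeled training document to every node on its topic path."""
    for node in path:
        node.term_counts.update(tokens)
        node.doc_count += 1


def log_score(child, tokens, features):
    """Unnormalized log prior plus a Laplace-smoothed multinomial log-likelihood,
    restricted to the parent's feature terms (an assumed model)."""
    total = sum(child.term_counts[f] for f in features)
    score = math.log(child.doc_count + 1)
    for t in tokens:
        if t in features:
            score += math.log((child.term_counts[t] + 1) / (total + len(features)))
    return score


def classify(root, tokens):
    """Greedy top-down routing: at each internal node, pick the best-scoring child."""
    path, node = [root.name], root
    while node.children:
        parent = node
        node = max(parent.children, key=lambda c: log_score(c, tokens, parent.features))
        path.append(node.name)
    return path


# Toy hierarchy: each internal node has its own feature set.
soccer, tennis, finance = TopicNode("soccer"), TopicNode("tennis"), TopicNode("finance")
sports = TopicNode("sports", features={"goal", "racket"}, children=[soccer, tennis])
root = TopicNode("root", features={"goal", "team", "racket", "stock"},
                 children=[sports, finance])

train([sports, soccer], ["goal", "team"])
train([sports, tennis], ["racket", "serve"])
train([finance], ["stock", "bond"])

path = classify(root, ["goal", "team", "today"])
print(path)
```

Because each internal node carries its own feature set, a term that separates "sports" from "finance" at the root need not be used again when choosing between "soccer" and "tennis", which is the context sensitivity the abstract describes.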

Attorney, Agent or Firm: Gates & Cooper LLP

Primary / Asst. Examiners: Breene, John; Channavajjala, Srirama


Parent Case:

PROVISIONAL APPLICATION
    The present application claims the benefit of U.S. Provisional Application Ser. No. 60/050,611, entitled "USING TAXONOMY, DISCRIMINANTS, AND SIGNATURES FOR NAVIGATING IN TEXT DATABASES", filed Jun. 24, 1997, by Rakesh Agrawal, et al., which is incorporated herein by reference, in its entirety.

Family: 3 known family members

First Claim (of 32 claims):
What is claimed is:
1. A process for classifying new documents containing features under nodes defining a multilevel taxonomy, based on features derived from a training set of documents that have been classified under respective nodes of the taxonomy, the process comprising:
  • associating a respective set of features with each one of said plurality of nodes, each given set of features comprising a plurality of features that are in at least one training document classified under the associated node; and
  • classifying each new document under at least one node, based on the set of features associated with said at least one node, further comprising:
    • determining a discrimination value for each term in at least one training document which is classified under each one of a plurality of the nodes of the taxonomy, wherein the discrimination value comprises a Fisher value based on the equation: [Figure]
      where t represents a term, d represents a document, c represents a class, [Figure]
    • determining a minimum discrimination value for each of said plurality of nodes;
    • wherein the features in each given set of features have discrimination values equal to or above the minimum discrimination value determined for the node associated with the given set of features.
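
The feature selection step in the claim can be sketched in Python as follows. The claim's equation is an image that is not reproduced in this record, so the formula below is an assumption: a generic Fisher-style ratio of between-class separation to within-class variance, computed on per-document relative term frequencies. The function names, the `eps` guard, and the fixed threshold passed as `min_value` are hypothetical; how the minimum discrimination value is determined for each node is not stated here.

```python
from collections import Counter
from itertools import combinations


def term_freqs(tokens):
    """Relative frequency of each term within one document."""
    counts = Counter(tokens)
    n = len(tokens) or 1
    return {t: c / n for t, c in counts.items()}


def fisher_scores(docs_by_class, eps=1e-9):
    """Assumed Fisher-style score per term:
    sum over class pairs of squared mean differences, divided by
    the sum over classes of the within-class variance (plus eps)."""
    freqs = {c: [term_freqs(d) for d in docs] for c, docs in docs_by_class.items()}
    vocab = {t for fs in freqs.values() for f in fs for t in f}
    scores = {}
    for t in vocab:
        mu = {c: sum(f.get(t, 0.0) for f in fs) / len(fs) for c, fs in freqs.items()}
        between = sum((mu[a] - mu[b]) ** 2 for a, b in combinations(mu, 2))
        within = sum(
            sum((f.get(t, 0.0) - mu[c]) ** 2 for f in fs) / len(fs)
            for c, fs in freqs.items()
        )
        scores[t] = between / (within + eps)
    return scores


def select_features(scores, min_value):
    """Keep terms whose score is equal to or above the node's minimum value."""
    return {t for t, s in scores.items() if s >= min_value}


# Training documents for the children of one hypothetical node.
docs_by_class = {
    "sports": [["goal", "team", "the"], ["goal", "match", "the"]],
    "finance": [["stock", "bond", "the"], ["stock", "market", "the"]],
}
scores = fisher_scores(docs_by_class)
selected = select_features(scores, min_value=2.0)
print(sorted(selected))
```

In this toy data, "the" occurs equally often in every document and scores zero, while "goal" and "stock" separate the two classes with no within-class spread and are the only terms that clear the threshold.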



Forward References: 169 U.S. patent(s) reference this one

       
U.S. References (20 backward references; 169 forward references):

Patent | Pub. Date | Pages | Inventor | Assignee | Title
US4975975 | 1990-12 | 117 | Filipski | GTX Corporation | Hierarchical parametric apparatus and method for recognizing drawn characters
US5168565 | 1992-12 | 25 | Morita | Ricoh Company, Ltd. | Document retrieval system
US5317507 | 1994-05 | 20 | Gallant | (not listed) | Method for document retrieval and for word sense disambiguation using neural networks
US5325298 | 1994-06 | 23 | Gallant | HNC, Inc. | Methods for generating or revising context vectors for a plurality of word stems
US5418946 | 1995-05 | 29 | Mori | Fuji Xerox Co., Ltd. | Structured data classification device
US5428778 | 1995-06 | 21 | Brookes | Office Express Pty. ltd. | Selective dissemination of information
US5469354 | 1995-11 | 100 | Hatakeyama et al. | Hitachi, Ltd. | Document data processing method and apparatus for document retrieval
US5506984 | 1996-04 | 17 | Miller | Digital Equipment Corporation | Method and system for data retrieval in a distributed system using linked location references on a plurality of nodes
US5519857 | 1996-05 | 115 | Kato et al. | Hitachi, Ltd. | Hierarchical presearch type text search method and apparatus and magnetic disk unit used in the apparatus
US5535382 | 1996-07 | 20 | Ogawa | Ricoh Company, Ltd. | Document retrieval system involving ranking of documents in accordance with a degree to which the documents fulfill a retrieval condition corresponding to a user entry
US5557794 | 1996-09 | 45 | Matsunaga et al. | Fuji Xerox Co., Ltd. | Data management system for a personal data base
US5568640 | 1996-10 | 26 | Nishiyama et al. | Hitachi, Ltd. | Document retrieving method in a document managing system
US5576954 | 1996-11 | 67 | Driscoll | University of Central Florida | Process for determination of text relevancy
US5600827 | 1997-02 | 62 | Nakabayashi et al. | Seiko Epson Corporation | Data management, display, and retrieval system for a hierarchical collection
US5625767 | 1997-04 | 17 | Bartell et al. | (not listed) | Method and system for two-dimensional visualization of an information taxonomy and of text documents based on topical content of the documents
US5659724 | 1997-08 | 23 | Borgida et al. | NCR | Interactive data analysis apparatus employing a knowledge base
US5675710 | 1997-10 | 14 | Lewis | Lucent Technologies, Inc. | Method and apparatus for training a text classifier
US5826260 | 1998-10 | 14 | Byrd, Jr. et al. | International Business Machines Corporation | Information retrieval system and method for displaying and ordering information based on query element contribution
US5838816 | 1998-11 | 17 | Holmberg | Hughes Electronics | Pattern recognition system providing automated techniques for training classifiers for non stationary elements
US5918240 | 1999-06 | 27 | Kupiec et al. | Xerox Corporation | Automatic method of extracting summarization using feature probabilities
       
Foreign References:
Publication | Date | Pages | IPC Code | Assignee | Title
EP0744702A1 | 1996-11 | 65 | G06F 17/30 | MATSUSHITA ELECTRIC IND CO LTD | Information searching apparatus for searching text to retrieve character streams agreeing with a key word


Other References:
  • Ho, T.K. et al., "Decision Combination in Multiple Classifier Systems", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, No. 1, pp. 66-75, Jan. 1994.*
  • Chakrabarti, S. et al., "Enhanced Hypertext Categorization Using Hyperlinks", Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 307-318, Jun. 1998.*
  • Yuwono, B. et al., "Search and Ranking Algorithms for Locating Resources on World Wide Web", Proceedings of the 12th International Conference, pp. 164-171, Mar. 1996.*
  • Hill, P. et al., "Multiple Views of Product Information", IBM Technical Disclosure Bulletin, vol. 39, No. 02, pp. 17-24 (Feb. 1996).
  • Rus, D. et al., "Using Non-Textual Cues for Electronic Document Browsing", Digital Libraries Workshop DL '94, Newark, NJ, USA, May 19-20, 1994 Selected Papers, Chapter 9, pp. 129-162.
  • Koller, D. et al., "Hierarchically Classifying Documents Using Very Few Words", The Fourteenth International Conference on Machine Learning, pp. 170-178 (Jul. 1997).
  • Mladenic D., "Feature Subset Selection in Text-Learning", 10th European Conference on Machine Learning, pp. 95-100, (1998).
  • Yang, Y. et al., "A Comparative Study on Feature Selection in Text Categorization", International Conference on Machine Learning, pp. 412-420 (Jul. 1997).
  • Apte, C. et al., "Automated Learning of Decision Rules for Text Categorization", IBM Research Report RC 18879. To Appear in ACM Transactions on Information Systems, pp. 1-20 (no date).; vol. 12, Issue 3, accepted Mar. 1994.
  • Schutze, H. et al., "A Comparison of Classifiers and Document Representations for the Routing Problem", Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 229-237 (Jul. 1995).
  • Lewis, D., "Evaluating Text Categorization", Proceedings of the Speech and Natural Language Workshop, Asilomar, pp. 312-318 (Feb. 1991).
  • Lewis, D., "Feature Selection and Feature Extraction for Text Categorization", Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York pp. 212-217 (Feb. 1992).
  • Koller, D., "Toward Optimal Feature Selection", In Lorenza Saitta, ed., Machine Learning: Proc. Of the Thirteenth International Conference, Morgan Kaufmann, 9 pages, (1996).
  • Panyr, J., "STEINADLER--a system of automatic description and classification of documents", Nachr. Dok, vol. 29, No. 4-5, pp. 184-191 (Sep. 1978) (Abstract in English). Abstract in English Only Considered.


Continuity Data:
    Application Number | Filed | Notes
    US2001000777278 | 2001-02-05 | is a division of US1998000102861 (filed 1998-06-23, granted; issued as US6233575 on 2001-05-15: Multilevel taxonomy based on features derived from training documents classification using fisher values as discrimination values)
    US1998000102861 | 1998-06-23 | is a non-provisional of provisional US1997000050611P (filed 1997-06-24)

