Text mining and its application areas in information management

Senior Data Scientist/Researcher, JForce Information Technologies Inc.

Information management (IM) is aimed at serving business practices. It originated in the business world as a method for unifying the vast amounts of information generated from meetings, proposals, presentations, analytics papers, training materials and so on. IM is used primarily by large organizations, although the problem of navigating a multiformat document corpus is relevant to any individual or group that creates and consumes distributed knowledge.

The documents created in an organization represent its potential knowledge. The knowledge is potential because only part of this data and information is helpful for creating organizational knowledge. One major challenge, then, is selecting the relevant information from a large body of documents and making it available for use and reuse by members of the organization.

The objective of knowledge management for decision support is to ensure that the right information is delivered to the right person at the right time, so that the most appropriate decision can be made. In this sense, IM aims not to manage knowledge per se but to relate knowledge and its usage to mainstream management. Along this line, it focuses on extracting relevant information and delivering it to a decision maker. As a result, a range of text-mining and natural-language processing techniques can serve as the basis of an effective information management system (IMS), one that supports the extraction of relevant information from large amounts of unstructured textual data.
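To make the idea of extracting relevant information from unstructured text more concrete, here is a minimal sketch of one common text-mining technique, TF-IDF relevance ranking. It is an illustration only, not a description of any particular IMS: the function names, the sample corpus and the query are all hypothetical, and a production system would add steps such as stop-word removal and stemming.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(query, documents):
    """Return (index, score) pairs sorted by descending TF-IDF relevance."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    query_terms = set(tokenize(query))

    scores = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if counts[term]:
                tf = counts[term] / len(tokens)                 # term frequency
                idf = math.log(n_docs / doc_freq[term]) + 1.0   # smoothed inverse document frequency
                score += tf * idf
        scores.append((index, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical corpus standing in for an organization's documents.
corpus = [
    "Minutes of the quarterly budget meeting and revised budget proposal",
    "Training materials for new laboratory staff",
    "Presentation on marketing plans for next year",
]
ranking = rank_documents("budget proposal", corpus)
best_index, best_score = ranking[0]
print(corpus[best_index])
```

Documents that share no terms with the query score zero, so only the first document is surfaced for this query. The decision maker sees a ranked shortlist rather than the whole corpus.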

IM professionals are naturally associated with text mining because of their existing skill sets. They are knowledgeable about available products and information-retrieval techniques. Expert IM professionals have analytical and creative skills and have developed the ability to adapt and try different approaches to problems.

Text-mining roles

Several specific roles for IM professionals exist in text-mining projects: 

  • Facilitating conversations between internal teams and vendors: Scientists and vendors may speak very different languages. Someone needs to negotiate and articulate the various needs and desires and bring everyone into agreement. 
  • Placing the text-mining tool in context of other information sources: Understanding what a text-mining tool offers versus a familiar search tool can be challenging. Customers need help understanding what they should expect and how the results will differ from other search outputs. 
  • Advising vendors and customers on source selection: From the customer viewpoint, understanding why some commercial database information cannot be included in the text-mining effort can be challenging. Licensing and copyright issues are best addressed by an information professional. Vendors likely won’t have expertise in the sources for your specific area of interest. The sources chosen as input can make a big difference to the output. 
  • Counseling on search strategies to retrieve the content set: Even if the vendor is going to use a content source that is familiar to all, such as PubMed, the search strategies used to retrieve the corpus are of critical importance. We have either required documentation of the exact search strategies used by the vendor, or we have provided the search strategies to be used. 
  • Consulting on appropriate taxonomies and ontologies: Again, the vendor may not be familiar with taxonomies specific to your area of interest. The categorization and the organization of the text can be useful, or not, in manipulating results. Be sure the taxonomies will be useful for your data. In some very specialized areas of focus, creating and providing the vendor with some or all of a taxonomy may be necessary. In one case, although we were using the Medical Subject Headings (MeSH) taxonomy, we built out a specific area of interest in much greater detail, as that was the focus of our research. 
  • Helping customers evaluate and manipulate results: Many scientists are already so overloaded with job responsibilities that they don’t have the time or the inclination to invest in learning to use a new tool. Information professionals need to make the tool easier to use and help customers gain value from the output. The information professional may have to act as an intermediary, using the tool and producing output to which the scientist can then react. 
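The taxonomy role described above, building out one area of an existing taxonomy in greater detail, can be sketched in a few lines. This is a hedged illustration under stated assumptions: the categories and terms below are invented placeholders rather than real MeSH headings, the function names are hypothetical, and matching is done by naive substring search where a real project would use proper term recognition.

```python
def extend_taxonomy(base, extension):
    """Return a new taxonomy with the extension's terms merged into the base."""
    merged = {category: set(terms) for category, terms in base.items()}
    for category, terms in extension.items():
        merged.setdefault(category, set()).update(terms)
    return merged


def tag_document(text, taxonomy):
    """Return the sorted categories whose terms occur in the text.

    Matching is a case-insensitive substring test, which is crude:
    a short term can match inside a longer, unrelated word.
    """
    lowered = text.lower()
    return sorted(
        category
        for category, terms in taxonomy.items()
        if any(term in lowered for term in terms)
    )


# Placeholder general-purpose taxonomy (not actual MeSH content).
base_taxonomy = {
    "cardiology": {"heart", "cardiac"},
    "oncology": {"tumor", "cancer"},
}
# Extra detail for the area that is the focus of the research.
focus_area = {
    "oncology": {"melanoma", "carcinoma"},
}

taxonomy = extend_taxonomy(base_taxonomy, focus_area)
document = "Early detection of melanoma"
tags_before = tag_document(document, base_taxonomy)
tags_after = tag_document(document, taxonomy)
print(tags_before, tags_after)
```

With only the general taxonomy the document receives no category. Once the focus area is built out it is tagged under oncology, which is the kind of difference a specialized taxonomy can make to how results are categorized.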


