A Simple Implementation of Distributed Vertical Search and Information Integration Technology
LIU Jinshuo¹, YANG Nanhai², LIU Yuan², DENG Juan²†
1. School of Computer, Wuhan University, Wuhan 430072, Hubei, China; 2. International School of Software, Wuhan University, Wuhan 430072, Hubei, China
The paper studies distributed vertical search and information integration technology based on Web mining, with the aim of satisfying the requirements of applications in specific fields. Mining, analyzing, and integrating Web content have become an important part of everyday use. The techniques involved include the Map/Reduce model, deep Web search, and the basic principles of information integration. The paper focuses on how to implement a distributed vertical search engine based on Map/Reduce technology, together with an information integration system. A system optimization mechanism and system tests are also presented.
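To illustrate how the Map/Reduce model fits a search engine, the following is a minimal single-process sketch that builds an inverted index in three phases: map emits (term, document) pairs, shuffle groups them by term, and reduce produces a sorted posting list per term. It is not the authors' Hadoop implementation. The function names (`tokenize`, `map_phase`, `shuffle`, `reduce_phase`, `build_inverted_index`, `search`) and the simple lowercase tokenizer are hypothetical choices made only for this illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map: emit (term, doc_id) once for each distinct term in a document."""
    for term in set(tokenize(text)):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Reduce: turn the grouped document ids for one term into a sorted posting list."""
    return term, sorted(doc_ids)


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, List[str]]:
    # Run map over every document, then shuffle, then reduce each key group.
    intermediate = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


def search(index: Dict[str, List[str]], query: str) -> List[str]:
    """Return the ids of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    pages = {
        "p1": "Hadoop MapReduce for search",
        "p2": "Vertical search engine design",
        "p3": "Distributed search with Hadoop",
    }
    index = build_inverted_index(pages)
    print(index["search"])                  # ['p1', 'p2', 'p3']
    print(search(index, "hadoop search"))   # ['p1', 'p3']
```

In a real cluster the map and reduce functions would run on separate nodes and the shuffle would be performed by the framework (for example Hadoop) over the network; the in-memory dictionary here only stands in for that step so the data flow can be followed on one machine.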
Key words: distributed system; search engine; Hadoop; Map/Reduce
CLC number: TP 391