A Simple Implementation of Distributed Vertical Search and Information Integration Technology
Time:2014-1-07  
LIU Jinshuo1, YANG Nanhai2, LIU Yuan2, DENG Juan2†
1. School of Computer, Wuhan University, Wuhan 430072, Hubei, China; 2. International School of Software, Wuhan University, Wuhan 430072, Hubei, China
Abstract:
This paper presents a distributed vertical search and information integration technology based on Web mining, aimed at meeting the requirements of domain-specific applications. Mining, analyzing, and integrating Web content has become an important trend in everyday use. The techniques involved include the Map/Reduce model, depth search, and the basic principles of information integration. The paper focuses on how to implement a distributed vertical search engine based on Map/Reduce and an information integration system. A system optimization mechanism and system tests are also presented.
Key words: distributed system; search engine; Hadoop; Map/Reduce
CLC number: TP 391