- Title
- Mining tables and lists on the Web for desired relation
- Author(s)
- Yangyang Wu and Haruo Yokota
- Contact person
- Haruo Yokota (yokota@cs.titech.ac.jp)
- Abstract
- People present information on the Web in many kinds of tables and
lists, which carry a great deal of useful information but are not
easy to find with search engines. In this paper, we propose a novel
method to recognize and extract relations from the Web, based on
semantic and formal characteristics. We define models to represent a
desired relation and a "repeated structure", such as a table or list
on a Web page, and introduce a set of functions that measure whether
a repeated structure contains a desired relation. We develop
algorithms for training the machine and for mining the Web for
desired relations. Finally, we present our experimental results and
discuss future work.
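
To make the idea of measuring a repeated structure against a desired relation concrete, here is a minimal sketch. It is not the authors' method: the abstract does not define the models or measuring functions, so the relation model (one regular expression per attribute), the `BOOK_RELATION` example, and the functions `row_score` and `structure_score` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical model of a desired relation: attribute name -> value pattern.
BOOK_RELATION = {
    "title": re.compile(r"[A-Za-z].{2,}"),
    "year": re.compile(r"(19|20)\d{2}"),
    "price": re.compile(r"\$\d+(\.\d{2})?"),
}


def row_score(row, relation):
    """Fraction of the relation's attributes matched by at least one cell."""
    matched = sum(
        1
        for pattern in relation.values()
        if any(pattern.fullmatch(cell.strip()) for cell in row)
    )
    return matched / len(relation)


def structure_score(rows, relation):
    """Average row score over a repeated structure (a list of rows of cells)."""
    if not rows:
        return 0.0
    return sum(row_score(row, relation) for row in rows) / len(rows)


# A table whose rows all carry title, year, and price scores 1.0;
# a navigation list that only looks like titles scores much lower.
book_table = [["Dune", "1965", "$9.99"], ["Emma", "1999", "$5.00"]]
nav_list = [["Home"], ["About"]]
print(structure_score(book_table, BOOK_RELATION))
print(structure_score(nav_list, BOOK_RELATION))
```

A threshold on such a score could then decide whether a structure is reported as containing the relation. The threshold, like the patterns, would have to be tuned or learned and is not specified in the abstract.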
- Title
- A special data structure for web page analysis
- Author(s)
- Yangyang Wu and Haruo Yokota
- Contact person
- Haruo Yokota (yokota@cs.titech.ac.jp)
- Abstract
- Mining the Web for desired information has been a hot topic in recent
years. People present information on the Web in many kinds of tables
and lists, which contain a great deal of useful information. Analyzing
and recognizing them is an important task in Web content mining. In
this paper, we present a special data structure, called the WPS-tree,
for Web page analysis. The WPS-tree is based on visible objects and
captures the logical structure of pages more exactly. We give its
definition and an algorithm for constructing it, and discuss how to
use it to recognize nested relationships among data and the
relationships between HTML tags and text on Web pages. In particular,
we describe how to use it to recognize repeated structures, such as
tables and lists, in our relation recognition system, and discuss the
results of our experiments.
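
As a rough illustration of recognizing repeated structures from a page tree, the sketch below builds a simple element tree with Python's standard `html.parser` and reports parents whose children all share the same structural signature. This is not the WPS-tree, whose definition the abstract does not give. The `Node` class, the `signature` function, and the rule "all children have identical signatures" are assumptions made for this example, and the `has_text` flag is only a crude stand-in for the notion of a visible object.

```python
from html.parser import HTMLParser


class Node:
    """Hypothetical tree node: a tag, its children, and a visible-text flag."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.has_text = False


class TreeBuilder(HTMLParser):
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in self.VOID:  # void elements never contain children
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.has_text = True


def signature(node):
    """Structural signature: tag, a '*' if it holds text, and child signatures."""
    inner = ",".join(signature(child) for child in node.children)
    return f"{node.tag}{'*' if node.has_text else ''}({inner})"


def repeated_structures(node, min_repeats=2):
    """Return (tag, count) for each node whose children all look alike."""
    found = []
    sigs = [signature(child) for child in node.children]
    if len(sigs) >= min_repeats and len(set(sigs)) == 1:
        found.append((node.tag, len(sigs)))
    for child in node.children:
        found.extend(repeated_structures(child, min_repeats))
    return found


builder = TreeBuilder()
builder.feed("<ul><li><b>A</b> 1</li><li><b>B</b> 2</li><li><b>C</b> 3</li></ul><p>note</p>")
result = repeated_structures(builder.root)
print(result)
```

Requiring identical signatures is deliberately strict. Real pages often have rows that differ slightly, so a practical system would need a tolerance-based similarity measure instead.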