TR00-0001 (April)

Title
Verification of wheel structured parameterized circuits by using finite automata

Author(s)
Hiroshi Toshima, Tomoya Noro and Tomohiro Yoneda

Contact person
Tomohiro Yoneda (yoneda@yt.cs.titech.ac.jp)

Abstract
It is known that many systems have a regular structure constructed from several kinds of basic modules. Most previously proposed methods for verifying such parameterized systems can handle only rather simple structures like linear arrays, rings, or stars. On the other hand, verification of parameterized asynchronous circuits often requires handling more complicated structures. In this paper, we focus on a wheel-structured system which consists of one kernel module and many identical symmetry modules, and aim at verifying such systems of arbitrary sizes. For this purpose, we propose a fully automatic procedure which explores the state space of wheel-structured systems based on a state representation using finite automata. We also demonstrate an implementation with some experimental results for parameterized wheel-structured asynchronous circuits.
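To illustrate the idea of encoding sets of configurations with finite automata, here is a minimal Python sketch (the module states, the example property, and the encoding are hypothetical illustrations, not the procedure of the report): configurations of a wheel-structured system are written as strings, one symbol for the kernel followed by one symbol per symmetry module, so that a single small automaton describes a set of configurations for every instance size at once.

    # Minimal sketch: configurations of a wheel-structured system written as
    # strings over a hypothetical alphabet ('K' = kernel, 'a' = idle module,
    # 'b' = busy module), and a set of configurations ("at most one module is
    # busy") represented by one DFA that covers every instance size n.

    DFA = {
        # current state -> {input symbol: next state}
        "start":     {"K": "zero_busy"},
        "zero_busy": {"a": "zero_busy", "b": "one_busy"},
        "one_busy":  {"a": "one_busy"},   # a second 'b' has no transition -> reject
    }
    ACCEPTING = {"zero_busy", "one_busy"}

    def accepts(config):
        """Return True if the configuration string belongs to the regular set."""
        state = "start"
        for symbol in config:
            state = DFA.get(state, {}).get(symbol)
            if state is None:
                return False
        return state in ACCEPTING

    # The same automaton represents the set for wheels of any size:
    for config in ["Kaaa", "Kaaba", "Kabab", "K" + "a" * 50 + "b"]:
        print(config, accepts(config))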


TR00-0002 (April)

Title
A practical modification of Indigo algorithm for handling cyclic constraint relationships

Author(s)
Natsuki Saito, Tetsuya Suzuki and Takehiro Tokuda

Contact person
Natsuki Saito (sato@mi.cs.titech.ac.jp)

Abstract
Indigo is a powerful local propagation algorithm for solving constraint hierarchies including inequality constraints, but it cannot solve problems with cyclic constraint relationships, which appear in many constraint-oriented problems. We present a practical modification of the Indigo algorithm for handling cyclic constraint relationships.


TR00-0003 (April)

Title
String indexing and its application to natural language processing

Author(s)
Hideo Itoh

Contact person
Hideo Itoh (hideo@src.ricoh.co.jp)

Abstract
Large-scale electronic text collections are more widely available now than ever before. String indexing is one of the fundamental techniques for handling these collections effectively and efficiently. The suffix array is the most promising index due to the compactness of the data structure and its efficiency, which is independent of the alphabet size. We proposed three algorithms for constructing suffix arrays. The first algorithm, called "two-stage suffix sort", enables us to construct suffix arrays about two or three times faster than previous methods. The second algorithm, called "rank sort", resolves the problematic trade-off between compactness and robustness against lengthy repetitions. The last algorithm, "divisive sort", is very compact: for a byte string of size N, previous algorithms require 5N or 8N bytes of memory to construct a suffix array, whereas divisive sort requires only about N bytes. We also proposed a method for constructing a large-scale statistical language model using suffix arrays.
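To make the underlying data structure concrete, the following minimal Python sketch builds a suffix array by plain sorting and answers substring-count queries by binary search (an illustration only; the two-stage, rank, and divisive sort algorithms above are not reproduced here).

    def build_suffix_array(text):
        """Naive construction: sort all suffix start positions by the suffixes
        they index.  (The report's algorithms are far faster and more
        memory-efficient; this only illustrates the structure.)"""
        return sorted(range(len(text)), key=lambda i: text[i:])

    def count_occurrences(text, sa, pattern):
        """Count occurrences of `pattern` with two binary searches over the
        lexicographically sorted suffixes."""
        m = len(pattern)
        lo, hi = 0, len(sa)
        while lo < hi:                         # first suffix whose m-prefix >= pattern
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + m] < pattern:
                lo = mid + 1
            else:
                hi = mid
        first = lo
        hi = len(sa)
        while lo < hi:                         # first suffix whose m-prefix > pattern
            mid = (lo + hi) // 2
            if text[sa[mid]:sa[mid] + m] <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo - first

    text = "mississippi"
    sa = build_suffix_array(text)
    print(sa)                                  # [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
    print(count_occurrences(text, sa, "issi")) # 2
    print(count_occurrences(text, sa, "ssi"))  # 2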


TR00-0004 (April)

Title
Formal languages for fuzzy subjects - Modality, metaphor and machine translation

Author(s)
Christoph Neumann

Contact person
Christoph Neumann (neumann@cl.cs.titech.ac.jp)

Abstract
In this two-fold study, we present new form-based frameworks for the computational treatment of two linguistic phenomena, modality and metaphor, which have hitherto been neglected in natural language processing. While modality is a major research topic in linguistics and a central constituent of language production, current machine translation systems have no autonomous treatment of modality and are thus erroneous in their identification of modality; modality can only be handled through a modal interlingua. We defined a new Module of Modality (MoM) as an interlingua for modality, with 17 modal categories based not on semantic definitions but on formal equivalence relations extracted from aligned sentences in English, Japanese, German and French. By implementing MoM, we were able to identify 45.0% of the modal content of a Japanese test set, a gain of 44.6% in precision over a conventional system. The coherent layout and the significant improvement in modality recognition make MoM not only an implementational module, but also an alternative approach to modality theory. In metaphor research, we gathered a corpus of 106 metaphors similar in both Japanese and German and established a taxonomy for defining and classifying cross-lingual similarity of metaphors. This corpus strongly supports the cognitive current in metaphor research, which claims that metaphor is a universal cognitive device but has so far been based largely on findings from a single language, English.


TR00-0005 (May)

Title
A case-based approach to the generation of musical expression

Author(s)
Taizan Suzuki

Contact person
Taizan Suzuki (taizan@mori.cs.titech.ac.jp)

Abstract
The majority of naturally sounding musical performances have musical expression (fluctuations in tempo, volume, etc.). Musical expression is a highly significant element in making a performance pleasant and attractive. It varies according to the performer's musical sense, and is affected by various factors called performance conditions, such as the performer, performative style, mood, and so forth. As such, the computerized generation of musical expression is not an easy task.
Many past research efforts have focused on the computerized generation of musical expression. However, in past research, performance conditions have been treated as being less significant.
This paper proposes a case-based approach to the generation of expressively modulated musical performance. This method uses a set of musical performance data played by a human musician, and requires the score of the target piece and performance condition settings as input. The presented method generates performance data for the target piece by imitating human performance data for pieces similar to the input piece. In the generation process, the inputted performance condition settings are used to validate each instance of human performance data. This mechanism enables the generation of various kinds of musical expression for a single piece of music in accordance with the performance condition settings.
This paper also proposes four component technologies for the presented case-based method: utilization of the performance data set, representation and calculation of musical expression, evaluation of phrase similarity, and representation of performance conditions. The utilization technique is effective in overcoming data sparseness problems. The representation of musical expression is indispensable in implementing the utilization technique. The evaluation method of phrase similarity induces the phrase similarity score from the resemblance of phrase characteristics. The representation of performance conditions is quite important for introducing performance conditions into the generation process.
The presented method was implemented in the ``Kagurame'' musical performance generation system. Listening experiments with Kagurame's output show that this method is useful for the generation of expressive performance. It was also confirmed that this method can generate varying musical expression for a single piece of music by changing the performance condition settings.
As an application of the presented case-based method, this paper describes the possibility of introducing the method into an accompaniment system. The method is considered to be useful for rehearsal and for learning the performative idiosyncrasies of the performer, which are major issues for existing accompaniment systems.
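As a rough Python sketch of the case-based idea (all feature names, condition labels, weights, and stored cases below are hypothetical illustrations, not Kagurame's actual data or representations): for each target phrase, the most similar performed phrase is retrieved, preferring cases whose performance conditions match the input settings, and its expression deviations are imitated.

    # Hypothetical sketch of case-based expression generation: each stored case
    # pairs simple phrase features with the tempo/volume deviations a human
    # performer applied to that phrase; retrieval combines phrase similarity
    # with a bonus for matching performance conditions.

    cases = [
        {"features": (0.8, 0.2, 0.5), "conditions": {"style": "romantic"},
         "expression": {"tempo_dev": +0.06, "volume_dev": +0.10}},
        {"features": (0.7, 0.3, 0.4), "conditions": {"style": "baroque"},
         "expression": {"tempo_dev": -0.02, "volume_dev": -0.05}},
        {"features": (0.2, 0.9, 0.1), "conditions": {"style": "romantic"},
         "expression": {"tempo_dev": +0.12, "volume_dev": +0.02}},
    ]

    def similarity(f1, f2):
        """Phrase similarity induced from the resemblance of phrase features."""
        return -sum((a - b) ** 2 for a, b in zip(f1, f2))

    def generate_expression(target_features, target_conditions, condition_weight=0.5):
        """Imitate the expression of the best-matching case."""
        def score(case):
            match = sum(case["conditions"].get(k) == v
                        for k, v in target_conditions.items())
            return similarity(target_features, case["features"]) + condition_weight * match
        return max(cases, key=score)["expression"]

    print(generate_expression((0.75, 0.25, 0.45), {"style": "romantic"}))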


TR00-0006 (May)

Title
Active learning for optimal generalization in trigonometric polynomial models

Author(s)
Masashi Sugiyama and Hidemitsu Ogawa

Contact person
Masashi Sugiyama (sugi@og.cs.titech.ac.jp)

Abstract
In this paper, we consider the problem of active learning in trigonometric polynomial models and give a necessary and sufficient condition on sample points to provide the optimal generalization capability. By analyzing the condition from a functional analytic point of view, we clarify the mechanism of achieving optimal generalization. We also show that training examples satisfying the condition not only provide the optimal generalization capability but also reduce the computational complexity and memory required for the calculation of learning results. Finally, design methods for sample points satisfying the optimality condition are given, and computer simulations are performed to demonstrate the effectiveness of the proposed active learning method.
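As a small illustration of learning in a trigonometric polynomial model, the following Python sketch fits a model of order K by least squares with equally spaced sample points (equispaced points are used purely for illustration; the necessary and sufficient condition derived in the report is not reproduced here).

    import numpy as np

    # Sketch: least-squares learning in a trigonometric polynomial model of order K,
    #   f(x) = a_0 + sum_{k=1..K} (a_k cos kx + b_k sin kx),
    # with equally spaced sample points on [0, 2*pi).

    rng = np.random.default_rng(0)
    K, n, noise = 3, 16, 0.1                   # model order, sample size, noise level

    def design(x, K):
        cols = [np.ones_like(x)]
        for k in range(1, K + 1):
            cols += [np.cos(k * x), np.sin(k * x)]
        return np.stack(cols, axis=1)

    true_coef = rng.normal(size=2 * K + 1)
    x = 2 * np.pi * np.arange(n) / n           # equally spaced sample points
    y = design(x, K) @ true_coef + noise * rng.normal(size=n)

    A = design(x, K)
    coef_hat = np.linalg.lstsq(A, y, rcond=None)[0]

    # With this design the Gram matrix A.T @ A is diagonal, so the estimate
    # reduces to simple correlations -- one source of reduced computational cost.
    print(np.round(A.T @ A, 10))
    print(np.round(coef_hat - true_coef, 3))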


TR00-0007 (May)

Title
Model selection with small samples

Author(s)
Masashi Sugiyama and Hidemitsu Ogawa

Contact person
Masashi Sugiyama (sugi@og.cs.titech.ac.jp)

Abstract
The problem of model selection has been extensively studied from various standpoints. However, most of the criteria proposed so far work well only when a large number of training examples is available. In this paper, we propose a new model selection criterion named the subspace information criterion (SIC). SIC gives an unbiased estimate of the generalization error. Our simulation demonstrates that SIC works well even when the number of training examples is small and the noise variance is large. Comparison with other model selection techniques including Akaike's information criterion (AIC), corrected AIC (cAIC), the Bayesian information criterion (BIC), the minimum description length (MDL) criterion, and Vapnik's measure (VM) is also presented.
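For a toy illustration of criterion-based model selection (SIC itself is not reproduced here), the Python sketch below selects a polynomial degree with two of the compared criteria, AIC and BIC, in their standard Gaussian-noise forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln n.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 20                                        # deliberately small sample
    x = np.linspace(-1, 1, n)
    y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.3 * rng.normal(size=n)   # true degree: 2

    for degree in range(1, 7):
        k = degree + 1                            # number of parameters
        coef = np.polyfit(x, y, degree)
        rss = np.sum((np.polyval(coef, x) - y) ** 2)
        aic = n * np.log(rss / n) + 2 * k
        bic = n * np.log(rss / n) + k * np.log(n)
        print(f"degree {degree}: AIC {aic:6.2f}  BIC {bic:6.2f}")
    # The degree minimizing the criterion is selected; with small, noisy samples
    # the criteria can disagree, which is the regime the report targets.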


TR00-0008 (May)

Title
Release from the active learning / model selection dilemma --- optimizing sample points and models at the same time

Author(s)
Masashi Sugiyama and Hidemitsu Ogawa

Contact person
Masashi Sugiyama (sugi@og.cs.titech.ac.jp)

Abstract
In supervised learning, the selection of sample points and models is crucial for acquiring a high level of generalization capability. So far, the problems of active learning and model selection have been studied independently. If sample points and models are optimized simultaneously, a higher level of generalization capability can be expected. We call this problem active learning with model selection. However, it cannot generally be solved by simply combining existing active learning and model selection techniques because of the active learning / model selection dilemma: the model should be fixed for selecting sample points, and conversely the sample points should be fixed for selecting models. Despite the dilemma, the problem of active learning with model selection can be straightforwardly solved if there is a set of sample points that is optimal for all models under consideration. In this paper, we show that such an optimal set of sample points actually exists, and give a procedure for active learning with model selection.


TR00-0009 (July)

Title
``Kairai'' - software robots understanding natural language

Author(s)
Yusuke Shinyama, Takenobu Tokunaga and Hozumi Tanaka

Contact person
Yusuke Shinyama (euske@cl.cs.titech.ac.jp)

Abstract
In this paper we give an overview of the ``Kairai'' virtual actor system, which understands natural language instructions and displays the results via software robots. In controlling software robots acting in a 3-D world, the system needs to extract various types of information from language expressions. Here, we concentrate especially on handling anaphora, which is used to indicate objects or positions in the virtual world. Our system maintains a robot belief database and analyzes the user's speech acts when manipulating the database. We also consider ellipsis, which is used frequently in command-style dialogues.


TR00-0010 (July)

Title
Processing of 3-D spatial relations for virtual agents acting on natural language instructions

Author(s)
Yusuke Shinyama, Takenobu Tokunaga and Hozumi Tanaka

Contact person
Yusuke Shinyama (euske@cl.cs.titech.ac.jp)

Abstract
We are developing a system in which a user can control virtual agents through natural language dialogue. We particularly focus on the semantic analysis of spatial expressions in natural language instructions. Spatial expressions are relative to the speaker's viewpoint, and the indicated positions can change dynamically. In this paper we propose a semantic representation using lambda structures. Features of the lambda structure enable us to leave the speaker's viewpoint undetermined in the semantic representation, to keep track of changing positions, and to encapsulate representations such as English prepositions. With this representation, we can handle this kind of spatial expression efficiently. We also describe a method for translating natural language instructions into this representation.
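The flavour of the representation can be sketched with ordinary Python closures (the names, geometry, and API below are hypothetical illustrations, not the paper's lambda-structure formalism): a spatial expression denotes a function whose value stays undetermined until a viewpoint and the current object positions are supplied.

    import numpy as np

    # Hypothetical sketch: a spatial expression such as "in front of the desk"
    # is represented as a function (a closure standing in for a lambda term)
    # that is only resolved once the speaker's viewpoint and the current
    # object positions are known.

    positions = {"desk": np.array([4.0, 0.0]), "robot": np.array([0.0, 0.0])}

    def in_front_of(name, distance=1.0):
        """Return a lambda-like term: viewpoint -> concrete target position."""
        def resolve(viewpoint):
            target = positions[name]               # looked up at resolution time,
            direction = viewpoint - target         # so moving objects are tracked
            direction = direction / np.linalg.norm(direction)
            return target + distance * direction   # the side facing the viewpoint
        return resolve

    goal = in_front_of("desk")          # viewpoint still undetermined here
    print(goal(positions["robot"]))     # resolved from the robot's viewpoint
    positions["desk"] = np.array([4.0, 2.0])
    print(goal(positions["robot"]))     # the same term tracks the new desk position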


TR00-0011 (August)

Title
An incremental optimistic concurrency control for parallel B-tree structures

Author(s)
Jun Miyazaki and Haruo Yokota

Contact person
Haruo Yokota (yokota@cs.titech.ac.jp)

Abstract
We have proposed a new parallel B-tree structure, the Fat-Btree, to improve access performance in shared-nothing parallel database systems. The Fat-Btree holds only a part of the index nodes on each PE, and can reduce synchronization costs in update operations. For these reasons, both retrieval and update operations can be processed with higher throughput than in previously proposed shared-nothing parallel B-tree structures. While implementing the Fat-Btree on a shared-nothing parallel computer, an nCUBE3 with 64 disks, we recognized the importance of concurrency control for the Fat-Btree: good concurrency control schemes for shared-everything machines are not always suitable for shared-nothing ones. In this paper, we propose a new deadlock-free concurrency control protocol named INC-OPT, and show that it suits the Fat-Btree much better than the previously proposed B-OPT method and the method used in ARIES/IM. In addition, we compare the actual performance of three types of parallel B-tree structures, the Fat-Btree, the Copy-Whole-Btree, and the Single-Index-Btree, on the nCUBE3 machine under INC-OPT, and show that the Fat-Btree significantly improves the performance of shared-nothing parallel databases.
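For readers unfamiliar with optimistic schemes, the following Python sketch shows the generic backward-validation idea that optimistic concurrency control relies on (a toy illustration only, not the INC-OPT protocol itself).

    # Generic optimistic concurrency control sketch: transactions read and write
    # without locks, record the versions they read, and a transaction commits
    # only if every item it read is still at the version it saw.

    class Store:
        def __init__(self):
            self.data, self.version = {}, {}

        def begin(self):
            return {"reads": {}, "writes": {}}

        def read(self, tx, key):
            tx["reads"][key] = self.version.get(key, 0)
            return tx["writes"].get(key, self.data.get(key))

        def write(self, tx, key, value):
            tx["writes"][key] = value

        def commit(self, tx):
            # Validation phase: abort if any item read has been updated since.
            if any(self.version.get(k, 0) != v for k, v in tx["reads"].items()):
                return False
            for key, value in tx["writes"].items():
                self.data[key] = value
                self.version[key] = self.version.get(key, 0) + 1
            return True

    store = Store()
    t1, t2 = store.begin(), store.begin()
    store.write(t1, "leaf42", "split")
    _ = store.read(t2, "leaf42")           # t2 reads the old version of the node
    store.write(t2, "leaf42", "insert")    # ... and updates it based on that read
    print(store.commit(t1))                # True  -- the first committer wins
    print(store.commit(t2))                # False -- stale read detected, t2 retries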


TR00-0012 (August)

Title
Another look at CP and network information criterion as approximation of subspace information criterion

Author(s)
Masashi Sugiyama and Hidemitsu Ogawa

Contact person
Masashi Sugiyama (sugi@og.cs.titech.ac.jp)

Abstract
Recently, a new model selection criterion called the subspace information criterion (SIC) was proposed. SIC works well with small samples since it gives an unbiased estimate of the generalization error with finite samples. In this paper, we derive an approximation of SIC named the approximated SIC (ASIC), whose calculation is easier than the original SIC. ASIC essentially agrees with CP and the network information criterion (NIC), where NIC is a generalization of Akaike's information criterion. This agreement shows that CP and NIC can be used as an approximation of SIC. It also elucidates the reasons why CP sometimes works well despite the fact that it does not directly take the generalization error into account and why NIC sometimes works well with small samples despite the fact that it is derived by making use of asymptotic approximation.
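A small numerical illustration of CP (assuming it denotes Mallows' criterion, C_P = RSS_p / sigma2_hat - n + 2p, with the noise variance estimated from the largest candidate model; ASIC, NIC, and SIC themselves are not reproduced here):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 30
    x = np.linspace(-1, 1, n)
    y = np.sin(np.pi * x) + 0.2 * rng.normal(size=n)

    max_degree = 8
    full_rss = np.sum((np.polyval(np.polyfit(x, y, max_degree), x) - y) ** 2)
    sigma2_hat = full_rss / (n - (max_degree + 1))   # noise estimate from the full model

    for degree in range(1, max_degree + 1):
        p = degree + 1                               # number of parameters
        rss = np.sum((np.polyval(np.polyfit(x, y, degree), x) - y) ** 2)
        cp = rss / sigma2_hat - n + 2 * p
        print(f"degree {degree}: C_P = {cp:6.2f}")
    # Candidate models with small C_P (roughly C_P close to p) are preferred.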


TR00-0013 (August)

Title
Optimal design of regularization term and regularization parameter by subspace information criterion

Author(s)
Masashi Sugiyama and Hidemitsu Ogawa

Contact person
Masashi Sugiyama (sugi@og.cs.titech.ac.jp)

Abstract
The problem of designing the regularization term and regularization parameter for optimal generalization is discussed. We derive an approximation of the generalization error called the subspace information criterion (SIC) for regularization learning. SIC gives an unbiased estimate of the generalization error with finite samples. SIC is used for a) selecting the optimal regularization term and regularization parameter from given candidates, and b) obtaining the closed form of the optimal regularization parameter. The effectiveness of the proposed method is demonstrated through computer simulations.


TR00-0014 (November)

Title
Subspace information criterion for image restoration - Mean squared error estimator for linear filters

Author(s)
Masashi Sugiyama, Daisuke Imaizumi and Hidemitsu Ogawa

Contact person
Masashi Sugiyama (sugi@og.cs.titech.ac.jp)

Abstract
Most of the image restoration filters proposed so far include parameters. The values of these parameters should be determined so as to minimize a certain error measure, such as the mean squared error (MSE) between the restored and original images. However, this is not generally possible since the original image itself is required for evaluating the MSE. In this paper, we derive a criterion called the subspace information criterion (SIC) for linear filters. SIC gives an unbiased estimate of the expected MSE. Using SIC, we propose a procedure for optimizing the parameters of the moving-average filter, i.e., the window size and weight pattern. Simulation results show that SIC gives a good estimate of the MSE in all cases, and that the proposed procedure actually gives the optimal parameter which minimizes the MSE.
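The parameter-selection problem can be illustrated with a one-dimensional toy simulation in Python (a sketch only; SIC is not reproduced, and the MSE can be computed directly here only because the original signal is known).

    import numpy as np

    # Toy illustration: choose the window size of a uniform moving-average filter.
    # In this simulation the original is known, so each window's MSE is computed
    # directly; in real restoration the original is unavailable, which is exactly
    # why an estimator such as SIC is needed.

    rng = np.random.default_rng(3)
    n = 200
    original = np.sin(np.linspace(0, 4 * np.pi, n))
    degraded = original + 0.3 * rng.normal(size=n)

    for window in (1, 3, 5, 9, 15, 25):
        weights = np.ones(window) / window                 # uniform weight pattern
        restored = np.convolve(degraded, weights, mode="same")
        mse = np.mean((restored - original) ** 2)
        print(f"window {window:2d}: MSE = {mse:.4f}")
    # Too small a window leaves noise, too large a window blurs the signal;
    # the criterion's job is to locate this minimum without access to `original`.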


TR00-0015 (November)

Title
Representation and learning of symbolic-statistical knowledge

Author(s)
Yoshitaka Kameya

Contact person
Yoshitaka Kameya (kame@mi.cs.titech.ac.jp)

Abstract
To build an AI system dealing with uncertainty, one may first represent human knowledge as a symbolic-statistical model, and then use the model for statistical inference. For modeling more complex phenomena, Sato et al.'s logic programming language PRISM is promising, since it is a well-founded probabilistic extension of type-0 grammars in the Chomsky hierarchy and is therefore a generalization of hidden Markov models (HMMs) and probabilistic context-free grammars (PCFGs). Unfortunately, however, its EM (expectation-maximization) algorithm lacks efficiency even for simple models such as HMMs or PCFGs. The purpose of this paper is to provide an efficient method for EM learning of PRISM programs, which can be seen as a generalization of the specialized EM algorithms. For example, when a PCFG (an HMM) is written as a PRISM program, the method works as efficiently as the Inside-Outside (Baum-Welch) algorithm. In the method, we first generate a support graph, a graphical representation of the logical formulas explaining the training examples, by Tamaki and Sato's OLDT search. Then, a newly derived EM algorithm called the graphical EM algorithm runs on the support graph in the manner of dynamic programming. The paper analytically shows that the graphical EM algorithm finds a local maximum likelihood estimate, and that both the OLDT search and the graphical EM algorithm achieve the same computational order as the specialized algorithms. Furthermore, we have conducted an experiment on EM learning of PCFGs in which a generalized LR parser is used instead of the OLDT interpreter. For the ATR corpus and a hand-crafted Japanese grammar, our method drastically improves the efficiency of the Inside-Outside algorithm.
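To illustrate the kind of dynamic programming that the specialized algorithms exploit and that the graphical EM algorithm generalizes, here is the standard forward algorithm for a toy HMM in Python (the two-state model is hypothetical; this is not the graphical EM algorithm itself).

    # Standard forward algorithm: the likelihood of an observation sequence is
    # computed in O(T * S^2) by sharing subcomputations, instead of summing over
    # exponentially many state paths.  Baum-Welch reuses exactly this kind of
    # dynamic programming in its E-step.

    states = ("rainy", "sunny")
    init   = {"rainy": 0.6, "sunny": 0.4}
    trans  = {"rainy": {"rainy": 0.7, "sunny": 0.3},
              "sunny": {"rainy": 0.4, "sunny": 0.6}}
    emit   = {"rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
              "sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

    def forward_likelihood(observations):
        """P(observations) by the forward algorithm (dynamic programming)."""
        alpha = {s: init[s] * emit[s][observations[0]] for s in states}
        for obs in observations[1:]:
            alpha = {s: emit[s][obs] * sum(alpha[p] * trans[p][s] for p in states)
                     for s in states}
        return sum(alpha.values())

    print(forward_likelihood(["walk", "shop", "clean"]))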