TR97-0001 (January)

Title
Learning music arrangements and their evaluation based on psychological experiments

Author(s)
Masayuki Numao, Masashi Kobayashi and Katsuyuki Sakaniwa

Contact person
Masayuki Numao (numao@cs.titech.ac.jp)

Abstract
We often decide things based on our feelings, which are implicit and very difficult to write down as knowledge. This paper reports an attempt to acquire such feelings automatically. We assume that there are some relations or constraints between the impressions felt and a situation, which consists of an object and its environment. For instance, in music arrangement, the object is a music score and its environment contains the listeners, etc. Our project experimentally validates this assumption through three levels of experiments. At the first level, a program merely mimics human arrangements to transfer their impressions to another arrangement, which suggests that it distinguishes the patterns that cause certain impressions. At the second level, to acquire a music recognition model, a program finds relations and constraints between a music score and its impressions, by which we experimentally show that machine learning techniques can provide a powerful tool for composing music and for analyzing human feelings. Finally, we examine the generality of the model by modifying some arrangements into ones that cause a specified impression.


TR97-0002 (February)

Title
Some nonlinear lower bounds on the size of restricted circuits of Boolean functions

Author(s)
Tatsuie Tsukiji

Contact person
Tatsuie Tsukiji (tsukiji@cs.titech.ac.jp)

Abstract
A major goal of circuit complexity theory is to show that computational tasks that are believed to be hard are really hard. One reasonable way to show the hardness of a given problem is to show that the size of any circuit solving the problem must be large; that is, to prove a lower bound on the circuit complexity of the problem. The efforts towards showing lower bounds in circuit complexity theory have not been very successful. If one imposes strong limitations on the types of circuits, then one can show that a specific problem requires an exponential number of gates to be solved. On the other hand, for general Boolean circuits, we do not know even nonlinear lower bounds. This thesis is devoted to developing useful combinatorial techniques for showing circuit lower bounds, and to applying them to obtain better lower bounds.

In particular, we pursue lower bounds for Boolean sums, that is, sets of disjunctions of monotone variables. For monotone circuit complexity, we improve the known lower bound for an almost explicit Boolean sum. More precisely, we define a Boolean sum by shifting a set of variables, and show that almost all such Boolean sums require monotone complexity $\Omega(n^2/2^{O((\log\log n)^2)})$.
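
One natural reading of the shifted construction (assumed notation; the thesis's exact definition may differ): fix a set $S \subseteq \{0,\dots,n-1\}$ and let the $i$-th disjunction be the cyclic shift of $S$ by $i$,
$$f_i = \bigvee_{j \in S} x_{(i+j) \bmod n}, \qquad i = 0,\dots,n-1,$$
so that the whole Boolean sum $\{f_0,\dots,f_{n-1}\}$ is determined by the single set $S$.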

We also consider a circuit model that allows NOT gates but has some other restriction. A general circuit that computes a Boolean sum can be naturally regarded as a (2-fanin) $\{\cup,\cap\}$-circuit that computes the corresponding family of sets of variables. Here we restrict the alternation of $\cup$ and $\cap$, and the domain of set operations, in $\{\cup,\cap\}$-circuits. More specifically, we consider the following restriction: the alternation of $\cup$ and $\cap$ is at most $O(1)$, and the cardinality of the common part of the input sets of every gate is at most $o(n)$. Under this restriction, we obtain nonlinear lower bounds for the family of sets associated with almost $\Omega(\log n)$-wise independent random variables.

We also show that multiplication is really harder than addition in input-wise synchronous circuits. Circuit depth is another important measure of complexity, and one interesting question is the depth of randomly generated circuits. We analyze the structure of random circuits, and determine that the depth of a size $n$ circuit is $2e \log_e n$, under the uniform distribution over the standard description of circuits. (This is the author's Thesis for the Doctor of Engineering.)


TR97-0003 (April)

Title
On a positive instance generation for the 3-satisfiability (3SAT) problem

Author(s)
Atsushi Mochizuki, Mitsuo Motoki and Osamu Watanabe

Contact person
Osamu Watanabe (watanabe@cs.titech.ac.jp)

Abstract
For the 3-Satisfiability Problem (3SAT), we consider the problem of randomly generating positive instances (together with their solutions). For such a problem, it is desirable in some cases that an instance generation algorithm produce only (and all) ``unique solution instances'', i.e., instances with only one solution. In this paper, we consider one simple instance generation algorithm, and investigate conditions under which the algorithm generates unique solution instances when generating 3SAT instances with $n$ variables and $m$ clauses. We first prove that if $m \geq d_1 n^2$ (where $d_1$ is some constant), then the algorithm generates unique solution instances with high probability. We also report some experimental results suggesting that if $m \geq d_2 n \log n$ (where $d_2$ is some constant), then the algorithm generates unique solution instances with high probability. (This report is written in Japanese.)
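
The report's algorithm is not reproduced here; the following is a minimal Python sketch of one standard way to generate positive 3SAT instances, by planting a random assignment and keeping only clauses it satisfies (function and parameter names are illustrative):

    import random

    def planted_3sat(n, m, seed=None):
        # Plant a random truth assignment, then draw random 3-clauses and
        # keep only those the assignment satisfies; the result is a positive
        # instance with n variables, m clauses, and a known solution.
        rng = random.Random(seed)
        assignment = [rng.choice([True, False]) for _ in range(n)]
        clauses = []
        while len(clauses) < m:
            chosen = rng.sample(range(n), 3)
            # A literal is a pair (variable index, sign); sign True means a
            # positive occurrence of the variable.
            clause = [(v, rng.choice([True, False])) for v in chosen]
            if any(assignment[v] == s for v, s in clause):
                clauses.append(clause)
        return clauses, assignment

Note that nothing in such a generator forces uniqueness of the solution; the report's question is for which $m$ the generated instance has only the planted solution with high probability.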


TR97-0004 (April)

Title
A new probabilistic LR language model for statistical parsing

Author(s)
Kentaro Inui, Virach Sornlertlamvanich, Hozumi Tanaka and Takenobu Tokunaga

Contact person
Kentaro Inui (inui@cs.titech.ac.jp)

Abstract
This paper presents a newly formalized probabilistic LR language model. Our model inherits its essential features from Briscoe and Carroll's generalized probabilistic LR (GPLR) model, which obtains context-sensitivity by assigning a probability to each LR parsing action according to its left and right context. However, our model is simpler than Briscoe and Carroll's while maintaining a higher degree of context-sensitivity. In this paper, we first formalize our GPLR model and enumerate some of its features. We then discuss the differences between Briscoe and Carroll's model and ours. We also qualitatively compare a model based on canonical LR with one based on lookahead LR.
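
As a rough illustration of the general idea (a sketch only; the paper's parameterization of context is more refined), LR action probabilities can be estimated from the action sequences of correct parses, and a derivation scored as the product of its action probabilities:

    from collections import defaultdict

    class ProbabilisticLR:
        # Sketch: probabilities of LR parsing actions conditioned on a
        # context, here simply the pair (LR state, lookahead symbol).
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))

        def train(self, steps):
            # steps: iterable of ((state, lookahead), action) pairs
            # extracted from correct parses in a treebank.
            for context, action in steps:
                self.counts[context][action] += 1

        def action_prob(self, context, action):
            total = sum(self.counts[context].values())
            return self.counts[context][action] / total if total else 0.0

        def derivation_prob(self, steps):
            # A derivation's probability is the product of the
            # probabilities of the actions taken along it.
            p = 1.0
            for context, action in steps:
                p *= self.action_prob(context, action)
            return p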


TR97-0005 (May)

Title
Integrated probabilistic language modeling for statistical parsing

Author(s)
Kentaro Inui, Kiyoaki Shirai, Hozumi Tanaka and Takenobu Tokunaga

Contact person
Kentaro Inui (inui@cs.titech.ac.jp)

Abstract
This paper proposes a new framework of probabilistic language modeling that satisfies two basic requirements: (a) integration of part-of-speech n-gram statistics, structural preference and lexical sensitivity, and (b) maintenance of their modularity. Our framework consists of a syntactic model for part-of-speech bigram statistics and structural preference, and a lexical model for lexical sensitivity. As for the syntactic model, we argue that LR-based language modeling, originally proposed by Briscoe and Carroll and later formalized with several refinements by Inui et al., has good potential for its construction. The main issue we discuss in this paper is how to incorporate lexical sensitivity into the syntactic model while maintaining its modularity. We propose the new notions of lexical dependency parameters and lexical contexts, which capture lexical dependency. A lexical dependency parameter measures the degree of dependency between the derivation of a word and its lexical context. We discuss how to incorporate lexical dependency parameters into the overall probabilistic language model. Further, we demonstrate that one can use a unification-based grammar as a means of collecting the lexical contexts for each lexical derivation.
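
In rough outline (assumed notation, not the paper's exact equations), the integrated score of a derivation multiplies the syntactic model's probability by one lexical dependency factor per lexical derivation:

    def integrated_prob(syntactic_prob, lexical_factors):
        # syntactic_prob: probability assigned by the LR-based syntactic
        # model; lexical_factors: one dependency parameter per lexical
        # derivation, measuring the degree of dependency between a word's
        # derivation and its lexical context. Modularity is preserved
        # because the two models are estimated and stored separately.
        p = syntactic_prob
        for factor in lexical_factors:
            p *= factor
        return p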


TR97-0006 (May)

Title
Self-definable claw free functions

Author(s)
Takeshi Koshiba and Osamu Watanabe

Contact person
Osamu Watanabe (watanabe@cs.titech.ac.jp)

Abstract
We propose a new type of claw free function family, a pseudo self-definable claw free function family, which provides, via Damgard's construction, a collision intractable hash function family that is more suitable for several cryptographic applications such as digital signatures, bit commitment, etc. We give an example of a self-definable claw free function family based on the hardness of a number theoretic problem. We also show a concrete situation where our claw free notion is indeed necessary.
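
For readers unfamiliar with the underlying notions, here is the classic factoring-based example of a claw free pair together with Damgard's iterated hashing, as a toy Python sketch (the paper's self-definable family is a different, stronger construction):

    # Claw free pair based on factoring: f_b(x) = 4^b * x^2 mod N for a
    # Blum integer N. A claw, i.e., x and y with f_0(x) = f_1(y), would
    # reveal a factor of N, so claws are hard to find if factoring is hard.
    N = 7 * 11  # toy modulus; real use needs a large product of two primes

    def f(b, x):
        return (pow(4, b, N) * x * x) % N

    def damgard_hash(bits, iv=4):
        # Damgard's construction: hash a bit string by applying, for each
        # bit, the function it selects. A collision between two distinct
        # bit strings yields a claw for the pair (f_0, f_1), so collision
        # intractability follows from claw freeness.
        h = iv
        for b in bits:
            h = f(b, h)
        return h

    print(damgard_hash([0, 1, 1, 0]))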


TR97-0007 (May)

Title
Hard instance generation for SAT

Author(s)
Satoshi Horie and Osamu Watanabe

Contact person
Osamu Watanabe (watanabe@cs.titech.ac.jp)

Abstract
We consider the problem of generating hard instances for the Satisfying Assignment Search Problem (in short, SAT). It is not known whether SAT is difficult on average, while it has been believed that the Factorization Problem (in short, FACT) is hard on average. Thus, one can expect to generate hard-on-average instances by using a reduction from FACT to SAT. Although the asymptotically best reduction is obtained by using the Fast Fourier Transform (in short, FFT), its constant factor is too big in practice. Here we propose to use the Chinese Remainder Theorem for constructing efficient yet simple reductions from FACT to SAT. First, by using the Chinese Remainder Theorem recursively, we define a reduction that produces, from $n$ bit FACT instances, SAT instances in conjunctive normal form with $O(n^{1+\epsilon})$ variables, where $\epsilon > 0$ is any fixed constant. (Cf. the reduction using FFT yields instances with $O(n \log n \log\log n)$ variables.) Next we demonstrate the efficiency of our approach with some concrete examples; we define a reduction that produces relatively small SAT instances. For example, it is possible to construct SAT instances with about 5,600 variables that are as hard as factoring 100 bit integers. (Cf. the straightforward reduction yields SAT instances with 7,600 variables.)
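
The arithmetic fact behind the reduction can be sketched in a few lines of Python (illustrative only; the actual reduction encodes each small-modulus check as CNF clauses over the bits of the unknown factors):

    from math import prod

    SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]

    def crt_mul_check(x, y, n_value):
        # By the Chinese Remainder Theorem, x * y = n_value holds iff
        # x * y = n_value (mod p) holds for a set of primes whose product
        # exceeds both sides; each small-modulus check is cheap to encode
        # as a Boolean circuit, which is the source of the savings.
        bound = max(x * y, n_value)
        moduli = []
        for p in SMALL_PRIMES:
            moduli.append(p)
            if prod(moduli) > bound:
                break
        if prod(moduli) <= bound:
            raise ValueError("need more primes for numbers this large")
        return all((x % p) * (y % p) % p == n_value % p for p in moduli)

    print(crt_mul_check(101, 113, 101 * 113))  # True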


TR97-0008 (November)

Title
Orientation selectivity problem: an approach from theoretical computer science

Author(s)
Osamu Watanabe and Tadashi Yamazaki

Contact person
Osamu Watanabe (watanabe@cs.titech.ac.jp)

Abstract
Neurobiologists have found, in the primary visual cortex of cats and monkeys, groups of neurons that show ``orientation selectivity''. That is, each such neuron reacts strongly to the presentation of light bars of a certain orientation. Furthermore, there is some evidence that such orientation selectivity is obtained in some animals through training during the very early stage after birth. This process is different from ordinary learning processes because the training data (or stimuli) do not contain any particular pattern; i.e., they are random patterns. Several models have been proposed for explaining this process of developing orientation selectivity; in fact, some of these models have been investigated mathematically. Nevertheless, the problem has not been fully understood, and one can still look into it from different viewpoints. In this paper, we discuss this problem from the viewpoint of theoretical computer science.


TR97-0009 (November)

Title
Noise suppression in training data for improving generalization

Author(s)
Akiko Nakashima, Akira Hirabayashi and Hidemitsu Ogawa

Contact person
Akiko Nakashima (akko@cs.titech.ac.jp)

Abstract
Multi-layer feedforward neural networks are trained using the error back-propagation (BP) algorithm. This algorithm minimizes the error between the outputs of a neural network (NN) and the training data. Hence, in the case of noisy training data, a trained network memorizes the noisy outputs for given inputs. Such learning is called rote memorization learning (RML). In this paper we propose error correcting memorization learning (CML), which can suppress noise in training data. In order to evaluate the generalization ability of CML, we compare it with the projection learning (PL) criterion. It is theoretically proved that although CML merely suppresses noise in the training data, it provides the same generalization as PL under a certain necessary and sufficient condition.
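
A minimal numerical sketch of the contrast, assuming a simple least-squares setting (the paper's operator-theoretic formulation is more general): rote memorization reproduces the noisy targets exactly, while projecting the noisy output vector onto a known signal subspace suppresses the noise component orthogonal to that subspace.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    Phi = np.column_stack([np.ones_like(x), x, x**2])  # assumed signal subspace
    y_true = Phi @ np.array([1.0, -2.0, 3.0])
    y_noisy = y_true + 0.3 * rng.standard_normal(x.size)  # what RML memorizes

    # Orthogonal projection onto span(Phi): P = Phi (Phi^T Phi)^{-1} Phi^T.
    P = Phi @ np.linalg.solve(Phi.T @ Phi, Phi.T)
    y_corrected = P @ y_noisy

    print(np.linalg.norm(y_noisy - y_true))      # error kept by rote memorization
    print(np.linalg.norm(y_corrected - y_true))  # reduced after noise suppression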


TR97-0010 (November)

Title
Error Correcting Memorization Learning for Noisy Training Data (in Japanese)

Author(s)
Akiko Nakashima, Akira Hirabayashi and Hidemitsu Ogawa

Contact person
Akiko Nakashima (akko@cs.titech.ac.jp)

Abstract
Multi-layer feedforward neural networks are trained using the error back-propagation algorithm. The algorithm minimizes the error between the outputs of a neural network and the training data. Hence, in the case of noisy training data, a trained network memorizes the noisy outputs for given inputs. Such learning is called rote memorization learning. In this paper we propose error correcting memorization learning (CML) to suppress noise in training data. We clarify the mechanism of noise suppression in CML.


TR97-0011 (November)

Title
Noise Suppression of Training Data and Generalization Ability (in Japanese)

Author(s)
Akiko Nakashima and Hidemitsu Ogawa

Contact person
Akiko Nakashima (akko@cs.titech.ac.jp)

Abstract
Multi-layer feedforward neural networks are trained using the error back-propagation (BP) algorithm. The algorithm minimizes the training error. Hence, in the case of noisy training data, a trained network memorizes the noisy outputs for given inputs. In order to suppress noise in training data, we proposed error correcting memorization learning (CML). In this paper, we evaluate the generalization ability of CML by comparing it with projection learning (PL). It is theoretically proved that although CML merely suppresses noise in the training data, it provides the same generalization as PL under a certain necessary and sufficient condition.


TR97-0012 (December)

Title
Knowledge acquisition from complex domains by combining inductive learning and theory revision

Author(s)
Xiaolong Zhang and Masayuki Numao

Contact person
Xiaolong Zhang (zxl@cs.titech.ac.jp)

Abstract
In the process of knowledge acquisition, inductive learning and theory revision play important roles. Inductive learning is used to acquire new knowledge (theories) from training examples, while theory revision improves an initial theory with training examples. A theory preference criterion is critical in the processes of inductive learning and theory revision. A new system called knowar has been developed by integrating inductive learning and theory revision. In addition, the theory preference criterion used in knowar is a combination of an MDL-based heuristic and the Laplace estimate. The system can be used to deal with complex problems. Empirical studies have confirmed that knowar leads to substantial improvement of a given initial theory in terms of its predictive accuracy.
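
As a small illustration of the Laplace estimate component of such a preference criterion (the MDL-based heuristic and knowar's exact combination are not reproduced here):

    def laplace_estimate(pos, neg):
        # Estimated accuracy of a clause covering `pos` positive and `neg`
        # negative examples; the +1/+2 correction penalizes clauses that
        # are supported by very few examples.
        return (pos + 1) / (pos + neg + 2)

    # A clause covering 9 positives and 1 negative beats one covering a
    # single positive example, despite the latter's 100% raw accuracy:
    print(laplace_estimate(9, 1))  # ~0.83
    print(laplace_estimate(1, 0))  # ~0.67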