TR94-0001
Text categorization based on weighted inverse document frequency
Takenobu Tokunaga and Makoto Iwayama
Contact person Takenobu Tokunaga (take@cs.titech.ac.jp)
Abstract
This paper proposes a new term weighting method called {\em weighted inverse document frequency\/} (WIDF). As its name indicates, WIDF is an extension of IDF (inverse document frequency) that incorporates the term frequency over the collection of texts. The WIDF of a term in a text is given by dividing the frequency of the term in the text by the sum of the frequencies of the term over the collection of texts. WIDF is applied to the text categorization task and is shown to be superior to the other methods. The maximum improvement in accuracy over IDF is 7.4\%.
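As a sketch of the weighting described above (the notation here is ours, not the report's), with $tf(t,d)$ the frequency of term $t$ in text $d$ and $D$ the collection of texts:
\[
  \mathrm{WIDF}(t,d) \;=\; \frac{tf(t,d)}{\sum_{d' \in D} tf(t,d')}.
\]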
TR94-0002
Manpower shift assignment: algorithms and complexity (Part II)
Hoong Chuin Lau
Contact person Hoong Chuin Lau (hclau@cs.titech.ac.jp)
Abstract
In Part I of this paper, we showed the NP-hardness of the Constrained Manpower Shift Assignment Problem (CSAP), classified it into eight variants, and presented a greedy algorithm to solve two of the variants. In this sequel, we present graph-theoretic and backtracking algorithms to solve the remaining six variants of CSAP.
TR94-0003
Modeling students by using machine learning techniques (in Japanese)
Yuko Wakatsuki and Masayuki Numao
Contact person Masayuki Numao (numao@cs.titech.ac.jp)
Abstract
The paper proposes some machine learning techniques to synthesize logic programs, and describes their applications to an intelligent tutoring system of Prolog.
TR94-0004
Experimental Prolog textbooks (in Japanese)
Noriko Otani and Masayuki Numao
Contact person Masayuki Numao (numao@cs.titech.ac.jp)
Abstract
This report contains three Prolog textbooks used in a psychological experiment, by which the authors evaluated the effect of imagery in the process of learning Prolog.
TR94-0005
A class of optimal unequal byte error protection codes --- SEC-DED-F$b$ED Codes ---
Eiji Fujiwara and Toshiaki Kusakabe
Contact person Eiji Fujiwara (fujiwara@cs.titech.ac.jp)
Abstract
Unequal Error Protection codes, or UEP codes, protect each bit or byte in a word from errors according to its importance, value, or reliability. This paper proposes a class of UEP codes which especially protects a fixed $b$-bit byte in a word that stores, for example, control data or pointer data in computer systems; errors in this byte therefore seriously affect subsequent operations. The proposed codes give an SEC or SEC-DED capability to all bits in a word and, in addition, give a byte error detection capability, that is, an SEC-DED-F$b$ED function, to this fixed $b$-bit byte. This paper clarifies bounds on the code length of this class of codes and finally shows that the proposed codes are optimal.
TR94-0006
Error control scheme for universal data compression
Takeshi Nagai and Eiji Fujiwara
Contact person Eiji Fujiwara (fujiwara@cs.titech.ac.jp)
Abstract
Recently, universal data compression has been widely used in computer systems. This compression method has the property that an error occurring in the earlier part of the compressed data strongly affects the remaining part. This paper proposes a new error control scheme which provides stronger protection against errors in the earlier part than in the latter part. The proposed coding scheme has a multiply nested structure which strongly protects against errors occurring in the earlier part. It can correct multiple errors which could not be corrected by the ordinary disk memory codes. When used together with the existing disk memory codes, the proposed coding scheme requires only a few additional check bytes.
TR94-0007
A class of unidirectional byte error locating codes with single symmetric bit error correction capability
Shuxin Jiang and Eiji Fujiwara
Contact person Eiji Fujiwara (fujiwara@cs.titech.ac.jp)
Abstract
This paper proposes a new class of unidirectional byte error locating codes, called single symmetric bit error correcting and single unidirectional byte error locating codes, or SEC-SU\(b\)EL codes. Here, ``byte'' denotes a cluster of \(b\) bits, where \(b \geq 2\). First, the necessary and sufficient conditions of the codes are clarified, and then a code construction algorithm is demonstrated. The lower bound on the check bit length of the SEC-SU\(b\)EL codes is derived. Based on this, the proposed codes are shown to be very efficient for some range of information lengths. The code design concept presented for the SEC-SU\(b\)EL codes leads to generalized unidirectional byte error locating codes with single symmetric bit error correction capability.
TR94-0008
A probabilistic model for text categorization: Based on a single random variable with multiple values
Makoto Iwayama and Takenobu Tokunaga
Contact person Takenobu Tokunaga (take@cs.titech.ac.jp)
Abstract
Text categorization is the classification of documents with respect to a set of predefined categories. In this paper, we propose a new probabilistic model for text categorization based on a {\em Single random Variable with Multiple Values\/} (SVMV). Compared to previous probabilistic models, our model has the following advantages: 1) it takes within-document term frequencies into account, 2) it considers term weighting for target documents, and 3) it is not affected by insufficient training cases. We verify our model's superiority over the others in the task of categorizing news articles from the ``Wall Street Journal.''
TR94-0009
On a small class of Boolean sums
Tatsuie Tsukiji
Contact person Tatsuie Tsukiji (tsukiji@cs.titech.ac.jp)
Abstract
We define a small class of Boolean sums with $n$ outputs, and obtain a lower bound on monotone circuit size close to the square of $n$ for almost all functions in the class. A linear-space computable function in the class is obtained which has the same lower bound.
TR94-0010
Selector elimination for term rewriting systems
Yutaka Kikuchi
Contact person Yutaka Kikuchi (kikuchi@cs.titech.ac.jp)
Abstract
The function symbols of a term rewriting system (TRS) can be classified into three categories in terms of functional programming, namely constructors, selectors, and defined operators. The constructors and the defined operators are sufficient to construct data structures and to manipulate them; the selectors exist for convenience of programming. We give a transformation method to eliminate selectors from a TRS. We also introduce a new model of a TRS from the viewpoint of functional programming, and show that the original TRS and the transformed TRS have almost the same model except for trivial differences. Moreover, this transformation preserves the strong termination property. (The paper is written in Japanese.)
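To make the classification concrete, here is a minimal illustrative TRS over lists (our example, not taken from the report): \(\mathit{cons}\) and \(\mathit{nil}\) are constructors, \(\mathit{head}\) and \(\mathit{tail}\) are selectors, and \(\mathit{second}\) is a defined operator.
\[
  \mathit{head}(\mathit{cons}(x,y)) \to x, \qquad
  \mathit{tail}(\mathit{cons}(x,y)) \to y, \qquad
  \mathit{second}(l) \to \mathit{head}(\mathit{tail}(l)).
\]
A selector-eliminating transformation of the kind described above would fold the selectors into the pattern of the defined operator, e.g.
\[
  \mathit{second}(\mathit{cons}(x,\mathit{cons}(y,z))) \to y,
\]
after which \(\mathit{head}\) and \(\mathit{tail}\) are no longer needed.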
TR94-0011
On generalization of attribute grammars
Yutaka Kikuchi
Contact person Yutaka Kikuchi (kikuchi@cs.titech.ac.jp)
Abstract
We propose general attribute grammars, which generalize attribute grammars from the following two viewpoints: \begin{enumerate} \item we adopt type 0 grammars as underlying grammars; \item we allow any predicate as a semantic rule. \end{enumerate} We therefore regard general attribute grammars as constraint satisfaction systems. This paper provides a formal definition and classifications of general attribute grammars. We define the semantics of a general attribute grammar as a semantic function whose inputs are derivation trees of the underlying grammar and whose outputs are attributed trees. We give classifications of general attribute grammars based on abstract properties of semantic functions. (The paper is written in Japanese.)
TR94-0012
An attribute grammatical computation model on extended base grammar
Yutaka Kikuchi
Contact person Yutaka Kikuchi (kikuchi@cs.titech.ac.jp)
Abstract
We propose {\sf scHFP}, a computation model based on attribute grammars. We adopt type 0 grammars as underlying grammars. In this model, semantic rules are equivalence relations between attribute occurrences. The {\sf scHFP} model is capable of denoting any recursive function, while the {\sf HFP} model is not. We implement an {\sf scHFP} attribute evaluator based on graph rewriting systems. The evaluator supports lazy evaluation and avoids redundant copying of substructures of the derivation tree.
TR94-0013
Verification of bounded delay asynchronous circuits with timed traces
Tomohiro Yoneda, Ichiki Honma and Bernd--Holger Schlingloff
Contact person Tomohiro Yoneda (yoneda@cs.titech.ac.jp)
Abstract
We extend the verification method based on trace theory by Dill et al. so that it can handle bounded delay asynchronous circuits and check certain liveness properties as well as safety properties.
TR94-0014
Local search for CSP and its applications to rescheduling and graph-coloring
Hoong Chuin Lau
Contact person Hoong Chuin Lau (hclau@cs.titech.ac.jp)
Abstract
A constraint satisfaction problem (CSP) consists of a set of variables, their associated domains, and a set of constraints on these variables. The objective is to find an assignment of the variables that satisfies the constraints. We propose a simple local search algorithm to solve random instances of CSPs and prove that it returns a solution almost surely when there are sufficiently many constraints and the initial assignment is sufficiently near a solution. Next, we show that local search gives a good approximation algorithm for solving the maximum constraint satisfaction problem. Our results have appealing applications in rescheduling and graph-coloring.
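As an illustration of the kind of local search (iterative repair) the abstract refers to, the following Python sketch repeatedly reassigns a conflicted variable to a least-conflicting value for a binary CSP; the representation of constraints as sets of allowed value pairs and the min-conflicts repair heuristic are assumptions of this sketch, not necessarily the algorithm analyzed in the report.

import random

def conflicts(assignment, var, value, constraints):
    """Number of constraints on `var` violated if it takes `value`."""
    count = 0
    for (u, v), allowed in constraints.items():
        if u == var and (value, assignment[v]) not in allowed:
            count += 1
        elif v == var and (assignment[u], value) not in allowed:
            count += 1
    return count

def local_search(domains, constraints, initial, max_steps=10000):
    """Iterative repair: repeatedly reassign a conflicted variable to a
    value that minimizes the number of its violated constraints."""
    assignment = dict(initial)
    for _ in range(max_steps):
        conflicted = [x for x in assignment
                      if conflicts(assignment, x, assignment[x], constraints) > 0]
        if not conflicted:
            return assignment              # all constraints satisfied
        var = random.choice(conflicted)
        assignment[var] = min(domains[var],
                              key=lambda val: conflicts(assignment, var, val, constraints))
    return None                            # no solution found within max_steps

# Tiny example: 3-coloring a triangle, starting near a solution.
domains = {x: [0, 1, 2] for x in "abc"}
not_equal = {(i, j) for i in range(3) for j in range(3) if i != j}
constraints = {("a", "b"): not_equal, ("b", "c"): not_equal, ("a", "c"): not_equal}
print(local_search(domains, constraints, {"a": 0, "b": 1, "c": 1}))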
TR94-0015
Timing-reliability evaluation of asynchronous circuits based on different delay models
Masashi Kuwako and Takashi Nanya
Contact person Takashi Nanya (nanya@cs.titech.ac.jp)
Abstract
We propose a quantitative measure for evaluating the timing-reliability of asynchronous circuits designed on a variety of delay models. Using the measure, we evaluate the timing-reliability, as well as the speed performance and hardware cost, of various building blocks of asynchronous systems. Finally, we give a guideline for choosing valid delay models for the design of dependable and high-performance asynchronous processors.
TR94-0016
Analysis of Japanese compound nouns using collocational information
Kobayasi Yosiyuki, Tokunaga Takenobu and Tanaka Hozumi
Contact person Tokunaga Takenobu (take@cs.titech.ac.jp)
Abstract
Analyzing compound nouns is one of the crucial issues for natural language processing systems, in particular for those systems that aim at wide coverage of domains. In this paper, we propose a method to analyze the structures of Japanese compound nouns by using both word collocation statistics and a thesaurus. An experiment is conducted with 160,000 word collocations to analyze compound nouns with an average length of 4.9 characters. The accuracy of this method is about 80\%.
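The abstract does not spell out the scoring, but the general idea of combining collocation statistics with a thesaurus can be sketched as follows: enumerate the binary bracketings of the noun sequence and score each one by the collocation statistics between the thesaurus categories of adjacent constituents. In the Python sketch below, the category map, the probability table, and the rightmost-head assumption are illustrative choices, not the paper's actual model.

# Hypothetical thesaurus categories and collocation probabilities between the
# categories of a modifier and of the head it attaches to (illustrative numbers).
CATEGORY = {"情報": "abstract", "処理": "action", "装置": "artifact"}   # jouhou, shori, souchi
COLLOC_P = {("abstract", "action"): 0.6,
            ("action", "artifact"): 0.5,
            ("abstract", "artifact"): 0.1}

def score(tree):
    """Product of collocation probabilities over modifier-head pairs, assuming
    the head of a constituent is its rightmost noun. Returns (score, head)."""
    if isinstance(tree, str):
        return 1.0, tree
    left, right = tree
    p_left, head_left = score(left)
    p_right, head_right = score(right)
    p = COLLOC_P.get((CATEGORY[head_left], CATEGORY[head_right]), 1e-6)
    return p_left * p_right * p, head_right

def bracketings(nouns):
    """All binary bracketings of a sequence of nouns."""
    if len(nouns) == 1:
        return [nouns[0]]
    trees = []
    for i in range(1, len(nouns)):
        for left in bracketings(nouns[:i]):
            for right in bracketings(nouns[i:]):
                trees.append((left, right))
    return trees

nouns = ["情報", "処理", "装置"]   # "information processing device"
best = max(bracketings(nouns), key=lambda t: score(t)[0])
print(best)   # (('情報', '処理'), '装置'): the left-branching structure wins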
TR94-0017
Incorporation of phoneme-context-dependence in LR table through constraint propagation method
Hozumi Tanaka, Hui Li and Takenobu Tokunaga
Contact person Hui Li (li@cs.titech.ac.jp)
Abstract
It is obvious that successful speech recognition requires the use of linguistic information. For this purpose, a generalized LR (GLR) parser provides an exceptionally competent and flexible framework for combining linguistic information with phonological information. The combination of a GLR parser and allophone models is considered very effective for enhancing recognition accuracy in large vocabulary continuous speech recognition. The main problem in integrating GLR parsing into an allophone-based recognition system is how to solve the word juncture problem, that is, how to express the phones at a word boundary with allophone models. This paper proposes a new method called CPM (Constraint Propagation Method) to generate an allophone-based LR table, which can effectively solve the word juncture problem. In our method, an LR table is generated by introducing the allophone rules into the CFG and lexical rules; the LR table is then modified on the basis of an allophone connection matrix by applying the constraint propagation method. With this modified LR table, precise allophone predictions for speech recognition can be obtained.
TR94-0018
A bayesian approach for user modeling in dialogue systems
Tomoyosi Akiba and Hozumi Tanaka
Contact person Takenobu Tokunaga (take@cs.titech.ac.jp)
Abstract
User modeling is an important component of dialogue systems. Most previous approaches are rule-based methods. In this paper, we propose to represent user models through Bayesian networks. Some advantages of the Bayesian approach over the rule-based approach are as follows. First, rules for updating user models are not necessary, because updating is performed directly by evaluating the network based on probability theory; this provides us with a more formal way of dealing with uncertainty. Second, the Bayesian network provides more detailed information about the user's knowledge, because the degree of belief in each concept is given in terms of probability. We demonstrate these advantages through a preliminary experiment.
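As a toy illustration of the second advantage (this is a single-node Bayes update, not the paper's network), the Python sketch below maintains a degree of belief that the user knows a concept and revises it from observed answers; the prior and the answer likelihoods are assumed numbers chosen only for the example.

# A toy Bayesian update of a user model: the belief that the user knows a
# concept, revised from observed answers. All numbers are illustrative.
P_CORRECT_IF_KNOWN = 0.9     # P(correct answer | user knows the concept)
P_CORRECT_IF_UNKNOWN = 0.2   # P(correct answer | user does not know it)

def update_belief(p_known, answered_correctly):
    """Return P(user knows the concept | observation) by Bayes' rule."""
    if answered_correctly:
        like_known, like_unknown = P_CORRECT_IF_KNOWN, P_CORRECT_IF_UNKNOWN
    else:
        like_known, like_unknown = 1 - P_CORRECT_IF_KNOWN, 1 - P_CORRECT_IF_UNKNOWN
    numerator = like_known * p_known
    denominator = numerator + like_unknown * (1 - p_known)
    return numerator / denominator

belief = 0.5                         # prior: nothing known about the user
for correct in [True, True, False]:  # answers observed during the dialogue
    belief = update_belief(belief, correct)
    print(f"P(user knows concept) = {belief:.3f}")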
TR94-0019
An extension of LangLAB for Japanese morphological analysis
Tomoyosi Akiba, Takenobu Tokunaga and Hozumi Tanaka
Contact person Takenobu Tokunaga (take@cs.titech.ac.jp)
Abstract
LangLAB is a natural language analysing system which adopts a bottom-up, depth-first parsing strategy. LangLAB has useful features for analysing English text. This paper describes an extension of the LangLAB system to analyse Japanese. In particular, a morphological analysis program for Japanese is proposed. The representation forms of grammar rules (DCG) and dictionary descriptions (DRF) are also extended to handle morphological information.
TR94-0020
A direct verification of CSC property of STG/FCs for asynchronous circuit design
Sung-Bum Park and Takashi Nanya
Contact person Takashi Nanya (nanya@cs.titech.ac.jp)
Abstract
We propose a method for verifying the CSC property of signal transition graph (STG) representation for asynchronous circuit design. The underlying STG can have multi-cycle signals, free-choice operations and parallel operations. We define the boomerang sequence to detect pairs of states having the same state coding, and develop a procedure for verifying the CSC property. Our method has been applied to many examples and has a polynomial complexity.
TR94-0021
Inductive logic programming as constrained search
Chowdhury Rahman Mofizur and Masayuki Numao
Contact person Chowdhury Rahman Mofizur (rahman@cs.titech.ac.jp)
Abstract
This report presents a system for constructing first order Horn clause theories from examples and the necessary background knowledge. The implemented system, ILPCS, incorporates some innovative natural constraints, overlooked in existing implementations, in order to make the search space tractable. Detecting redundant information during clause development and constraining the information explosion are the backbones of this system. It has been verified experimentally that the set of problems learnable by ILPCS includes the benchmark problems learnable by existing ILP systems, and that ILPCS learns them with fewer examples and less computational effort.
TR94-0022
A synthesis method of quasi-delay-insensitive processors based on dependency graphs
Hiroto Kagotani and Takashi Nanya
Contact person Takashi Nanya (nanya@cs.titech.ac.jp)
Abstract
We propose a synthesis method for quasi-delay-insensitive processors that compute and transfer data in a two-phase manner. Specifications are described in Dependency Graphs, which represent conditional branches and loops as well as dependency relations between micro-operations. This synthesis method implements controllers directly from Dependency Graphs by replacing every graph node with the corresponding circuit block. The use of the proposed Auto-Sweeping Module overcomes the problem of two-phase control, where idle phases consumed almost half of the total execution time.
TR94-0023
Constraint problems on attributed trees and their applications to knowledge maintenance
Takehiro Tokuda, Yoshimichi Watanabe, Nobuo Kakinuma and Kazuto Tominaga
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
We first show that a certain type of knowledge base update problem can be regarded as a constraint problem. We then consider constraint problems on attributed trees which can be solved by local propagation of values. Finally, we present four new constraint solving methods: the DeltaUp method, the tree walk method, the set manipulation method, and the attribute grammar method. These methods allow us to solve constraint problems incrementally.
TR94-0024
A transformation method of attribute grammars into efficient action routines using attribute value estimate
Yoshimichi Watanabe and Takehiro Tokuda
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
This paper presents a transformation method of attribute grammars into efficient action routines by using inherited attribute value estimates. An attribute evaluator based on a one-pass bottom-up parser cannot generally evaluate the inherited attributes of a given attribute grammar during parsing. To solve this problem, we introduce global variables for a special type of inherited attribute, which we call constant-propagation type inherited attributes. The values of these inherited attributes are determined by constant assignments or by copy functions from the same inherited attributes or from synthesized attributes. Our transformation can be applied to a subclass of L-attributed grammars. Using our transformation, we can obtain efficient action routines from attribute grammars consisting of synthesized attributes and constant-propagation type inherited attributes.
TR94-0025
Treating inherited attributes of L-attributed grammars as synthesized attributes
Takehiro Tokuda
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
We present two methods of treating a subclass of L-attributed grammars as synthesized attribute grammars. The first method performs one-pass bottom-up parsing of a given input defined by an L-attributed grammar and evaluates all inherited and synthesized attributes by manipulation of finitely-manipulatable attributed item sets. The second method transforms an L-attributed grammar into an equivalent synthesized attribute grammar by manipulation of one-dimensional vector-valued synthesized attributes.
TR94-0026
Eliminating unnecessary items from the one-pass evaluation of attribute grammars
Yoshimichi Watanabe and Takehiro Tokuda
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
We present two efficient attribute evaluator construction methods for a wide subclass of L-attributed grammars based on the enumeration of attributed items during one-pass left-to-right parsing. We have already proposed a construction method for a parser/evaluator for this subclass of L-attributed grammars. However, the evaluator produced by our previous method uses a great number of attributed items to evaluate all attributes of a given input string. In this paper we propose two generalized methods to eliminate unnecessary attributed items enumerated by our previous method. Our methods allow us to evaluate all attributes by taking advantage of available lookahead information.
TR94-0027
Reducing the number of items from LR-attributed grammar evaluators
Takehiro Tokuda
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
We present methods to reduce the number of attributed LR items in one-pass LR-attributed grammar evaluators using top-down and bottom-up compatibility tests of lookahead information.
TR94-0028
GENESYS: An integrated environment for developing Systemic Functional Grammars
Tadashi Kumano, Takenobu Tokunaga, Kentaro Inui and Hozumi Tanaka
Contact person Takenobu Tokunaga (take@cs.titech.ac.jp)
Abstract
In order to develop rich language resources systematically and efficiently, we need not only well-founded linguistic theories but also software tools that facilitate writing and examining them. This paper reports on the GENESYS system, which provides an integrated environment for developing Systemic Functional Grammars (SFGs). GENESYS has a special editor for writing SFGs. Further, the user can examine grammars by running the GENESYS surface generator. Information on the intermediate states of the generation process can be monitored through the graphical user interface.
TR94-0029
A software generator system based on timed attribute hypergraph grammars
Mikifumi Shikida, Yasuhide Yamamoto and Takehiro Tokuda
Contact person Mikifumi Shikida (shikida@cs.titech.ac.jp)
Abstract
We propose Timed Attribute Hypergraph Grammars (TAHG) and a software generator system based on the grammar. A TAHG model consists of a syntax definition, a semantic definition, and object definitions. In the syntax definition, a designer describes the syntax using a parameterized context-free grammar, which we call a hypergraph grammar. In the semantic definition, the designer describes the dynamic behavior of target objects using a generalization of attribute grammars. In the object definitions, the designer describes views of objects and interactions with a user. Our TAHG-model based software generator allows us to produce interactive software systems that deal with graph-structure based two-dimensional information processing systems such as Petri net editors and logic circuit editors.
TR94-0030
A super LR($0$) parsing of LR($k$) grammars
Takehiro Tokuda
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
We present efficient methods for LR($0$) item-based parsing of LR($k$) grammars using bottom-up computation of lookahead information. We give four methods for four cases of LR($k$) grammars with $k \geq 1$. The first case is an $\varepsilon$-free LR($1$) grammar with uniform reductions. The second case is a general $\varepsilon$-free LR($1$) grammar. The third case is a general LR($1$) grammar. The fourth case is a general LR($k$) grammar for $k \geq 2$. Our methods have the feature that, as we increase the lookahead length, the cardinality of the set of LR($0$) items decreases monotonically.
TR94-0031
A parsing method for context-free languages using bottom-up lookahead computation
Yoshimichi Watanabe and Takehiro Tokuda
Contact person Yoshimichi Watanabe (nabe@cs.titech.ac.jp)
Abstract
We present a new parsing algorithm for context-free languages using bottom-up computation of lookahead information. Earley's algorithm uses a great number of items during enumeration to recognize a context-free language. If we use items with lookahead fields, then we can reduce the number of items, but we inevitably increase the space of items considerably. Our method may reduce the number of items during enumeration without using lookahead fields in the items. Our computation of lookahead information takes place in a bottom-up manner. Also, we do not have to compute lookahead information until it becomes necessary during enumeration.
TR94-0032
A realization method of optimally generalizing neural network based on error minimization
Hidemitsu Ogawa and Jun-ichi Funada
Contact person Yukihiko Yamashita (yamasita@cs.titech.ac.jp)
Abstract
The task required of a multilayer perceptron can be summarised as an optimal approximation of the original function from a set of sampled data, incorporating the concept of generalization ability. Previous research has shown that, even if the original function is not known, its best approximation can be computed using a technique called the Projection Generalizing Neural Network (PGNN). However, this technique yields a family of NNs realizing the same goal. In this paper, an effort is made to exploit this remaining degree of freedom: while maintaining the optimal generalization ability of the net, a method of generating an NN which is least susceptible to errors in the connection weights is derived.
TR94-0033
A unified theory of the family of projection filters for signal and image estimation
Yuji Koide, Yukihiko Yamashita and Hidemitsu Ogawa
Contact person Yukihiko Yamashita (yamasita@cs.titech.ac.jp)
Abstract
The general forms of the projection filter, the partial projection filter, and the average projection filter for image estimation have been obtained by solving their respective systems of operator equations. Ogawa proposed a unified system which can provide the three forms simultaneously, but the resulting form is complicated compared with the original forms. In order to solve this problem, we propose another unified system which gives almost the same forms as the originals. Furthermore, we provide properties common to the family of projection filters.
TR94-0034
Recent topics on coding theory for computers
Eiji Fujiwara
Contact person Eiji Fujiwara (fujiwara@cs.titech.ac.jp)
Abstract
Coding theory is actively applied to computer systems, communication systems and AV systems in order to improve their reliability. Among these, it has been most successfully applied to computer systems, especially to memory systems such as semiconductor memories and file memories. This paper describes the present state of the application of coding theory to computer systems, and introduces some recently developed codes which have attracted attention. Furthermore, it discusses subjects for future study.
TR94-0035
Towards average-case complexity analysis of NP optimization problems
Rainer Schuler and Osamu Watanabe
Contact person Osamu Watanabe (watanabe@cs.titech.ac.jp)
Abstract
For the worst-case complexity measure, if P $=$ NP, then P $=$ OptP, i.e., all NP optimization problems are polynomial-time solvable. On the other hand, it is not clear whether a similar relation holds when considering average-case complexity. We investigate the relationship between the complexity of NP decision problems and that of NP optimization problems under polynomial-time computable distributions, and study what makes them (seemingly) different. It is shown that the difference between P$^{\rm NP}_{\rm tt}$-samplable and P$^{\rm NP}$-samplable distributions is crucial.
TR94-0036
Probabilistic analysis of local search for constraint satisfaction and its application to rescheduling
Hoong Chuin Lau
Contact person Hoong Chuin Lau (hclau@cs.titech.ac.jp)
Abstract
The Constraint Satisfaction Problem (CSP) is known to be NP-hard in general. We present two new theoretical and experimental results for solving random CSP instances based on local search (iterative repair). First, we consider a relaxed version of the CSP: suppose we have an initial assignment that is reasonably close to a solution, can we find a solution in polynomial time? This question has practical implications for the rescheduling of resources, especially on a factory floor. We propose a simple local search algorithm and show probabilistically that, for instances that have unique solutions, our local search procedure returns a solution almost surely when the Hamming distance between the initial assignment and the solution is less than $n/2$, where $n$ is the number of variables. Next, we show that iterative application of local search yields an assignment which satisfies significantly more constraints than a random assignment. This observation leads us to give provable bounds on the probability of constraint consistency for which iterative local search will almost surely yield a solution in polynomial time. These results are consistent with our experiments.