TR94-0001
Text categorization based on weighted inverse document frequency
Takenobu Tokunaga and Makoto Iwayama
Contact person
Takenobu Tokunaga
(take@cs.titech.ac.jp)
Abstract
This paper proposes a new term weighting method called {\em weighted
inverse document frequency\/} (WIDF). As its name indicates, WIDF is
an extension of IDF (inverse document frequency) to incorporate the
term frequency over the collection of texts. WIDF of a term in a
text is given by dividing the frequency of the term in the text by
the sum of the frequency of the term over the collection of texts.
WIDF is applied to the text categorization task and is shown to be
superior to the other methods; the improvement in accuracy over IDF
is up to 7.4\%.
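The formula above is concrete enough to sketch. The following minimal Python illustration computes WIDF exactly as stated in this abstract, dividing a term's in-document frequency by its total frequency over the collection (the function and the toy collection are our own illustration, not the paper's experimental setup):

```python
from collections import Counter

def widf(term, doc_index, docs):
    """WIDF of `term` in docs[doc_index]: the term's frequency in that
    document divided by its total frequency over the whole collection."""
    counts = [Counter(d) for d in docs]
    total = sum(c[term] for c in counts)
    if total == 0:
        return 0.0
    return counts[doc_index][term] / total

docs = [["apple", "apple", "banana"],
        ["apple", "cherry"],
        ["banana", "banana"]]
# "apple" occurs twice in doc 0 and three times in the whole collection.
print(widf("apple", 0, docs))  # 0.6666666666666666
```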
TR94-0002
Manpower shift assignment: algorithms and complexity (Part II)
Hoong Chuin Lau
Contact person
Hoong Chuin Lau
(hclau@cs.titech.ac.jp)
Abstract
In Part I of this paper, we presented the NP-hardness of the Constrained
Manpower Shift Assignment Problem (CSAP), and classified it into 8 variants.
We presented a greedy algorithm to solve two of the variants.
In this sequel paper, we present graph-theoretic and
backtracking algorithms to solve the remaining six variants of CSAP.
TR94-0003
Modeling students by using machine learning techniques (in Japanese)
Yuko Wakatsuki and Masayuki Numao
Contact person
Masayuki Numao
(numao@cs.titech.ac.jp)
Abstract
The paper proposes some machine learning techniques for synthesizing logic
programs, and describes their application to an intelligent tutoring system
for Prolog.
TR94-0004
Experimental Prolog textbooks (in Japanese)
Noriko Otani and Masayuki Numao
Contact person
Masayuki Numao
(numao@cs.titech.ac.jp)
Abstract
This report contains three Prolog textbooks used in a psychological
experiment, by which the authors evaluated the effect of imagery in the
process of learning Prolog.
TR94-0005
A class of optimal unequal byte error protection codes
--- SEC-DED-F$b$ED Codes ---
Eiji Fujiwara and Toshiaki Kusakabe
Contact person
Eiji Fujiwara
(fujiwara@cs.titech.ac.jp)
Abstract
Unequal Error Protection Codes, or UEP Codes, protect each bit or byte
in a word from errors based on its importance, value, or reliability.
This paper proposes a class of UEP codes that especially protects a
fixed $b$-bit byte in a word storing, for example, control data or
pointer data in computer systems; errors in such a byte seriously
affect subsequent operations.
The proposed codes give an $SEC$ or $SEC-DED$ capability to all bits in
a word and, in addition, give a byte error detection capability, that
is, an $SEC-DED-FbED$ function, to this fixed $b$-bit byte.
This paper also clarifies bounds on the code length of this class of
codes, and shows that the proposed codes are optimal.
TR94-0006
Error control scheme for universal data compression
Takeshi Nagai and Eiji Fujiwara
Contact person
Eiji Fujiwara
(fujiwara@cs.titech.ac.jp)
Abstract
Universal data compression has recently come into wide use in computer
systems. A characteristic of this compression method is that an error
occurring in the earlier part of the compressed data strongly affects the
remaining part. This paper proposes a new error control scheme that protects
the earlier part of the data more strongly than the latter part. The proposed
coding scheme has a multiple nested structure that strongly protects against
errors occurring in the earlier part. It can correct multiple errors that
cannot be corrected by ordinary disk memory codes. When combined with
existing disk memory codes, the proposed scheme requires only a few check
bytes in addition to those of the existing codes.
TR94-0007
A class of unidirectional byte error locating codes
with single symmetric bit error correction capability
Shuxin Jiang and Eiji Fujiwara
Contact person
Eiji Fujiwara
(fujiwara@cs.titech.ac.jp)
Abstract
This paper proposes a new class of unidirectional byte error locating codes,
called single symmetric bit error correcting and single unidirectional byte
error locating codes, or SEC-SU\(b\)EL codes. Here, ``byte'' denotes a cluster
of \(b\) bits, where \(b \geq 2\). First, the necessary and sufficient
conditions of the codes are clarified, and then a code construction algorithm
is demonstrated. The lower bound on the check bit length of the SEC-SU{\it b}EL
codes is derived. Based on this, the proposed codes are shown to be very
efficient over a certain range of information lengths. The code design concept
presented for the SEC-SU\(b\)EL codes induces the generalized unidirectional
byte error locating codes with single symmetric bit error correction
capability.
TR94-0008
A probabilistic model for text categorization:
Based on a single random variable with multiple values
Makoto Iwayama and Takenobu Tokunaga
Contact person
Takenobu Tokunaga
(take@cs.titech.ac.jp)
Abstract
Text categorization is the classification of documents with respect to
a set of predefined categories. In this paper, we propose a new
probabilistic model for text categorization, that is based on a {\em
Single random Variable with Multiple Values\/} (SVMV). Compared to
previous probabilistic models, our model has the following advantages:
1) it considers within-document term frequencies, 2) it considers term
weighting for target documents, and 3) it is not affected by
insufficient training cases. We verify our model's superiority over
the others in the task of categorizing news articles from the ``Wall
Street Journal.''
TR94-0009
On a small class of Boolean sums
Tatsuie Tsukiji
Contact person
Tatsuie Tsukiji
(tsukiji@cs.titech.ac.jp)
Abstract
We define a small class of Boolean sums
with n outputs, and obtain a lower bound
on monotone circuit size close to the
square of n for almost all functions in
the class. A linear-space computable
function in the class that has the same
lower bound is also obtained.
TR94-0010
Selector elimination for term rewriting systems
Yutaka Kikuchi
Contact person
Yutaka Kikuchi
(kikuchi@cs.titech.ac.jp)
Abstract
The function symbols of a term rewriting system (TRS) can be
classified into three categories,
namely, constructors, selectors, and defined operators
in terms of functional programming.
The constructors and the defined operators are sufficient to construct
data structures and to manipulate them.
The selectors are for convenience of programming.
We give a transformation method to eliminate selectors from a TRS.
We also introduce a new model of a TRS
from the viewpoint of functional programming,
and show that the original TRS and the transformed TRS
have almost the same model except for trivial differences.
Moreover, this transformation preserves the strong termination property.
(The paper is written in Japanese.)
TR94-0011
On generalization of attribute grammars
Yutaka Kikuchi
Contact person
Yutaka Kikuchi
(kikuchi@cs.titech.ac.jp)
Abstract
We propose general attribute grammars that are a generalization of
attribute grammars from the following two viewpoints:
\begin{enumerate}
\item we adopt type 0 grammars as underlying grammars;
\item we allow any predicate as a semantic rule.
\end{enumerate}
Therefore we regard general attribute grammars
as constraint satisfaction systems.
This paper provides a formal definition and classifications of
general attribute grammars. We define the semantics of a general
attribute grammar as a semantic function whose inputs are derivation
trees of the underlying grammar and whose outputs are attributed
trees. We give classifications of general attribute grammars based
on abstract properties of semantic functions.
(The paper is written in Japanese.)
TR94-0012
An attribute grammatical computation model on extended base grammar
Yutaka Kikuchi
Contact person
Yutaka Kikuchi
(kikuchi@cs.titech.ac.jp)
Abstract
We propose {\sf scHFP}, which is a computation model based on attribute
grammars. We adopt type 0 grammars as underlying grammars.
In this model, semantic rules are equivalence relations between
attribute occurrences. The {\sf scHFP} model can denote any
recursive function, whereas the {\bf HFP} model cannot.
We implement a {\sf scHFP} attribute evaluator based on graph rewriting
systems. The evaluator supports lazy evaluation and avoids
redundant copying of substructures of the derivation tree.
TR94-0013
Verification of bounded delay asynchronous circuits with timed traces
Tomohiro Yoneda, Ichiki Honma and Bernd--Holger Schlingloff
Contact person
Tomohiro Yoneda
(yoneda@cs.titech.ac.jp)
Abstract
We extend the verification method based on trace theory
by Dill et al. such that it can handle bounded delay asynchronous
circuits and check certain liveness properties as well as safety
properties.
TR94-0014
Local search for CSP and its applications to rescheduling and graph-coloring
Hoong Chuin Lau
Contact person
Hoong Chuin Lau
(hclau@cs.titech.ac.jp)
Abstract
A constraint satisfaction problem (CSP) consists of a set of variables, their
associated domains and a set of constraints on these variables. The objective
is to find an assignment of variables that satisfies the constraints. We
propose a simple local search algorithm
to solve random instances of CSPs and prove
that it returns a solution almost surely when
there are sufficiently many constraints and
the initial assignment is sufficiently near a solution.
Next, we show that local search gives a good approximation
algorithm for solving the maximum constraint satisfaction problem.
Our results have appealing applications in rescheduling and graph-coloring.
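As a rough illustration of local search for constraint satisfaction, the following sketch repairs an assignment by sweeping the variables and re-assigning each to its least-conflicting value, a min-conflicts-style heuristic. The instance and the algorithmic details are our own assumptions, not the report's analyzed algorithm:

```python
def violated(assign, constraints):
    """Return the constraints not satisfied by `assign`."""
    return [c for c in constraints if not c(assign)]

def local_search(domains, constraints, start, sweeps=50):
    """Min-conflicts-style repair: sweep the variables in order,
    re-assigning each to the value that minimises the number of
    violated constraints.  A deterministic sketch for illustration;
    the report's algorithm and its analysis are more refined."""
    assign = dict(start)
    for _ in range(sweeps):
        if not violated(assign, constraints):
            break
        for v in domains:
            assign[v] = min(domains[v],
                            key=lambda x: len(violated({**assign, v: x},
                                                       constraints)))
    return assign

# Toy instance: 3-colour a triangle plus a pendant vertex.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
domains = {v: [0, 1, 2] for v in "abcd"}
constraints = [lambda a, u=u, w=w: a[u] != a[w] for u, w in edges]
sol = local_search(domains, constraints, start={v: 0 for v in "abcd"})
print(all(sol[u] != sol[w] for u, w in edges))  # True
```

On this tiny instance a single sweep already reaches a proper colouring; the report's contribution is proving when such repair succeeds on random instances.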
TR94-0015
Timing-reliability evaluation of asynchronous circuits based on different
delay models
Masashi Kuwako and Takashi Nanya
Contact person
Takashi Nanya
(nanya@cs.titech.ac.jp)
Abstract
We propose a quantitative measure for evaluating the timing-reliability of
asynchronous circuits designed on a variety of delay models. Using the
measure, we evaluate the timing-reliability, as well as the speed
performance and hardware cost, for various building blocks of asynchronous
systems. Finally, we give a guideline for choosing valid delay models for
the design of dependable and high-performance asynchronous processors.
TR94-0016
Analysis of Japanese compound nouns using collocational information
Kobayasi Yosiyuki, Tokunaga Takenobu and Tanaka Hozumi
Contact person
Tokunaga Takenobu
(take@cs.titech.ac.jp)
Abstract
Analyzing compound nouns is one of the crucial issues for natural
language processing systems, in particular for those systems that aim
at a wide coverage of domains. In this paper, we propose a method to
analyze the structures of Japanese compound nouns by using both word
collocation statistics and a thesaurus. An experiment is conducted
with 160,000 word collocations to analyze compound nouns with an
average length of 4.9 characters. The accuracy of this method is about
80\%.
TR94-0017
Incorporation of phoneme-context-dependence in LR table through
constraint propagation method
Hozumi Tanaka, Hui Li and Takenobu Tokunaga
Contact person
Hui Li
(li@cs.titech.ac.jp)
Abstract
It is obvious that successful speech recognition requires the use of
linguistic information. For this purpose, a generalized LR (GLR)
parser provides an exceptionally competent and flexible framework to
combine linguistic information with phonological information.
The combination of a GLR parser and allophone models is considered
very effective for enhancing the recognition accuracy in large-vocabulary
continuous speech recognition. The main problem of
integrating GLR parsing into an allophone-based recognition system is
how to solve the word juncture problem, that is, how to express
the phones at a word boundary with allophone models.
This paper proposes a new method called CPM (Constraint Propagation
Method) to generate an allophone-based LR table, which can
effectively solve the word juncture problem. In our method, by
introducing the allophone rules into the CFG and lexical rules, an
LR table is generated, then the LR table is modified on the
basis of an allophone connection matrix by applying the constraint
propagation method. With this modified LR table, precise allophone
predictions for speech recognition can be obtained.
TR94-0018
A bayesian approach for user modeling in dialogue systems
Tomoyosi Akiba and Hozumi Tanaka
Contact person
Takenobu Tokunaga
(take@cs.titech.ac.jp)
Abstract
User modeling is an important component of dialogue systems. Most
previous approaches are rule-based. In this paper, we propose
representing user models as Bayesian networks. Some advantages of
the Bayesian approach over the rule-based approach are as follows.
First, rules for updating user models are unnecessary, because
updating is performed directly by evaluating the network on the basis
of probability theory; this provides a more formal way of dealing
with uncertainty. Second, a Bayesian network provides more detailed
information about a user's knowledge, because the degree of belief
in each concept is given as a probability. We demonstrate these
advantages through a preliminary experiment.
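The first advantage, updating by evaluation of the network, can be illustrated at the smallest possible scale: a single-node Bayes-rule update of the belief that a user knows a concept after observing one answer. The numbers below are hypothetical, and the report uses full Bayesian networks rather than this one-node sketch:

```python
def bayes_update(prior, like_if_known, like_if_unknown):
    """Posterior belief that the user knows a concept, given one
    observation, by Bayes' rule."""
    evidence = like_if_known * prior + like_if_unknown * (1 - prior)
    return like_if_known * prior / evidence

# Hypothetical numbers: prior belief 0.5; a correct answer has
# likelihood 0.9 if the concept is known and 0.2 otherwise.
posterior = bayes_update(0.5, 0.9, 0.2)
print(round(posterior, 3))  # 0.818
```

The degree of belief comes out directly as a probability, which is the second advantage claimed above.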
TR94-0019
An extension of LangLAB for Japanese morphological analysis
Tomoyosi Akiba, Takenobu Tokunaga and Hozumi Tanaka
Contact person
Takenobu Tokunaga
(take@cs.titech.ac.jp)
Abstract
LangLAB is a natural language analysing system which adopts a
bottom-up, depth-first parsing strategy. LangLAB has useful features
to analyse English text. This paper describes an extension of the
LangLAB system in order to analyse Japanese. In particular, a
morphological analysis program for Japanese is proposed. The
representation forms of grammar rules (DCG) and of the dictionary
description (DRF) are also extended to handle morphological information.
TR94-0020
A direct verification of CSC property of STG/FCs for asynchronous circuit design
Sung-Bum Park and Takashi Nanya
Contact person
Takashi Nanya
(nanya@cs.titech.ac.jp)
Abstract
We propose a method for verifying the CSC property of signal transition
graph (STG) representation for asynchronous circuit design. The
underlying STG can have multi-cycle signals, free-choice operations and
parallel operations. We define the boomerang sequence to detect pairs of
states having the same state coding, and develop a procedure for verifying
the CSC property. Our method has been applied to many examples and has a
polynomial complexity.
TR94-0021
Inductive logic programming as constrained search
Chowdhury Rahman Mofizur and Masayuki Numao
Contact person
Chowdhury Rahman Mofizur
(rahman@cs.titech.ac.jp)
Abstract
This report presents a system for constructing first-order
Horn clause theories from examples and necessary background knowledge. The
implemented system, ILPCS, incorporates some natural constraints,
overlooked in existing implementations, that make the search space
tractable. The detection of redundant information during clause
development and the restraining of the information explosion are the
backbone of this system. It has been verified experimentally that the
set of problems learnable by ILPCS includes the benchmark problems
learnable by existing ILP systems, using fewer examples and less
computational effort.
TR94-0022
A synthesis method of quasi-delay-insensitive processors
based on dependency graphs
Hiroto Kagotani and Takashi Nanya
Contact person
Takashi Nanya
(nanya@cs.titech.ac.jp)
Abstract
We propose a synthesis method of quasi-delay-insensitive processors
that compute and transfer data in the two-phase manner.
Specifications are described in Dependency Graphs, which represent
conditional branches and loops as well as dependency relations between
micro-operations. This synthesis method implements controllers
directly from Dependency Graphs by replacing every graph node with the
corresponding circuit block. The use of the proposed Auto-Sweeping
Module overcomes the problem of two-phase control, in which idle phases
consume almost half of the total execution time.
TR94-0023
Constraint problems on attributed trees and their applications to
knowledge maintenance
Takehiro Tokuda, Yoshimichi Watanabe, Nobuo Kakinuma and Kazuto Tominaga
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
We first show that a certain type of knowledge base update problems
can be regarded as constraint problems. We then consider constraint
problems on attributed trees which can be solved by local
propagation of values. Finally we show four new constraint solving
methods: DeltaUp method, tree walk method, set manipulation method
and attribute grammar method. These methods allow us to have an
incremental solution of constraint problems.
TR94-0024
A transformation method of attribute grammars into efficient action
routines using attribute value estimate
Yoshimichi Watanabe and Takehiro Tokuda
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
This paper presents a transformation method of attribute grammars
into efficient action routines by using inherited attribute
estimate. An attribute evaluator based on one-pass bottom-up parser
cannot generally evaluate inherited attributes of a given attribute
grammar during parsing. To solve the problem, we introduce global
variables for a special type of inherited attributes, which we call
constant-propagation type inherited attributes. The values of these
inherited attributes are determined by constant assignments or by copy
functions from the same inherited attributes or from synthesized
attributes. Our transformation can be applied to the subclass of
L-attributed grammars. Using our transformation, we can obtain
efficient action routines from attribute grammars of synthesized
attributes and constant-propagation type inherited attributes.
TR94-0025
Treating inherited attributes of
L-attributed grammars as synthesized attributes
Takehiro Tokuda
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
We present two methods of treating a subclass of L-attributed
grammars as synthesized attribute grammars. The first method
performs one-pass bottom-up parsing of a given input defined by an
L-attributed grammar and evaluates all inherited and synthesized
attributes by manipulation of finitely-manipulatable attributed item
sets. The second method transforms an L-attributed grammar into an
equivalent synthesized attribute grammar by manipulation of
one-dimensional vector-valued synthesized attributes.
TR94-0026
Eliminating unnecessary items from the one-pass evaluation of attribute
grammars
Yoshimichi Watanabe and Takehiro Tokuda
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
We present two efficient attribute evaluator construction methods
for a wide subclass of L-attributed grammars by enumeration of
attributed items during one-pass left-to-right parsing. We have
already proposed a construction method of a parser/evaluator for this
subclass of L-attributed grammars. However, the evaluator produced
by our previous method uses a great number of attributed items to
evaluate all attributes of a given input string. In this paper we
propose two generalized methods to eliminate unnecessary attributed
items enumerated in our previous method. Our methods allow us to
evaluate all attributes taking advantage of the use of available
lookahead information.
TR94-0027
Reducing the number of items from LR-attributed grammar evaluators
Takehiro Tokuda
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
We have presented methods to reduce the number of attributed LR
items from one-pass LR-attributed grammar evaluators using top-down
and bottom-up compatibility tests of lookahead information.
TR94-0028
GENESYS: An integrated environment for developing Systemic Functional
Grammars
Tadashi Kumano, Takenobu Tokunaga, Kentaro Inui and Hozumi Tanaka
Contact person
Takenobu Tokunaga
(take@cs.titech.ac.jp)
Abstract
In order to develop rich language resources systematically and
efficiently, we need not only well-founded linguistic theories but
also software tools that facilitate writing and examining them. This
paper reports on the GENESYS system, which provides an integrated
environment for developing Systemic Functional Grammars (SFGs).
GENESYS has a special editor for writing SFGs. Further, the user can
examine grammars by running the GENESYS surface generator. The
information of the intermediate states of the generation process can
be monitored through the graphical user interface.
TR94-0029
A software generator system based on timed attribute hypergraph grammars
Mikifumi Shikida, Yasuhide Yamamoto and Takehiro Tokuda
Contact person
Mikifumi Shikida
(shikida@cs.titech.ac.jp)
Abstract
We propose Timed Attribute Hypergraph Grammars (TAHG) and a software
generator system based on the grammar. A TAHG model consists of a
syntax definition, a semantic definition, and object definitions.
In a syntax definition, a designer describes the syntax definition
using a parameterized context-free grammar, which we call a
hypergraph grammar. In a semantic definition, the designer
describes dynamic behavior of target objects using a generalization
of attribute grammars. In object definitions, the designer
describes views of objects and interactions with a user.
Our TAHG-model based software generator allows us to produce an
interactive software system, which can deal with graph-structure
based two-dimensional information processing systems such as Petri
net editors and logic circuit editors.
TR94-0030
A super LR($0$) parsing of LR($k$) grammars
Takehiro Tokuda
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
We present efficient methods for LR($0$) item-based parsing of LR($k$)
grammars using bottom-up computation of lookahead information. We give four
methods for four cases of LR($k$) grammars with $k \geq 1$. The first case is an
$\varepsilon$-free LR($1$) grammar with uniform reductions. The second case is
a general $\varepsilon$-free LR($1$) grammar. The third case is a general
LR($1$) grammar. The fourth case is a general LR($k$) grammar for $k \geq
2$. A feature of our methods is that, as the lookahead length increases, the
cardinality of the set of LR($0$) items decreases monotonically.
TR94-0031
A parsing method for context-free languages using bottom-up lookahead
computation
Yoshimichi Watanabe and Takehiro Tokuda
Contact person
Yoshimichi Watanabe
(nabe@cs.titech.ac.jp)
Abstract
We present a new parsing algorithm for context-free languages using bottom-up
computation of lookahead information. Earley's algorithm uses a great number
of items during enumeration to recognize a context-free language. If we use
items with lookahead fields, we can reduce the number of items, but we
inevitably increase the space of items considerably. Our method may reduce the
number of items during enumeration without using lookahead fields of
items. Our computation of lookahead information takes place in a bottom-up
manner. Moreover, lookahead information need not be computed until it becomes
necessary during enumeration.
TR94-0032
A realization method of optimally generalizing neural network based on error minimization
Hidemitsu Ogawa and Jun-ichi Funada
Contact person
Yukihiko Yamashita
(yamasita@cs.titech.ac.jp)
Abstract
The task required of a Multilayer Perceptron can be summarised as an
optimal approximation of the original function from a set of sampled data,
incorporating the concept of generalization ability.
Previous research has shown that even if the original
function is not known, its best approximation can be computed using a
technique called the Projection Generalizing Neural Network (PGNN).
However, this technique yields a family of NNs realizing the
same goal. In this paper, an effort is made to exploit this existing
degree of freedom. While maintaining the optimal generalizing ability of the
net, a method of generating a NN that is least susceptible to errors
in the connection weights is derived.
TR94-0033
A unified theory of the family of projection filters for signal and image estimation
Yuji Koide, Yukihiko Yamashita and Hidemitsu Ogawa
Contact person
Yukihiko Yamashita
(yamasita@cs.titech.ac.jp)
Abstract
The general forms of the projection filter, the partial projection filter,
and the average projection filter for image estimation have been
obtained by solving their respective systems of operator equations.
Ogawa proposed a unified system that provides the three forms
simultaneously, but the resulting form is complicated compared with the
original forms. In order to solve this problem,
we propose another unified system that gives almost the same forms as
the originals. Furthermore, we present the properties that are common
to the whole family of projection filters.
TR94-0034
Recent topics on coding theory for computers
Eiji Fujiwara
Contact person
Eiji Fujiwara
(fujiwara@cs.titech.ac.jp)
Abstract
Coding theory is actively applied to computer systems, communication systems
and AV systems in order to improve their reliability. It has been
most successfully applied to computer systems, especially to memory systems
such as semiconductor memories and file memories. This paper describes the
present state of applications of coding theory to computer systems, and
introduces some recently developed codes that have attracted attention.
Furthermore, it discusses subjects for future study.
TR94-0035
Towards average-case complexity analysis of
NP optimization problems
Rainer Schuler and Osamu Watanabe
Contact person
Osamu Watanabe
(watanabe@cs.titech.ac.jp)
Abstract
For the worst-case complexity measure,
if P $=$ NP,
then P $=$ OptP,
i.e.,
all NP optimization problems are polynomial-time solvable.
On the other hand,
it is not clear whether a similar relation holds
when considering average-case complexity.
We investigate
the relationship between
the complexity of NP decision problems and
that of NP optimization problems
under polynomial-time computable distributions,
and study what makes them (seemingly) different.
It is shown that
the difference between
P$^{\rm NP}_{\rm tt}$-samplable
and P$^{\rm NP}$-samplable distributions is crucial.
TR94-0036
Probabilistic analysis of local search
for constraint satisfaction and its application to rescheduling
Hoong Chuin Lau
Contact person
Hoong Chuin Lau
(hclau@cs.titech.ac.jp)
Abstract
The Constraint Satisfaction Problem (CSP) is known to be NP-hard in general.
We present two new theoretical and experimental results for solving random CSP
instances based on local search (iterative repair).
First, we consider a relaxed version of the CSP:
suppose we have an initial assignment that is reasonably close to a solution,
can we find a solution in polynomial time? This consideration has practical
implications in the rescheduling of resources especially on a factory floor.
We propose a simple local search algorithm
and show probabilistically that, for instances that
have unique solutions, our local search procedure
returns a solution almost surely when
the Hamming distance between the initial assignment and the solution
is less than $n/2$, where $n$ is the number of variables.
Next, we show that iterative application of local search will yield
an assignment which satisfies significantly more constraints than
a random assignment. This observation leads us to give provable bounds
on the probability of constraint consistency for which
iterative local search will almost surely yield a solution in polynomial time.
These results are consistent with experiments.
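The setting of the first result can be sketched with a planted-solution toy instance: constraints are derived from a known solution `s`, the initial assignment lies within Hamming distance < n/2 of `s`, and iterative repair sweeps re-assign each variable to its least-conflicting value. The instance, the mod-3 constraints, and the repair loop are our own illustration, not the report's exact procedure or analysis:

```python
# Planted solution over n variables; one mod-3 "sum" constraint per
# cycle edge, chosen so that s satisfies all of them.
n = 6
s = [0, 1, 2, 0, 1, 2]
edges = [(i, (i + 1) % n) for i in range(n)]
targets = {(i, j): (s[i] + s[j]) % 3 for i, j in edges}

def violations(x):
    return [(i, j) for i, j in edges if (x[i] + x[j]) % 3 != targets[(i, j)]]

x = [0, 2, 2, 1, 1, 2]          # Hamming distance 2 < n/2 from s
for _ in range(10):              # iterative repair sweeps
    if not violations(x):
        break
    for i in range(n):
        x[i] = min(range(3),
                   key=lambda v: len(violations(x[:i] + [v] + x[i + 1:])))
print(x == s)  # True
```

Here repair recovers the planted solution in two sweeps; the report's contribution is to prove when such recovery happens almost surely.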