My research area is statistical machine learning for sequence modeling. In particular, I was interrested in subclasses of multiplicity automata (also called weighted finite state machines or weighted finite state automata):
Recently, I'm working on generalisations of multiplicity automata:
I'm also working on using multiplicity automata for image, sound and temporal sequence recognition. In particular, I work with edit distances in order to provide a good mesure between structured representations.
I developped a software platform for learn edit distances betweens trees or between sequences. This platform SEDiL was created in JAVA (1.5) using Swing for the user interface. This work was done during my post-PhD with Marc Sebban for the Marmota Project.
I developped an inference algorithm of multiplicity automata from sequence. This program called DEES is implemented in C++. We proved with FranÃ§ois Denis and Amaury Habrard this algorithm has many interresting theoretical properies (see [COLT'06] or [CAp'06] for more details) :
These positives learning results are surprising since the identified class is not recursively enumerable!
In practice, the algorithm has a good behaviour as expected by theoretical results (see [ICGI'06]).
Elaboration of the algorithm DEES which identify in the limit with probability 1 the class of stochastic rational series.
When DEES is constrained to generate PA it returns only PRA. The class of probabilistic distribution generated by PRA is identified in the limit with probability 1.
HMM and PA are identifiable in the limit with probability 1, but the proof use an algorithm unusable for real uses.
In Probabilistic Automata written by Azaria Paz in 1977, some problems was left open:
The answer is negative for these three questions.
Introduction of a new intermediary class between PA and PDA ; Probabilistic Residual Automata (PRA). This class has many interresting properties.
Each automata class (MA, PA, PRA and PDA) generates different probabilistic distributions class.
The class of MA generating probabilistic distribution is not recursively enumerable.
Studies of reduction and saturation operator for each class ; MA, PA, PRA and PDA.
HMM (Hidden Markov Model) are equivalent to PA.
This work is particulary well suited for domain where data are sequences. In particular: