
CRBLP

Contact Information

Center for Research on Bangla Language Processing
BRAC University
66, Mohakhali, Dhaka-1212
Phone: 880-2-8824051 Ext:4023
Fax: 880-2-8810383
crblp-staff@student.bu.ac.bd

::--ACTIVE RESEARCH PROJECTS--::

::DOCUMENT IMAGE PROCESSING::

Optical Character Recognition:
Optical Character Recognition (OCR) converts an image of printed text into editable text that can be reused and reformatted. We are developing a Bangla character recognizer that recognizes printed Bangla documents and converts them to editable text.
Project Page: http://www.bracuniversity.net/research/crblp/ocr

 

::SPEECH PROCESSING::

Imagine a day when you wake up, wash, and ask your personal assistant (which, most unfortunately, is a machine) to make coffee and read out the headlines from the newspaper. Two basic hurdles stand in the way. First, how do we make a computer understand what we are saying? Second, how do we make the computer say what we want it to say? The first problem calls for speech recognition technology, the second for speech synthesis.

Speech Recognition:
Speech recognition is the process of converting speech sound into a sequence of words by means of an algorithm implemented as a computer program. We are currently working on Bangla speech recognition using Hidden Markov Models (HMMs) as the technique and HTK as the toolkit. We test the performance of isolated and continuous speech recognition on a limited vocabulary list using HTK. At present we are concentrating on preprocessing issues and are also looking at other tools for speech recognition.
Project Page: http://www.bracuniversity.net/research/crblp/speech

Speech Synthesis:
Speech synthesis is the artificial production of human speech. A text-to-speech system converts normal text to speech. We are working on generating speech signals from Bangla text.
Project Page: http://www.bracuniversity.net/research/crblp/tts

 

::DOCUMENT AUTHORING::

TTF to Unicode Font Converter:
There are a number of ASCII-based Bangla fonts in use. The trouble with these fonts is that if the host machine does not have the font installed, the text appears jumbled. We are working on a TTF to Unicode font converter that converts ASCII-encoded text to Unicode text. That way, only a Unicode Bangla font needs to be installed for the text to display properly.
Project Page: http://www.bracuniversity.net/research/crblp/converter
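
The conversion idea described above can be sketched as a table lookup plus a reordering step. This is only a minimal illustration, not the project's converter: the mapping table `LEGACY_TO_UNICODE` is invented for the example and does not reproduce any real font's encoding, and conjuncts are not handled. The reordering reflects the fact that Unicode stores a dependent vowel sign after its consonant, while a visually ordered legacy font may store a left-side sign before it.

```python
# HYPOTHETICAL legacy code -> Unicode table, invented for illustration only.
LEGACY_TO_UNICODE = {
    "K": "\u0995",  # BENGALI LETTER KA
    "L": "\u0996",  # BENGALI LETTER KHA
    "M": "\u0997",  # BENGALI LETTER GA
    "v": "\u09BE",  # BENGALI VOWEL SIGN AA (drawn to the right of the consonant)
    "w": "\u09BF",  # BENGALI VOWEL SIGN I  (drawn to the left of the consonant)
}

# Vowel signs drawn to the left of the consonant but stored after it in Unicode.
PRE_BASE_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Map each legacy character and move left-side vowel signs after the next character."""
    out = []
    pending = None  # left-side vowel sign waiting for the character it belongs to
    for ch in text:
        uni = table.get(ch, ch)  # characters without a mapping pass through unchanged
        if uni in PRE_BASE_SIGNS:
            if pending is not None:
                out.append(pending)  # two signs in a row: emit the earlier one as-is
            pending = uni
            continue
        out.append(uni)
        if pending is not None:
            out.append(pending)  # place the held sign after the character just emitted
            pending = None
    if pending is not None:
        out.append(pending)  # trailing sign with nothing to attach to
    return "".join(out)


converted = legacy_to_unicode("wKMv")
```

Here the legacy sequence `wKMv` stores the I sign before KA, and the converter emits KA followed by the I sign, then GA and the AA sign. A real converter would also need per-font tables and rules for conjunct clusters.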

 

::COMPUTATIONAL LINGUISTICS::

Corpus Analysis & Corpus Collection:
We have developed a tool for extensive corpus analysis of word frequency distributions. Our corpus currently holds one year of Prothom-Alo newspaper text, along with Charjapad and Baru Chandi Das Er Kabbo. We have analyzed the corpus for regularities and anomalies in Bangla word usage.
Project Page: http://www.bracuniversity.net/research/crblp/corpus
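
As a rough illustration of what a word frequency analysis involves, the sketch below counts Bangla words in a short sample and ranks them by frequency. It is not the CRBLP tool: the function names and the sample sentence are made up for the example, and a "word" is simply assumed to be a run of characters from the Bengali Unicode block (U+0980 to U+09FF).

```python
import re
from collections import Counter

# Assumption: a word is any run of characters in the Bengali Unicode block.
BANGLA_WORD = re.compile(r"[\u0980-\u09FF]+")


def word_frequencies(text):
    """Count how often each Bangla word occurs in the text."""
    return Counter(BANGLA_WORD.findall(text))


def rank_frequency(counts):
    """Return (rank, word, count) rows, most frequent first."""
    return [
        (rank, word, n)
        for rank, (word, n) in enumerate(counts.most_common(), start=1)
    ]


# Made-up sample text; the danda (U+0964) lies outside the Bengali block,
# so it acts as a separator here.
sample = "আমি ভাত খাই। আমি বই পড়ি। তুমি ভাত খাও।"
counts = word_frequencies(sample)
table = rank_frequency(counts)
```

On a full corpus, the same rank and count columns are what one would inspect for regularities in the distribution and for anomalies such as unexpectedly frequent misspellings.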

Localized URL:
Researchers around the world are developing standards for localized domain names, so why shouldn't we? The CRBLP team is working on this project to develop a standard for domain names in Bangla.
Project Page: http://www.bracuniversity.net/research/crblp/localURL
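
For context, internationalized domain names are commonly carried over the existing DNS by encoding each non-ASCII label with Punycode and prefixing it with `xn--`. The sketch below shows that encoding step using Python's built-in `punycode` codec. It is an illustration only, not the standard this project proposes: the domain name is a made-up example, and the normalization and validation steps that full IDNA processing requires are deliberately left out.

```python
ACE_PREFIX = "xn--"


def to_ace_label(label):
    """Encode one domain label in ASCII-compatible form (Punycode plus prefix)."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace_label(label):
    """Decode one ASCII-compatible label back to Unicode."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def to_ace_domain(domain):
    """Encode every dot-separated label of a domain name."""
    return ".".join(to_ace_label(part) for part in domain.split("."))


def from_ace_domain(domain):
    """Decode every dot-separated label of a domain name."""
    return ".".join(from_ace_label(part) for part in domain.split("."))


# Hypothetical Bangla domain name, used only to exercise the functions.
bangla_domain = "উদাহরণ.বাংলা"
ascii_domain = to_ace_domain(bangla_domain)
round_trip = from_ace_domain(ascii_domain)
```

The encoded form contains only ASCII characters, so it can pass through software that knows nothing about Bangla, while `from_ace_domain` recovers the original name for display.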

Lexicon:
Any kind of Bangla language processing needs a rich and informative lexicon. We have developed a wordlist of 160 thousand words with first-step part-of-speech tagging.
Project Page: http://www.bracuniversity.net/research/crblp/lexicon

Wordnet:
Any kind of Bangla language processing needs a rich and informative lexicon. We have developed a wordlist of 160 thousand words with first-step part-of-speech tagging.
Project Page: http://www.bracuniversity.net/research/crblp/wordnet

Spell Checker:
A Bangla phonetic spelling checker. It suggests corrections for misspelled words based on similarity in pronunciation, and is implemented using Double Metaphone phonetic encoding.
Project Page: http://www.bracuniversity.net/research/crblp/speller
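
The general idea of phonetic suggestion can be sketched as follows: reduce every dictionary word to a phonetic key, then offer the dictionary words whose key matches that of the misspelled word. The code below is a simplified stand-in and does not implement Double Metaphone. The `SOUND_CLASS` table, which merges a few similar-sounding Bangla letters, and the tiny wordlist are assumptions made for the example.

```python
from collections import defaultdict

# ASSUMED sound classes for illustration: each letter maps to a representative
# letter that is pronounced alike in everyday speech.
SOUND_CLASS = {
    "ষ": "শ",
    "স": "শ",
    "ণ": "ন",
    "য": "জ",
    "ী": "ি",
    "ূ": "ু",
}


def phonetic_key(word):
    """Replace each letter by its sound-class representative."""
    return "".join(SOUND_CLASS.get(ch, ch) for ch in word)


def build_index(wordlist):
    """Group dictionary words by phonetic key."""
    index = defaultdict(list)
    for word in wordlist:
        index[phonetic_key(word)].append(word)
    return index


def suggest(word, index):
    """Return dictionary words that sound like the given word."""
    return [w for w in index.get(phonetic_key(word), []) if w != word]


# Tiny made-up dictionary.
index = build_index(["বাণী", "শিশু", "জীবন"])
suggestions = suggest("বানি", index)
```

A production checker would use a much richer encoding and rank the candidates, for example by edit distance, but the lookup by shared phonetic key works the same way.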

Parallel Corpus:
The aim of this project is to build a Bangla-English parallel corpus in electronic, aligned form. The parallel corpus can be used for translation memory, cross-language information retrieval, multilingual lexicography, terminology extraction, language study, and as a language learning tool.