Big Data
What is it and why does it matter? Advanced methods to extract value from large and complex data sets
BigData@BTH
Which areas are we focusing on? Big data analytics for image processing
Core technologies
Foundations and enabling technologies
Read more about what we do in our subprojects
News
Final Workshop
BigData@BTH final workshop. Date: Wednesday, January 27, 2021. This will be a virtual, full-day workshop with research highlights, company presentations, and a project retrospective. Information will be sent out later to our partners and [...]
External evaluation of BigData@BTH
External evaluation by Damvad Analytics. BigData@BTH has had a major international scientific impact and has given companies increased knowledge, a recent external evaluation reveals. The evaluation, carried out by the external analysis company [...]
Shahrooz Abghari will defend his doctoral thesis December 1, 2020
The seminar will be held at BTH and on Zoom. Time: December 1, 2020, at 13:00. Place: J1630, Campus Gräsvik, Karlskrona. Thesis title: Data Mining Approaches for Outlier Detection Analysis [...]
Partner testimonials
Ericsson has worked together with BTH in several successful projects in the past. Since the subject of big data is of great importance to us, it was obvious that we should expand our ongoing collaboration with BTH to include this profile.
Together with the expertise of BTH we will test novel ideas, as well as develop prototype image processing and image analysis software. Our goal is to keep our position as the largest provider of digital Swedish Church records and other historical records online and to provide the most powerful yet easiest to use software to navigate all the documents.
We have good experience of research collaboration from previous projects. It has led to product improvements and valuable knowledge for the company. We expect the close collaboration to continue, with BTH researchers and our developers actively working together and sharing information as well as results.
Together with BTH we are working on different image processing and analysis algorithms. The work has already been very fruitful, with both joint publications and patent applications.
“Data are becoming the new raw material of business.” – Craig Mundie, Senior Advisor to the CEO at Microsoft.
Recent publications
Abstract
Mathematical morphology has been of great significance to several scientific fields. Dilation, as one of the fundamental operations, has relied heavily on common methods based on set theory and on using structuring elements of specific shapes to morph binary blobs. We hypothesised that by performing morphological dilation while exploiting the geometric relationships between dot patterns, one can gain some advantages. The Delaunay triangulation was our choice to examine the feasibility of this hypothesis due to its favourable geometric properties. We compared our proposed algorithm to existing methods, and it became apparent that Delaunay-based dilation has the potential to emerge as a powerful tool in preserving object structure and elucidating the influence of noise. Additionally, defining a structuring element is no longer needed in the proposed method, and the dilation is adaptive to the topology of the dot patterns. We assessed the property of object structure preservation by using common measurement metrics. We also demonstrated this property through handwritten digit classification using HOG descriptors extracted from images dilated by the different approaches and trained using Support Vector Machines. The confusion matrix shows that our algorithm has the best accuracy estimate in 80% of the cases. In both experiments, our approach shows a consistently improved performance over other methods, which supports the suitability of the proposed method.
Read more at http://www.sciencedirect.com/science/article/pii/S016786551630335X
In this paper, we propose a novel technique for unsupervised text binarization in handwritten historical documents using k-means clustering. Text binarization poses many challenges, such as noise, faint characters and bleed-through, which must be overcome to increase the correct detection rate. To address these problems, a preprocessing strategy is first used to enhance the contrast and improve faint characters, and a Gaussian Mixture Model (GMM) is used to ignore the noise and other artifacts in the handwritten historical documents. The enhanced image is then normalized for use in the postprocessing part of the proposed method. The binarized handwritten image is obtained by partitioning the normalized pixel values of the handwritten image into two clusters using k-means clustering with k = 2, and then assigning each normalized pixel to one of the two clusters according to the minimum Euclidean distance between the pixel's normalized intensity and the mean normalized pixel value of each cluster. Experimental results verify the effectiveness of the proposed approach.
Read more at https://a.bth.se/bigdata/files/2016/11/Unsupervised-text-binarization-in-handwritten-historical-documents.pdf
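The clustering step described in the abstract above can be illustrated with a minimal sketch. It covers only the final k-means stage with k = 2 on already normalized intensities; the contrast enhancement and GMM preprocessing are not reproduced. The function name `binarize_kmeans`, the initialisation at the intensity extremes, and the convention that label 0 is the darker cluster are assumptions made for illustration, not details taken from the paper.

```python
def binarize_kmeans(image, max_iter=100):
    """Binarize a grayscale image with 1-D k-means, k = 2.

    `image` is a list of rows of normalized intensities in [0, 1].
    Returns (labels, (dark_mean, bright_mean)), where labels uses
    0 for the darker cluster and 1 for the brighter cluster.
    """
    pixels = [p for row in image for p in row]

    # Assumption: start the two cluster means at the intensity extremes.
    dark_mean, bright_mean = min(pixels), max(pixels)

    for _ in range(max_iter):
        # In one dimension the Euclidean distance is the absolute difference.
        dark = [p for p in pixels
                if abs(p - dark_mean) <= abs(p - bright_mean)]
        bright = [p for p in pixels
                  if abs(p - dark_mean) > abs(p - bright_mean)]

        # Recompute each mean; keep the old one if a cluster is empty.
        new_dark = sum(dark) / len(dark) if dark else dark_mean
        new_bright = sum(bright) / len(bright) if bright else bright_mean

        if new_dark == dark_mean and new_bright == bright_mean:
            break  # converged: the means no longer move
        dark_mean, bright_mean = new_dark, new_bright

    # Final assignment of every pixel to its nearest cluster mean.
    labels = [
        [0 if abs(p - dark_mean) <= abs(p - bright_mean) else 1 for p in row]
        for row in image
    ]
    return labels, (dark_mean, bright_mean)


if __name__ == "__main__":
    sample = [[0.10, 0.15, 0.90],
              [0.85, 0.20, 0.95]]
    labels, means = binarize_kmeans(sample)
    print(labels)
    print(means)
```

In a scanned document with dark ink on a light page, the pixels labelled 0 would correspond to text under this convention. A real pipeline would more likely operate on arrays than nested lists.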