Welcome to CBW 2022

3rd International Conference on Cloud, Big Data and Web Services (CBW 2022)

March 19 ~ 20, 2022, Vienna, Austria



Accepted Papers
Explainable Vector Representation with Stylometrix: An Approach to Identifying Stylometric Signature of a Text Class by Morpho-Syntactic Features Selection

Inez Okulska1, Anna Zawadzka2, Michal Szczyszek3, 1NASK National Research Institute, Warsaw, Poland, 2Warsaw University of Technology, Warsaw, Poland, 3University of Adam Mickiewicz, Poznan, Poland

ABSTRACT

This paper introduces a stylometric tool called StyloMetrix that produces a fully interpretable vector representation of a document written in Polish, where every column contains a value calculated by expert-curated linguistic metrics. StyloMetrix investigates morpho-syntactic features and lexical variety, as well as selected psychological aspects of the content words used. It allows text types to be classified solely on the basis of their writing style, without analyzing their semantics as in traditional text classification. A further benefit of the approach is that feature selection, performed on such an interpretable stylometric classification task, has the potential to identify a stylometric signature of the entire class of the considered text type. The application of the tool is presented in a case study on classifying written young adult pornography.

KEYWORDS

Stylometry, Text classification, Morpho-syntactic analysis, Sociolect, Automated Text Analysis.


An Explosive Learning on Bangla Text Detection of Mixing Sanskrit Words with Slang Expressions

Rozanee Kanta Das, Tanjina Zaman Rinvee, Alaya Refat Tinni, Sharun Akter Khushbu and Md. Sadekur Rahman, Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh

ABSTRACT

Our lives are surrounded by technology, and we cannot live without it. Technology is upgraded day by day. Using Natural Language Processing (NLP) techniques, computers can understand human language. Nowadays, researchers are interested in using NLP for text document classification. Bangla text document classification, sentiment analysis and similar tasks are topics of interest for researchers. In our work, we classify Bangla sentences that mix Sanskrit words with slang expressions. Speakers of Bangla are familiar with the Sadhu (Saint) and Colito (Common) forms. The Colito (Common) form is used in daily life, and the Sadhu form is used in written Bangla literature such as novels and poems. When the two forms of Bangla are mixed in one sentence, this is called Guruchandali Dosh (mixing Sanskrit words with slang expressions). In our work, we detect sentences that mix Sanskrit words with slang expressions using supervised learning techniques. In NLP work, text documents are easy to preprocess and translate. We therefore collected Sadhu (Saint) and Colito (Common) form data from various Bangla textbooks, novels, poems and newspapers. We then built our dataset by changing the sentences according to Bangla grammatical rules. In total, we collected 1712 Bangla text samples. The data must be preprocessed before machine learning algorithms are applied, so we preprocessed the raw text by removing unwanted data, stop words, etc. After that, we used six classification techniques to classify Guruchandali Dosh sentences: the DT, RF, NB, XGB, SVM and KNN algorithms. All algorithms perform very well on our dataset. Among them, the Multinomial Naïve Bayes (MNB) algorithm achieved the highest accuracy, 85%. Given Bangla text as input, the MNB model is able to predict sentences that mix Sanskrit words with slang expressions.

KEYWORDS

Natural Language Processing, Multinomial Naïve Bayes, Stop words.


The Platform for Digitization of Georgian Documents

Erekle Magradze1, Davit Soselia2, Levan Shughliashvili2, Irakli Koberidze2, 1Ilia State University, Tbilisi, Georgia, 2Participant of Shota Rustaveli National Science Foundation Grant CARYS-19-1287

ABSTRACT

Since the beginning of active publishing in Georgia, a voluminous body of printed material has accumulated, and its digitization is an important task. Digitized materials will be available to the audience, and it will be possible to search their text and conduct various factual research. Digitizing documents means scanning them, extracting text from the scans, and passing the text to a corresponding language model to detect inaccuracies and grammatical errors. Implementing these stages requires a unified, scalable and automated platform in which the digital service developed for each stage performs its assigned task. At the same time, it must be possible to develop these services dynamically, so that the work of the platform is not interrupted.

KEYWORDS

NLP, OCR, BERT, Kubernetes, Transformers.


Pre-training of Masked Language Model in Nepali Corpus

Shushanta Pudasaini, Aakash Tamang, Sagar Lamichhane, Sujan Adhikari, Sajjan Adhikari and Sunil Thapa, InfoDevelopers Pvt. Ltd. Sanepa, Lalitpur, Nepal

ABSTRACT

Recent improvements to the Transformer model have provided us with state-of-the-art architectures like BERT, RoBERTa, and GPT-3. Though these architectures have yielded better results on real-world NLP problems, there is still much to achieve for the Nepali language. The Nepali language, which uses the Devanagari script, has rich semantics and grammatical structure. However, due to the lack of computational resources, optimum results using the latest architectures are yet to be achieved. As a result, there is no publicly available Nepali NLP model that other researchers can use. Through this paper, we intend to fill this gap by providing word embeddings for the Nepali language, trained with the Word2Vec, Doc2Vec and BERT architectures, which can be used as a base for creating benchmark results on different NLP tasks.

KEYWORDS

Natural Language Processing, Word Embedding, BERT, Masked Language Model & Nepali Corpus.


Building English-Indian Languages Machine Translation Systems using the State-of-the-art Transformer Architecture

Akshara Kandimalla1, Kumar Souvik Maji1, Pintu Lohar2, and Andy Way2, 1School of Computing, Dublin City University, Ireland, 2ADAPT Centre, School of Computing, Dublin City University, Ireland

ABSTRACT

Most Indian languages lack a sufficient parallel corpus for Machine Translation (MT). In this study, we build English-to-Indian language Neural Machine Translation (NMT) systems using the state-of-the-art transformer architecture. In addition, we investigate the utility of back-translation and its effect on system performance. Furthermore, we evaluate the translation outputs using both automatic and human evaluation measures. Our experimental evaluation reveals that the state-of-the-art NMT architecture helps produce good-quality, fluent translation outputs that in most cases retain almost all the information of the source-language text.

KEYWORDS

Machine Translation, Transformer, Byte-pair encoding, Back-translation.


State Drift and Gait Plan in Feedback Linearization Control of a Tilt Vehicle

Zhe Shen and Takeshi Tsuchiya, Department of Aeronautics and Astronautics, The University of Tokyo, Tokyo, Japan

ABSTRACT

To stabilize a conventional quadrotor, simplified equivalent vehicles (e.g., an autonomous car) have been developed to test the designed controller. On that basis, various controllers based on feedback linearization have been developed. For the recently developed tilt-rotor concept, however, a simplified equivalent model is lacking. Indeed, the tilt structure is relatively unusual in vehicles. In this research, we put forward a unique fictional vehicle with a tilt structure to help evaluate the properties of controllers designed for tilt structures. One phenomenon (state drift) in controlling an over-actuated tilt structure by feedback linearization is then presented. State drift is easily neglected and has so far received no attention in research on tilt-rotor controller design. We report this phenomenon and provide a potential approach to avoid this behavior.

KEYWORDS

Feedback Linearization, State Drift, Over-actuated System, Gait Plan, Stability.


Covid 19 Navigator Taxi Application for Urban Mobility During Pandemic Period

Bimsara Kanchana, Damitha Perea, Rojith Peiris and Jagath Wickramarathne, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka

ABSTRACT

This research paper focuses on increasing awareness among taxi users so that they can protect themselves from COVID-19 and take precautions. The global COVID-19 pandemic is spreading at an astonishing rate and has a negative impact on economic, social and cultural factors. Government agencies are warning people to reduce travel and to maintain social distance. Stopping the spread of COVID-19 requires identifying persons who are susceptible to infection and tracing the first contacts of COVID-19-positive people. Most people have reduced their use of public transportation and taxi services because the health conditions of earlier users cannot be identified. The proposed approach can be used to track the previous COVID-19 status of taxi drivers and their passengers, and to navigate the safest route by showing COVID-19 contamination areas. With this approach, users can be aware of earlier users of the taxi service and of the COVID-19 status of the taxi driver or passenger before taking a trip, and can take immediate precautions if they come into contact with any COVID-19 patient. The application helps to increase the usage of taxis by building users’ trust and confidence against COVID-19 infection.

KEYWORDS

COVID-19, Navigation, GPS, Google Maps.


Piano4Play: An Automated Piano Transcription and Keyboard Visualization System using AI and Deep Learning Techniques

Jinge Liu1, Shuyu Wang2, 1Portola High School, 1001 Cadence, Irvine, CA 92618, 2University of Minnesota, 2900 University Ave, Crookston, MN 5

ABSTRACT

Piano keyboard visualization is very popular right now, but there are very few virtual piano keyboard visualizations [1]. We use Unity to show a virtual piano keyboard on which users can play piano pieces themselves or play back a recording [2]. They can then listen and watch how the recorded piece is played on the visual keyboard, which gives them a clear idea of how the song is played on a keyboard [3]. Those who play themselves can hear and see the visual piano play for them, so they can tell whether they are off beat or missing notes. Piano4Play is an automated piano transcription and keyboard visualization system using AI and deep learning techniques. The user uploads a recorded piece of music, and our app visualizes the music on a digital piano keyboard. Seeing how the music is played helps piano beginners learn more quickly and easily, and advanced players can use the app to check whether they made any mistakes while playing so that they can improve. Our app uses WAV and MIDI files, repl, a real-time database, Google Colab and Unity.

KEYWORDS

3D Modeling, Machine Learning, Data Science.


The application of techniques derived from artificial intelligence to the prediction of the solvency of bank customers: case of the application of the CART type Decision Tree (DT)

Karim Amzile, (PhD, Student), Rajaa Amzile, (PhD, Professor), Faculty of Law, Economics and Social Sciences Agdal, Mohammed V University of Rabat, Morocco

ABSTRACT

In this study, we applied the CART-type Decision Tree (DT-CART) method, derived from artificial intelligence techniques, to the prediction of the solvency of bank customers, using historical data of bank customers. We followed the Data Mining process. We started with data preprocessing, in which we cleaned the data and deleted all rows with outliers or missing values as well as rows with empty columns. We then fixed the variable to be explained (the dependent or target variable) and eliminated all explanatory (independent) variables that were not significant, using univariate analysis and the correlation matrix. Next, we applied the CART decision tree method using the SPSS tool. After building our model (DT-CART), we evaluated and tested its performance and found that its accuracy and precision are 71%. We then calculated the error ratios and found an error rate of 29%. This allowed us to conclude that our model performs fairly well in terms of precision and predictability when predicting the solvency of our banking customers.

KEYWORDS

Data Mining, Credit Risk, DT, AI, Bank.
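
The relationship between the accuracy, precision and error rate reported in the abstract above can be illustrated with a minimal sketch. The confusion-matrix counts below are hypothetical, chosen only so that the rates match the stated 71% and 29%; they are not taken from the paper, and the function name is an assumption made for illustration.

```python
def classification_rates(tp, fp, tn, fn):
    """Return accuracy, precision and error rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all predictions that are correct
        "precision": tp / (tp + fp),      # share of predicted positives that are correct
        "error_rate": (fp + fn) / total,  # share of all predictions that are wrong
    }

# Hypothetical counts (not from the paper), picked to reproduce 71% / 29%.
rates = classification_rates(tp=71, fp=29, tn=71, fn=29)
print(rates)
```

Accuracy and error rate always sum to one, which is why a 71% accuracy implies a 29% error rate; precision coincides with accuracy only for particular count combinations such as the one assumed here.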


BTF Prediction Model using Unsupervised Learning

Soichiro Kimura1, Kensuke Tobitani2, Noriko Nagata1, 1Kwansei Gakuin University, Hyogo, Japan, 2University of Nagasaki, Nagasaki, Japan

ABSTRACT

In this study, we propose a BTF prediction method using DNN as a first attempt to generate textures based on sensory texture recognition. First, we measure the target material’s BTF. Next, we create our own series data set based on the acquired BTFs. Finally, we build a BTF prediction model using PredNet, which is a DNN that can take time series data as input. The texture image is generated by minimizing the difference between the predicted image and the input image. We obtained high prediction accuracy, confirming this method’s effectiveness.

KEYWORDS

PredNet, Machine Learning, BTF, Affective texture.


A Bengali Word Identification and Verification using Machine Learning Approach

Mohammad Abu Nadif1, Md. Shakibul Hasan2, Masud Rana2 and Nasim Bin Rahman2, 1Faculty of Science and Technology, American International University-Bangladesh, Dhaka, Bangladesh, 2Department of CSE, Daffodil International University, Dhaka, Bangladesh

ABSTRACT

This research project is mainly based on image classification. We also show a path toward Optical Character Recognition (OCR). OCR is a technology that converts a digital image of text into editable text. An OCR approach for Bangla using image processing and classification is proposed here. In the first step of the project, image classification with a Convolutional Neural Network (CNN) is used to detect and recognize text. Characters are separated from the corpus, and then the most decisive and challenging step of the project is carried out: the verification of the word or corpus. The Bengali word or corpus is extracted from the digital image and converted to a digital font, which is then compared with our accumulated data set to verify whether the word is correct. From its own perspective, the project plays a significant role in enriching the diversity of the Bengali language in the field of technology.

KEYWORDS

Image classification, Machine learning, Artificial intelligence, Neural Network.


Ethereum based Smart Contracts for Trade and Finance

Rishabh Garg, Department of Electrical & Electronics Engineering, Birla Institute of Technology & Science, K.K. Birla Goa Campus, Sancoale, Goa – 403726, India

ABSTRACT

Traditionally, business parties build trust with a centralized operating mechanism, such as payment by letter of credit. However, the increase in cyber-attacks and malicious hacking has jeopardized business operations and finance practices. Emerging markets, owing to their higher banking risks and bigger presence of digital financing, are looking forward to technology-driven solutions, financial inclusion and innovative working paradigms. Blockchain has the potential to enhance transaction transparency and supply chain traceability. It has captured a vast landscape with 200 million crypto users worldwide. Fintech and blockchain products are popping up across brokerage, digital wallets, exchanges, post-trade clearance, settlement, middleware, infrastructure, and base protocols.

KEYWORDS

Authentication, Blockchain, Channel, Cryptography, DApps, Data Portability, Decentralized Public Key Infrastructure (DPKI), Ethereum, Hash function, Hashgraph, Privilege creep, Proof of Work algorithm, Revocation, Storage Variables, Zero Knowledge Proof.


A Secret Data Sharing Method based on IPFS and Umbral

Jianbin Li, Jianzhong Zhang, Song Liu, Institute of Systems and Networks, College of Computer Science, Nankai University, China

ABSTRACT

The rapid development and broad application of cloud computing have brought convenience to data storage and sharing, but data privacy and availability cannot be guaranteed. Therefore, designing a method to provide distributed storage and encryption services has become the key to solving these problems. In this paper, we propose a method of secret data sharing based on IPFS and Umbral, where IPFS provides distributed storage services and Umbral provides distributed proxy re-encryption services. In particular, to balance the re-encryption calculation burden of each node, we divide the node states by setting a load threshold so that re-encryption tasks are transferred from relatively high-load nodes to relatively low-load nodes. Periodic updates make the node load thresholds converge, which prevents some nodes from setting thresholds that are too low in order to evade task assignments. The simulation results show that the proposed method realizes the sharing of secret data and guarantees the overall load balance. Compared with the static load threshold case, it reduces the population standard deviation of the number of re-encryption tasks per node by 61.2%.

KEYWORDS

Distributed Storage, Proxy Re-encryption, Load Balancing.
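
The threshold-based task transfer described in the abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: it assumes a single shared threshold, moves one task at a time, and uses hypothetical node names and loads. It also shows how the population standard deviation of per-node task counts can be measured with the standard library.

```python
from statistics import pstdev

def rebalance(loads, threshold):
    """Shift tasks one at a time from nodes above `threshold` to the least-loaded node."""
    loads = dict(loads)  # work on a copy so the caller's data is untouched
    for node in sorted(loads):
        while loads[node] > threshold:
            # Pick the least-loaded node; ties are broken by node name.
            target = min(loads, key=lambda n: (loads[n], n))
            if loads[target] >= threshold:
                break  # no low-load node is left to take another task
            loads[node] -= 1
            loads[target] += 1
    return loads

# Hypothetical task counts per node (not data from the paper).
before = {"node_a": 9, "node_b": 1, "node_c": 2}
after = rebalance(before, threshold=4)
print(after)
print(pstdev(list(before.values())), pstdev(list(after.values())))
```

The total number of tasks is preserved; only their distribution changes, so a lower population standard deviation after rebalancing indicates a more even load.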


Supervised learning-based Indoor Positioning System Using WiFi Fingerprints

Basem Suleiman, Ali Anaissi, Yuhuan Xiao, Waheeb Yaqub, Abdallah Lakhdari and Widad Alyassine, School of Computer Science, The University of Sydney, Australia

ABSTRACT

We propose to leverage the WiFi fingerprint of people in confined areas to monitor and manage the mobility of the crowd in a smart city. We transform the indoor positioning problem into a supervised learning problem that takes the WiFi fingerprint of a person as input and predicts their availability within a confined area. We investigate the accuracy and granularity of multiple supervised learning methods in WiFi fingerprint-based indoor positioning. Preliminary experiments show promising results for different granularity levels: a balanced accuracy of 99.88% is achieved in predicting the availability of a person at the building level, and an accuracy of 88.56% to 93.44% at the floor level.

KEYWORDS

Indoor Positioning, Machine Learning, WiFi Fingerprint, Data Analysis.
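
Balanced accuracy, the metric quoted for the building level in the abstract above, is commonly defined as the mean of the per-class recalls. A minimal sketch of that definition follows; the labels and predictions are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally regardless of size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical, imbalanced labels: 8 samples "inside" the area, 2 "outside".
y_true = ["inside"] * 8 + ["outside"] * 2
y_pred = ["inside"] * 8 + ["inside", "outside"]
score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(score, plain)
```

In this toy case the plain accuracy is higher than the balanced accuracy because the single error falls in the minority class, which is the situation the balanced variant is meant to expose.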


Post-Training for Aspect-based Sentiment Analysis in Indonesian Language

I Putu Eka Surya Aditya1 and Masayu Leylia Khodra2, 1School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia, 2School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Bandung, Indonesia

ABSTRACT

Aspect-based sentiment analysis plays a big role in business development because it makes it easier for business people to evaluate customer feedback on every aspect of a service. It is generally divided into two subtasks: aspect extraction/categorization, which aims to extract aspects and assign them to pre-determined categories, and sentiment classification, which aims to determine the sentiment of each aspect. In recent years, pre-trained language models like ELMo, BERT, XLM-R and XLNet have achieved great success in natural language processing (NLP) tasks. In this work, we use three transformer models, BERT, XLM-R and XLNet, and compare their performance on the sentiment classification task. We then explore the post-training proposed by (Xu et al., 2020) [1] on BERT and XLM-R and compare the performance of the models with and without post-training. Experimental results show that post-training on XLM-R achieves better performance than post-training on BERT. Our models achieve a new state of the art on the Indonesian hotel review dataset (HoASA).

KEYWORDS

Aspect-based sentiment analysis, Post-training, Transformer Models, NLP.


WebReview: An Intelligent Classification Platform to Automate the Evaluation and Ranking of Website Quality and Usability using Artificial Intelligence and Web Scraping Techniques

Darren Xu1, Dexter Xu2, Ang Li3, 1Redlands High School, 840 E Citrus Ave, Redlands, CA 92374, 2Redlands High School, 840 E Citrus Ave, Redlands, CA 92374, 3California State University, Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840

ABSTRACT

Paywalls are a staple of the internet and seen in a vast amount of websites [1]. Encountering a paywall is always annoying, whether you’re doing work for school or just trying to catch up on the latest news [2]. To eliminate this annoyance we have created Wall Breaker, a Google extension with the primary task of bypassing any paywall using a variety of methods [3]. Our extension uses methods such as opening the website in an incognito tab or acting as a new user when clicking on a link. Although not the first of its kind, our extension is truly unique in the methods and techniques used. The popup used is easy to use and simple to look at, providing the best user experience. Wall Breaker will work on most websites, both popular and lesser known ones. It makes no distinction between certain types of websites and the methods can be used on any page. While Wall Breaker might not work on every website, those are few and far between.

KEYWORDS

Scraping techniques, Google, paywall.


Imitation Learning based Self Driving Car with Sensor Fusion

Sukkrit Sharma, Bidisha Mukherjea and Dr. C. Malathy, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Kattankulathur, India

ABSTRACT

Driving based on sight is difficult to implement because of the lack of historic data. An autonomous model needs to learn from its environment in order to know how to act. We simplify this problem by first training a supervised agent that learns from ground-level knowledge; this agent is also known as the expert, as it has a bird’s-eye view of the world. In the second stage, the information learnt by the expert agent is provided to the semi-supervised agent, with the expert acting as the supervising teacher model. The semi-supervised agent is a perception-based agent that does not observe the ground truth but makes decisions from vision, and it learns by imitating the supervised agent. All experiments and training were performed in a simulated environment, the CARLA simulator. The final testing was done on the NoCrash benchmark of the CARLA simulator and achieved substantially good results.

KEYWORDS

Imitation Learning, Semi-Supervised Agent, CARLA simulator, Self-Driving Car, Supervised Agent.


Thread Defect Detection based on a Line Enhancement Method

Qiming Kong, Xiaoyu Dong and Yuantao Song, School of Engineering Science, University of Chinese Academy of Sciences, Beijing, China

ABSTRACT

Surface defects in bolts can reduce the assembly speed of an assembly line, which in turn affects the work of the entire line. Defects in bolts may also affect the quality of the product. At present, traditional template matching algorithms can hardly meet the requirements of fast detection and a high recognition rate at the same time. In this paper, a defect detection algorithm based on line enhancement is investigated. The method applies a specific image enhancement to the matching result when defects and noise coexist after template matching. It solves the problem that defects with small grey-scale differences are difficult to judge with traditional algorithms, and it provides ideas for industrial quality inspection.

KEYWORDS

Machine vision, defect detection, filter design.