Arabic (Indian) Digits MADBase
@kaggle.hossamahmedsalah_arabicindian_digits_madbase
This dataset consists of flattened images. Each image is stored as a single row of grayscale pixel values, and the digit it shows is stored in a label column.
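Flattening is reversible, so any row can be turned back into an image for inspection. The sketch below assumes the images are square and that the only non-pixel column is named "label", as in the loading code. The function name row_to_image is hypothetical, and the demo uses a tiny synthetic 2x2 row, not real MADBase data.

```python
import math

import numpy as np
import pandas as pd


def row_to_image(row: pd.Series, label_col: str = "label") -> np.ndarray:
    """Turn one flattened row back into a 2-D grayscale image.

    Assumes square images, so the side length is the integer square root
    of the number of pixel columns.
    """
    pixels = row.drop(labels=label_col).to_numpy(dtype=np.uint8)
    side = math.isqrt(pixels.size)
    if side * side != pixels.size:
        raise ValueError(f"{pixels.size} pixels do not form a square image")
    # Inverse of np.array(img).flatten(), which uses row-major (C) order
    return pixels.reshape(side, side)


# Usage with a synthetic 2x2 "image" (not real MADBase data)
demo = pd.DataFrame([[0, 255, 128, 64]])
demo["label"] = [3]
print(row_to_image(demo.iloc[0]))
```

Deriving the side length from the column count avoids hard-coding an image size that the page itself does not state.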
import os

import numpy as np
import pandas as pd
from PIL import Image

# Define the root directory of the dataset
root_dir = "MAHD"
# Define the names of the folders containing the images: Part01 ... Part12
folder_names = ['Part{:02d}'.format(i) for i in range(1, 13)]
# Define the names of the subfolders containing the training and testing images
train_test_folders = ['MAHDBase_TrainingSet', 'test']
# Initialize empty lists to store the image data and labels
data = []
labels = []
# Loop over the Part folders inside the training and testing subfolders
for tt in train_test_folders:
    for folder_name in folder_names:
        # Only Part01 and Part02 are read from the test subfolder
        if tt == train_test_folders[1] and folder_name == 'Part03':
            break
        subfolder_path = os.path.join(root_dir, tt, folder_name)
        print(subfolder_path)
        # Sort the file names so the row order is reproducible
        for filename in sorted(os.listdir(subfolder_path)):
            # Skip any file that is not a .bmp image
            if os.path.splitext(filename)[1].lower() != '.bmp':
                continue
            # Load the image; the with-block closes the file afterwards
            img_path = os.path.join(subfolder_path, filename)
            with Image.open(img_path) as img:
                # Convert the image to grayscale and flatten it into a 1D array
                img_data = np.array(img.convert('L')).flatten()
            # Extract the label from the filename and convert it to an integer
            label = int(filename.split('_')[2].replace('digit', '').split('.')[0])
            # Add the image data and label to the lists
            data.append(img_data)
            labels.append(label)
# Convert the image data and labels to a pandas DataFrame
# (training and test images end up in the same DataFrame)
df = pd.DataFrame(data)
df['label'] = labels
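The label extraction line in the loop above assumes that the third underscore-separated field of each file name is "digit" followed by the label and the extension. A regular expression makes that assumption explicit and fails loudly on unexpected names instead of silently producing a wrong label. This is a sketch: the pattern, the function name label_from_filename and the example file name are hypothetical and should be checked against the actual files.

```python
import re

# Assumed naming convention: the label is the single digit right after
# "digit" at the end of the name, e.g. "writer001_pass01_digit7.bmp"
# (hypothetical example name)
LABEL_PATTERN = re.compile(r"digit(\d)\.bmp$", re.IGNORECASE)


def label_from_filename(filename: str) -> int:
    """Return the digit label encoded in a file name, or raise ValueError."""
    match = LABEL_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"no digit label found in {filename!r}")
    return int(match.group(1))


print(label_from_filename("writer001_pass01_digit7.bmp"))
```

Anchoring the pattern at the end of the name means extra underscores earlier in a file name do not shift which field is read, unlike a fixed split('_')[2].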
This dataset was made by https://datacenter.aucegypt.edu/shazeem and comes with 2 datasets.