This chapter describes Extractor plug-ins, their architecture and components, and provides step-by-step instructions for creating an Extractor plug-in for your file type.
Overview
Extractor plug-ins are libraries (ALP bundles) which, once registered with the Media Cataloger, extract metadata from the type of media file for which they are written. These plug-ins do the work of populating the metadata databases with the appropriate information for the file type.
If you provide a new file type for an application, such as a browser, a viewer, or a player, you should create an Extractor that provides metadata for your file type. This allows the Media Selector to show more information and improves users' searches for files.
Extractor Architecture
The Media Cataloger automatically registers Extractors in, and de-registers them from, its internal list when they are installed on or removed from the system. The Cataloger watches for notifications from the Bundle Manager and inspects the contents of the Extractor's manifest. Registration is a matter of describing the supported file extension(s) and MIME type(s) in the bundle manifest.
The Cataloger does not need an extractor to display files in the Media Selector. All it requires is a registered extension, that is, a <register_mime> manifest entry. Without an extractor, the Media Selector displays a minimum amount of information: the file's name, mtime, and size.
Although the ACCESS Linux Platform ships a default Extractor and standard Extractors that read metadata from common file types, anyone who writes a viewer for a particular file format or codec should provide an Extractor for that format.
You can register any number of Extractors per MIME type, so each Extractor can extract as much or as little information as you need. For example, you could write an Extractor that looks for a particular revision of a format and reads a single piece of metadata. Listing a MIME type and extension in a bundle manifest registers it with the Media Cataloger subsystem.
While Extractors generally do run in the order their bundles are processed, and in the order that they are mentioned within their bundle's manifest, this order is not guaranteed. Do not write Extractors that rely on other Extractors or the Media Selector's default extractor running before or after them.
When the Cataloger encounters a file of an unrecognized type, it stores the file's name, size, and modification time, and adds the file ID to the OTHER category. These files are not associated with an Extractor.
Designing Extractors
Your Extractor can be either a standalone bundle or part of an existing application. The elements that define an Extractor are the entries in the manifest and the implementation of the entrypoints in the code.
For details of extractor code, examine the sample mp3 extractor provided in /samples/mp3_extractor.
As you design your Extractor, keep in mind the following key points:
- Extractors provide two entrypoints: one for extraction and, optionally, one for thumbnailing:
extern "C" alp_status_t alp_extractor_process_file(AlpDmlH _handle, const char* _file);
extern "C" alp_status_t alp_extractor_generate_thumbnail(const char* _file, const char* _savepath, int32_t _width, int32_t _height);
IMPORTANT: Your Extractor must never hang when the entrypoints are called, because a hang stops the entire extractor system from processing files.
- Extractors are passed the name and path of a file with an extension that the Extractor's manifest says it can handle. The Extractor should extract the relevant metadata as quickly as possible, reading only the minimum number of header blocks necessary.
- After extracting the metadata from the file, use the Media Selector DML API to insert the metadata into the database.
- Save thumbnails as image files that the system can natively understand, preferably PNG or JPEG files.
- Do not depend on the order in which the Cataloger runs Extractors.
- From alp_extractor_process_file, return an error code when a DML error occurs, especially a DML commit error. Do not return an error just because you could not find or read metadata in a file: any error result forces re-extraction.
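The guideline to read only the minimum data can be illustrated with a format whose metadata sits at a fixed location. The sketch below reads an ID3v1 tag, which occupies the last 128 bytes of an MP3 file, with a single seek and a single read. It is a standalone illustration in plain C++17 that uses neither the ALP nor the id3lib API; the names Id3v1Tag, field, and read_id3v1 are hypothetical. Treating a missing tag as "no metadata" rather than as an error matches the return-code rule above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <optional>
#include <string>

// Fields of an ID3v1 tag: a fixed 128-byte block at the very end of an
// MP3 file that begins with the ASCII bytes "TAG".
struct Id3v1Tag {
    std::string title, artist, album, year, comment;
    unsigned char genre = 0;
};

// Copies a fixed-width, NUL- or space-padded field into a std::string.
static std::string field(const char* p, std::size_t width) {
    std::string s(p, width);
    s = s.substr(0, s.find('\0'));              // stop at the first NUL
    std::size_t end = s.find_last_not_of(' ');  // drop trailing spaces
    return end == std::string::npos ? std::string() : s.substr(0, end + 1);
}

// Reads only the last 128 bytes of the file. Returns std::nullopt when the
// file cannot be read or carries no ID3v1 tag; the caller should treat that
// as "no metadata", not as an error.
std::optional<Id3v1Tag> read_id3v1(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return std::nullopt;
    char buf[128];
    bool ok = std::fseek(f, -128, SEEK_END) == 0 &&
              std::fread(buf, 1, sizeof buf, f) == sizeof buf;
    std::fclose(f);
    if (!ok || buf[0] != 'T' || buf[1] != 'A' || buf[2] != 'G') return std::nullopt;

    Id3v1Tag tag;
    tag.title   = field(buf + 3, 30);   // bytes 3..32
    tag.artist  = field(buf + 33, 30);  // bytes 33..62
    tag.album   = field(buf + 63, 30);  // bytes 63..92
    tag.year    = field(buf + 93, 4);   // bytes 93..96
    tag.comment = field(buf + 97, 30);  // bytes 97..126
    tag.genre   = static_cast<unsigned char>(buf[127]);
    return tag;
}
```

In a real Extractor, each non-empty field would be passed to alp_ms_dml_item_set_string, as in the mp3 sample later in this chapter.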
Metadata Categories
Your Extractor stores metadata through the Media Selector Data Model (DML). Category and column names in the DML follow standard program identifier rules:
- The characters must be drawn from the 7-bit ASCII character set.
- The first character must be a letter or an underscore, while subsequent characters can include numbers.
- Names cannot include spaces, punctuation other than the underscore, or any international characters.
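A minimal sketch of these naming rules follows, assuming that the underscore is the only non-alphanumeric character allowed (as the rules above imply). is_valid_dml_name is a hypothetical helper for checking names before you put them in a manifest; it is not part of the DML API.

```cpp
#include <cassert>
#include <string>

// Checks a proposed DML category or column name: 7-bit ASCII only, first
// character a letter or underscore, remaining characters letters, digits,
// or underscores. Explicit ranges avoid locale-dependent <cctype> behavior,
// so bytes of UTF-8 (international) characters are rejected.
bool is_valid_dml_name(const std::string& name) {
    if (name.empty()) return false;
    auto is_alpha = [](char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    if (!is_alpha(name[0]) && name[0] != '_') return false;
    for (char c : name) {
        if (!is_alpha(c) && !is_digit(c) && c != '_') return false;
    }
    return true;
}
```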
Due to Bundle Manager requirements, the category_columns tag contains more structure than XML standards suggest. The syntax is similar to SQL: a comma-separated list of name/type pairs.
There are three basic data types: string, int32, and int64. The Media Selector supports a number of aliases for these data types to make customizing categories easy for programmers from a variety of backgrounds.
The DML API is case-insensitive. Your manifest can create a category named MY_CUSTOM_CATEGORY while your code refers to My_Custom_Category or my_custom_category. Similarly, you can refer to a column named my_custom_column as My_Custom_Column, MY_CUSTOM_COLUMN, or any similar variation.
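To make the SQL-like syntax and the case-insensitivity concrete, here is a sketch of how a build tool might split a category_columns value into name/type pairs and fold names to lower case. The functions fold_case and parse_category_columns are hypothetical. The sketch deliberately does not interpret the types, because the exact set of aliases the Media Selector accepts is not listed in this section.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Lower-cases an ASCII name so that, as in the DML, "MY_COLUMN" and
// "my_column" compare equal.
std::string fold_case(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Splits a value such as "title CHAR, count INTEGER" into (name, type)
// pairs. Names are case-folded; types are kept exactly as written.
// Returns std::nullopt if any entry is not exactly "name type".
std::optional<std::vector<std::pair<std::string, std::string>>>
parse_category_columns(const std::string& text) {
    std::vector<std::pair<std::string, std::string>> columns;
    std::stringstream entries(text);
    std::string entry;
    while (std::getline(entries, entry, ',')) {
        std::istringstream words(entry);
        std::string name, type, extra;
        if (!(words >> name >> type) || (words >> extra)) return std::nullopt;
        columns.emplace_back(fold_case(name), type);
    }
    return columns;
}
```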
Extractor Components
Extractor plug-in components are defined in two sections: the Extractor Manifest and the Extractor Entrypoints.
Extractor Manifest
An Extractor must have a manifest in order to install itself correctly.
An example manifest for a basic MP3 Extractor follows:
<manifest name="com.access.extractors.sample_extractor">
  <extractor>
    <library>libalp_sample_extractor.so</library>
  </extractor>
  <extractor_for>
    <extension>mp3</extension>
    <mimetype>audio/mpeg3</mimetype>
  </extractor_for>
  <thumbnailer_for>
    <extension>jpg</extension>
    <mimetype>image/jpeg</mimetype>
  </thumbnailer_for>
  <register_mime>
    <mimetype>audio/mp4</mimetype>
    <extension>mp4</extension>
  </register_mime>
</manifest>
Manifest Name Section
As with any bundle, the Extractor manifest begins with a manifest name="" section. Make absolutely sure that this name matches the name in your Jamfile.
<manifest name="com.access.extractors.sample_extractor">
The <extractor> entry tells the Cataloger that your bundle contains an Extractor. The Cataloger then registers the entries that follow. Currently, the <extractor> section has only one entry, <library>, which tells the Cataloger which library contains the entrypoints for the Extractor:
<extractor>
  <library>libalp_sample_extractor.so</library>
</extractor>
IMPORTANT: It is critical that this is the same name that your Makefile or Jamfile generates for your library.
Type Registration
The following sections register your Extractor so that the Cataloger calls it to read specific file types.
extractor_for
Each <extractor_for> entry tells the Cataloger to call your Extractor for a specific file type, for example, mp3 files:
<extractor_for>
  <extension>mp3</extension>
  <mimetype>audio/mpeg3</mimetype>
</extractor_for>
The <extension> tag pairs the file extension with the <mimetype> tag to register the type in the Cataloger extractor list. Every <*_for> entry must contain both of these tags.
thumbnailer_for
Each <thumbnailer_for> entry tells the Cataloger that your extractor exports the thumbnail entrypoint. The Cataloger calls the Extractor to create a thumbnail for a file when an application requests it. (The Media Selector widgets are the most common requesters.)
The syntax is identical to that of the <extractor_for> entry. For example, the following statement tells the system that the Extractor creates thumbnails for .jpg files:
<thumbnailer_for>
  <extension>jpg</extension>
  <mimetype>image/jpeg</mimetype>
</thumbnailer_for>
register_mime
In some cases, your application needs to display a new type in the Media Selector, but it does not have any additional information about the type to add to the metadata database. Use the <register_mime> entry to tell the Cataloger that a new type, for example, mp4, should appear as an audio type:
<register_mime>
  <mimetype>audio/mp4</mimetype>
  <extension>mp4</extension>
</register_mime>
Preferred MIME Mapping
The system supports multiple extensions per MIME type, and there can be multiple MIME types per extension.
Multiple MIME types for an extension cause problems for applications that use the following functions to retrieve the MIME type for a file:
- alp_mf_get_mime_for_file
- alp_mf_mime_get_supertype
- alp_mf_mime_get_subtype
- alp_mf_mime_get_mime_string
The MIME type defines and labels the type of content in a file, which allows applications to determine the appropriate action to take with it. For example, the MIME type of an email attachment tells the receiving application what kind of content the attachment holds, so the receiver can process it. When one extension maps to several MIME types, however, these functions cannot tell which one to return. To resolve this problem, the manifest supports the concept of a preferred mapping from extension to MIME type. The functions use this mapping to pick the MIME type when more than one is available. (msutil -m reports all extension-to-MIME mappings and marks the preferred ones with a star.)
To mark a mapping as "preferred", add the empty XML tag <preferred/> to an extractor_for, thumbnailer_for, or register_mime section. For example:
<register_mime>
  <mimetype>application/pdf</mimetype>
  <extension>pdf</extension>
  <preferred/>
</register_mime>
The default for all mappings is non-preferred, or "extra." Mappings are only marked as preferred if a manifest contains a <preferred/> tag. The system makes no effort to validate <preferred/> tags. If there are multiple preferred mappings, the system chooses the newest of the preferred mappings. If there is no preferred mapping, the system chooses the newest of the ordinary mappings.
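The selection rule can be expressed as a short function. This is a sketch under assumptions: the real mappings live inside the Cataloger, and the MimeMapping structure, its sequence field (standing in for "newest"), and pick_mime are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One extension-to-MIME mapping as registered from a manifest. The
// `sequence` field is a hypothetical registration counter: a larger value
// means the mapping was registered more recently.
struct MimeMapping {
    std::string extension;
    std::string mimetype;
    bool preferred;
    std::uint64_t sequence;
};

// The newest preferred mapping wins; if the extension has no preferred
// mapping, the newest ordinary ("extra") mapping wins.
std::optional<std::string> pick_mime(const std::vector<MimeMapping>& all,
                                     const std::string& extension) {
    const MimeMapping* best = nullptr;
    for (const MimeMapping& m : all) {
        if (m.extension != extension) continue;
        bool better = best == nullptr ||
                      (m.preferred && !best->preferred) ||
                      (m.preferred == best->preferred && m.sequence > best->sequence);
        if (better) best = &m;
    }
    if (best == nullptr) return std::nullopt;
    return best->mimetype;
}
```

Note that a preferred mapping beats any ordinary mapping regardless of age; age only breaks ties within the same group.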
Category Extension
If your file type needs to store additional metadata, you can extend the metadata schema for your type by including additional tags in your extractor manifest. You then use the Media Selector DML API to put data into your custom categories and custom columns in exactly the same way that you populate factory-standard data entries in the metadata database.
You can extend the Media Selector metadata schema in any bundle by including an <extractor_category> section. For example, the following creates a new table, audio_collection:
<extractor_category>
  <category_name>audio_collection</category_name>
  <category_columns>title CHAR, count INTEGER</category_columns>
</extractor_category>
At factory reset, or when the bundle is added to the system, the Cataloger creates a new DML category, audio_collection, with the columns title and count. These custom entries are normal database entries: your Extractor entrypoint can write them with the alp_ms_dml_item_set_* API, and you can read them with the alp_ms_dml_statement_* API. Extending the schema has a modest cost, but accessing custom items should be no slower than accessing standard entries.
Category extension can also extend existing tables, whether standard or created by another bundle. For example:
<extractor_category>
  <category_name>audio</category_name>
  <category_columns>composer CHAR</category_columns>
</extractor_category>
This adds a "composer" string field to the standard "audio" table.
Extractor Entrypoints
The Extractor entrypoints exist in the Extractor Library (DSO) provided by the licensee or third party for a given set of media types. The entrypoints take two forms: the metadata extractor entrypoint and the thumbnailer entrypoint.
Extractors should return 0 (ALP_STATUS_OK) unless they encounter a severe failure, such as a non-zero result from alp_ms_dml_commit_item. Returning anything other than ALP_STATUS_OK is comparatively expensive: the Cataloger Extractor Host treats it as a non-recoverable error, reloads your Extractor in a new process, and repeats any successful extractions of the same file type. (When the process fails, there is no guarantee that any of the file data was written.) When you design your entrypoints, note the following:
- Return any significant DML error codes, as these indicate a serious failure.
- Do not return an error code just because you could not find any metadata in a file. Your extractor simply returns without posting any metadata; the file has no metadata to display or search for.
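These rules amount to a small decision: return success when nothing was found, and otherwise pass through the result of the commit. The sketch below isolates that decision from the ALP API so that it can be tested on its own. status_code, kStatusOk, and extract_with_policy are hypothetical stand-ins; a real Extractor uses alp_status_t and ALP_STATUS_OK from the ALP headers.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-ins for alp_status_t and ALP_STATUS_OK.
using status_code = int;
constexpr status_code kStatusOk = 0;

// `read_metadata` returns nothing when the file has no usable metadata;
// `commit` stands in for creating, populating, and committing a DML item.
status_code extract_with_policy(
        const std::function<std::optional<std::string>()>& read_metadata,
        const std::function<status_code(const std::string&)>& commit) {
    std::optional<std::string> metadata = read_metadata();
    if (!metadata) {
        return kStatusOk;       // no metadata is not an error: post nothing
    }
    return commit(*metadata);   // propagate DML failures to the Cataloger
}
```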
Metadata Extractor Entrypoint
Extractors which add metadata to the Media Cataloger database must supply the following as an entrypoint.
alp_status_t alp_extractor_process_file(AlpDmlH _handle, const char* _file)
This API is for the general extraction case. It uses the following arguments:
Table 9.3 Metadata Extractor Entrypoint function arguments
The following sample code shows how to use the parameters to process an mp3 filetype:
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <id3/tag.h>                 // id3lib
#include <alp/mediaselector_api.h>

alp_status_t supplement(AlpDmlH _handle, const char* _file);   // defined elsewhere in the sample

// Returns the text of a frame's ID3FN_TEXT field, or NULL if it has none.
static const char* frame_text(ID3_Frame* frame)
{
    ID3_Field* field = frame->GetField(ID3FN_TEXT);
    return (NULL != field) ? field->GetRawText() : NULL;
}

// -------------------------------------------------------------------------
extern "C" alp_status_t alp_extractor_process_file(AlpDmlH _handle, const char* _file)
{
    fprintf(stderr, "alp_extractor_process_file: Processing %s on %p\n", _file, _handle);
    if( NULL != strstr(_file, ".jpg") )
    {
        return supplement(_handle, _file);
    }

    ID3_Tag myTag(_file);
    ID3_Tag::Iterator* iter = myTag.CreateIterator();
    ID3_Frame* frame = NULL;
    alp_status_t result;
    AlpDmlItemH item;

    if( ALP_STATUS_OK == (result = alp_ms_dml_item_create(ALP_MS_ITEM_TYPE_AUDIO, &item, _file)) )
    {
        while (NULL != (frame = iter->GetNext()))
        {
            const char* text = frame_text(frame);
            if (NULL == text)
            {
                continue;   // no text field: nothing to store for this frame
            }
            switch( frame->GetID() )
            {
            case ID3FID_LEADARTIST:
                alp_ms_dml_item_set_string(item, ALP_MS_DML_COL_AUDIO_ARTIST, text);
                break;
            case ID3FID_TITLE:
                alp_ms_dml_item_set_string(item, ALP_MS_DML_COL_AUDIO_TITLE, text);
                break;
            case ID3FID_ALBUM:
                alp_ms_dml_item_set_string(item, ALP_MS_DML_COL_AUDIO_ALBUM, text);
                break;
            case ID3FID_YEAR:
            {
                int32_t year = atoi(text);
                alp_ms_dml_item_set_int32(item, ALP_MS_DML_COL_AUDIO_YEAR, &year);
                break;
            }
            case ID3FID_COMMENT:
                alp_ms_dml_item_set_string(item, ALP_MS_DML_COL_AUDIO_NOTES, text);
                break;
            case ID3FID_CONTENTTYPE:
                alp_ms_dml_item_set_string(item, ALP_MS_DML_COL_AUDIO_GENRE, text);
                break;
            default:
                break;
            }
        }
        // Return the commit result: it reports genuine DML failures.
        result = alp_ms_dml_commit_item(_handle, item);
        alp_ms_dml_item_destroy(item);
    }
    else
    {
        fprintf(stderr, "Error creating: %s\n", _file);   // unable to create item
    }

    delete iter;
    return result;
}
Thumbnailer Extractor Entrypoint
Extractors which do thumbnailing must supply the following as an entrypoint.
alp_status_t alp_extractor_generate_thumbnail(const char* _file, const char* _savepath, int32_t _width, int32_t _height)
This API is for Extractors which have indicated, in the manifest, that they can generate a thumbnail or preview to display to the user.
Table 9.4 Thumbnailer Entrypoint Function Arguments
It is not necessary to generate a thumbnail just to display an icon for a file such as an .mp3 or .mpg file; the system selects stock icons automatically. You can, however, use the thumbnail API to load album art or artist images, if they were installed with the media. Save thumbnails as image files that the system can natively understand, preferably PNG or JPEG files.
The following sample code shows how to use GDK for loading and creating thumbnails:
#include <cstdio>
#include <gdk/gdk.h>
#include <gdk-pixbuf/gdk-pixbuf.h>
#include <alp/mediaselector_api.h>

// --------------------------------------------------------------------------
// Use GDK to create the thumbnails for images.
// --------------------------------------------------------------------------
extern "C" alp_status_t alp_extractor_generate_thumbnail(const char* _file, const char* _savepath, int32_t _width, int32_t _height)
{
    alp_status_t result = ALP_STATUS_OK;
    GError* error = NULL;

    gdk_init(NULL, NULL);

    int image_width = _width;
    int image_height = _height;
    gint file_width, file_height;
    bool got_image_size = NULL != gdk_pixbuf_get_file_info(_file, &file_width, &file_height);
    if (got_image_size)
    {
        if (file_width > file_height)
        {
            image_height = -1;   // make _width wide, preserving aspect ratio
        }
        else
        {
            image_width = -1;    // make _height high, preserving aspect ratio
        }
    }

    GdkPixbuf* scaled = gdk_pixbuf_new_from_file_at_scale(_file, image_width, image_height, got_image_size, &error);
    if (NULL == scaled)
    {
        // Loading failed; GdkPixbuf sets error to describe why.
        result = error->code;
        g_error_free(error);
        return result;
    }

    GdkPixbuf* source = scaled;
    int scaled_height = gdk_pixbuf_get_height(scaled);
    int scaled_width = gdk_pixbuf_get_width(scaled);
    if (scaled_width < _width || scaled_height < _height)
    {
        // Create a _width x _height transparent pixbuf and center scaled on it.
        // Param 3 is "bits per color sample", not "bits per pixel".
        GdkPixbuf* bordered = gdk_pixbuf_new(GDK_COLORSPACE_RGB, TRUE, 8, _width, _height);
        if (NULL != bordered)
        {
            gdk_pixbuf_fill(bordered, 0);   // transparent
            gdk_pixbuf_copy_area(scaled, 0, 0, scaled_width, scaled_height,
                                 bordered, (_width - scaled_width) >> 1, (_height - scaled_height) >> 1);
            g_object_unref(scaled);         // gdk_pixbuf_unref() is deprecated
            source = bordered;
        }
        else
        {
            fprintf(stderr, "Couldn't create border pixbuf; saving the unpadded image.\n");
        }
    }

    gdk_pixbuf_save(source, _savepath, "png", &error, NULL);
    if (NULL != error)
    {
        result = error->code;
        g_error_free(error);
    }
    g_object_unref(source);
    return result;
}
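The scaling and centering arithmetic in the thumbnail sample can be checked separately from GDK. The sketch below extracts it into a hypothetical pure function, layout_thumbnail. It assumes a square thumbnail box (so the scaled image always fits inside it) and uses integer division, which may round slightly differently from GdkPixbuf.

```cpp
#include <cassert>

// Result of fitting an image into a thumbnail box.
struct ThumbLayout {
    int scaled_width, scaled_height;  // size after aspect-preserving scaling
    int offset_x, offset_y;           // top-left corner of the centered image
};

// Mirrors the branch in the GDK sample: a landscape image is scaled to the
// box width, anything else to the box height, and the result is centered.
ThumbLayout layout_thumbnail(int file_w, int file_h, int box_w, int box_h) {
    ThumbLayout t{};
    if (file_w > file_h) {
        t.scaled_width  = box_w;
        t.scaled_height = file_h * box_w / file_w;
    } else {
        t.scaled_height = box_h;
        t.scaled_width  = file_w * box_h / file_h;
    }
    t.offset_x = (box_w - t.scaled_width) / 2;
    t.offset_y = (box_h - t.scaled_height) / 2;
    return t;
}
```

For example, a 400x200 image in a 100x100 box scales to 100x50 and is drawn 25 pixels from the top.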
Creating an Extractor
This section guides you through the minimum steps to write your Extractor. You need to create the following components:
- Manifest, described in "Extractor Manifest" with instructions in "Creating the Manifest."
- Code with entrypoints, described in "Extractor Entrypoints" with instructions in "Creating the Code."
- Makefile
The Makefile is a standard makefile. For an example, see samples/extractor/Makefile.
When you've completed your extractor, test it.
Creating the Manifest
The manifest entries provide information to the Cataloger to register your extractor.
Use the following steps to construct the elements of your manifest:
- Enter the manifest name.
As with any bundle, there is a manifest name="" section at the top. Make absolutely sure that this name matches the name in your Jamfile.
<manifest name="com.access.extractors.sample_extractor">
- Set up the extractor entry.
The <extractor> entry tells the Cataloger to recognize and register the entries that follow. The <library> entry indicates which library has the entrypoints for the Extractor:
<extractor>
  <library>libalp_sample_extractor.so</library>
</extractor>
IMPORTANT: It is critical that this is the same name that your Makefile or Jamfile generates for your library.
- Specify types for registration
The following sections register your Extractor so that the Cataloger calls it to read specific file types.
At the time of this release, each entry can only specify one type.
The <extension> tag pairs the file extension with the <mimetype> tag to register the type in the Cataloger database.
- extractor_for
Each <extractor_for> entry tells the Cataloger to call your Extractor for a specific file type. For example, this entry tells the Cataloger to run the Extractor on .mp3 files:
<extractor_for>
  <extension>mp3</extension>
  <mimetype>audio/mpeg3</mimetype>
</extractor_for>
- thumbnailer_for
Each <thumbnailer_for> entry tells the Cataloger that your extractor exports the thumbnail entrypoint. The Cataloger calls it to create a thumbnail for a file when an application requests it. The syntax is identical to that of the <extractor_for> entry. For example, this entry tells the Cataloger that the extractor creates thumbnails for .jpg files:
<thumbnailer_for>
  <extension>jpg</extension>
  <mimetype>image/jpeg</mimetype>
</thumbnailer_for>
- register_mime
Each <register_mime> entry associates a new extension with a type in the Media Selector, without adding any metadata. For example, this entry tells the Cataloger to display files with the .mp4 extension as an "audio" type:
<register_mime>
  <mimetype>audio/mp4</mimetype>
  <extension>mp4</extension>
</register_mime>
- extractor_category (optional)
You can extend the Media Selector metadata schema, either by adding a new category or by adding columns to an existing category, by including an <extractor_category> section. For example, the following creates a new category table, audio_collection, and its columns:
<extractor_category>
  <category_name>audio_collection</category_name>
  <category_columns>title CHAR, count INTEGER</category_columns>
</extractor_category>
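Because each registration entry must carry one <extension> and one <mimetype>, and in this release specifies only one type, a build step could check entries before the bundle is installed. The sketch below assumes the XML has already been parsed into a hypothetical TypeEntry structure; the no-leading-dot and type/subtype checks are additional assumptions based on the examples above, not documented Cataloger rules.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A parsed <extractor_for>, <thumbnailer_for>, or <register_mime> entry.
// Parsing the XML itself is outside the scope of this sketch.
struct TypeEntry {
    std::vector<std::string> extensions;
    std::vector<std::string> mimetypes;
};

// Requires exactly one extension and one MIME type per entry. Also rejects
// an extension written with a leading dot ("mp3", not ".mp3") and a MIME
// type that lacks the "type/subtype" form.
bool is_valid_type_entry(const TypeEntry& e) {
    if (e.extensions.size() != 1 || e.mimetypes.size() != 1) return false;
    const std::string& ext = e.extensions.front();
    const std::string& mime = e.mimetypes.front();
    if (ext.empty() || ext.front() == '.') return false;
    std::string::size_type slash = mime.find('/');
    return slash != std::string::npos && slash > 0 && slash + 1 < mime.size();
}
```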
Creating the Code
This section lists steps to write the minimal code for an extractor.
- Include the mediaselector API header, mediaselector_api.h:
#include <alp/mediaselector_api.h>
This file defines all the types and functions you need to add metadata to the Cataloger.
- Create the extractor entrypoint
This callback entrypoint (function) is called when the Cataloger has a file to process:
extern "C" alp_status_t alp_extractor_process_file(AlpDmlH _handle, const char* _file)
{
    ...
}
The following code does the work. It reads the metadata from the file and uses the data model API to store it in the Cataloger metadata database:
alp_status_t result = ALP_STATUS_OK;
AlpDmlItemH item;
if( ALP_STATUS_OK == (result = alp_ms_dml_item_create(ALP_MS_ITEM_TYPE_AUDIO, &item, _file)) )
{
    // get_artist_data_from_file() stands for your own format-specific parsing code.
    const char* data_from_file = get_artist_data_from_file(_file);
    if( NULL != data_from_file )
    {
        alp_ms_dml_item_set_string(item, ALP_MS_DML_COL_AUDIO_ARTIST, data_from_file);
    }
    ...
The sample code uses the open-source id3lib library to retrieve the mp3 information from a file.
- Commit the information and free the resource.
When you finish populating the DML item, commit the data and destroy the item to free its resources.
    ...
    result = alp_ms_dml_commit_item(_handle, item);
    alp_ms_dml_item_destroy(item);
}
return result;
IMPORTANT: Always return the result of important calls, such as alp_ms_dml_commit_item(), to the caller. Return any significant DML error codes, but do not return an error code just because you could not find any metadata in a file.
- Create the thumbnail extractor entrypoint.
If your bundle can create thumbnails or previews of files, your manifest contains a <thumbnailer_for> section. Export the following entrypoint function:
extern "C" alp_status_t alp_extractor_generate_thumbnail(const char* _file, const char* _savepath, int32_t _width, int32_t _height)
{
    ...
}
Use the code provided in "Thumbnailer Extractor Entrypoint" as an example.
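The commit-and-destroy step is easy to skip on an early-return path. In C++, one option is a small RAII guard that destroys the item whenever the function exits. ScopedHandle is a hypothetical helper, not part of the ALP API, and the usage comment assumes alp_ms_dml_item_destroy is an ordinary function that takes the item handle, as in the sample code.

```cpp
#include <cassert>
#include <utility>

// Owns a handle and calls `destroy` on it exactly once when the guard goes
// out of scope, so no return path can skip the cleanup step.
//
// Possible use inside alp_extractor_process_file (raw_item is hypothetical):
//   ScopedHandle item(raw_item, alp_ms_dml_item_destroy);
//   ...set values on item.get()...
//   return alp_ms_dml_commit_item(_handle, item.get());  // destroy runs after commit
template <typename Handle, typename Destroy>
class ScopedHandle {
public:
    ScopedHandle(Handle h, Destroy destroy) : handle_(h), destroy_(std::move(destroy)) {}
    ~ScopedHandle() { destroy_(handle_); }
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;
    Handle get() const { return handle_; }

private:
    Handle handle_;
    Destroy destroy_;
};
```

Because locals are destroyed after the return value is computed, the commit in the usage comment still runs before the item is destroyed.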
Testing Your Extractor
Once you have created your Extractor, test it to verify that it is working correctly.
You can use the msutil tool to check the following:
- msutil -e lists the registered Extractors. Is your Extractor registered?
- msutil -m lists the current MIME types. Is your MIME type supported?
You can also write an application to test your Extractor plug-in. Use the DML API to read the values stored in the appropriate columns, and check that your Extractor has stored the correct metadata.