The Folio Society is a UK publisher of literature bound in high-end materials with custom illustrations. I've been accumulating their books on and off since the 1980s and
have been looking for a way to catalog and maintain my collection.
The following list of all Folio Society publications is extracted from the web page at
Library Thing. It's a work in progress.
The Library Thing Folio Society wiki has a lot of work put into it, listing details of all Folio Society books published since the society started in 1947.
It's the most complete list of society publications that I know of.
But there are times when I want to have my own list where I can easily add notes about my collection: prices, wants, needs, etc.
Windows users have a nice MS Access database application
to maintain their collections, but that doesn't help me or other collectors on Apple or Linux devices. So I wrote a program that automatically
downloads the wiki data and translates it into a format that can be used locally. That way the data is at least available in a raw form
that others can build on.
The following data set is formatted for import into a spreadsheet program. I've tested it with Google Sheets and Apple Numbers, and I suspect
that others will be fine since CSV is a simple, common format. Note that the columns in the file are delimited by the tab character rather than a comma, so if your program
is unable to determine the delimiter automatically, you may need to configure it to recognize a tab delimiter.
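For anyone who would rather load the data in a script than in a spreadsheet, here is a minimal sketch using Python's standard csv module with the delimiter set to a tab. The sample text and its column names are made up for illustration and are not taken from the real file.

```python
import csv
import io

def read_tab_delimited(handle):
    """Return every row of a tab-delimited text stream as a list of strings."""
    return list(csv.reader(handle, delimiter="\t"))

# Made-up sample standing in for the downloaded file (assumption: one header
# row followed by one row per book).
sample = "Description\tPages\nAn example entry, 233pp\t233\n"
rows = read_tab_delimited(io.StringIO(sample))
```

To read a real download, pass a file opened with open(path, newline="", encoding="utf-8") instead of the io.StringIO object, where path is wherever you saved the file.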
The list of books at Library Thing is complete, but it was not designed for automated extraction. Each entry was hand-entered by a volunteer who generally followed
a pattern, but there are a lot of variations, incomplete entries and typos. The most difficult problem I have is that there are no rigid standards or field separation.
Fields can be anywhere on a line and can be in arbitrary format. For example, most
entries have the number of pages included, but it can be anywhere on the line for the book. And sometimes, the number of pages is entered like "233pp", but
other times it may be "233 pages" or "233 pp". Similarly, the size of the book is sometimes entered in inches, other times in centimeters. Sometimes
it has three dimensions, other times two.
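As a rough illustration of coping with the page-count variations mentioned above, the following sketch uses one regular expression that accepts "233pp", "233 pp" and "233 pages" wherever they appear on a line. It is not the extraction program itself; the names PAGES_RE and extract_pages are hypothetical, and any spelling other than those three would be missed.

```python
import re

# A number followed by optional spaces and then "pp" or "pages".
PAGES_RE = re.compile(r"\b(\d+)\s*(?:pp|pages)\b", re.IGNORECASE)

def extract_pages(entry):
    """Return the first page count found in a wiki line, or None if absent."""
    match = PAGES_RE.search(entry)
    return int(match.group(1)) if match else None
```

Returning None rather than 0 for a missing count keeps incomplete entries distinguishable from real values when the column is written out.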
All these variations make writing a program to extract the data into discrete fields difficult. Right now, the fields in the spreadsheet are:
Description (Everything from the wiki)
Number of Pages
There are a number of errors in the data currently due to the difficulty of parsing free-form data. For example "The Sonnets of Michelangelo" is listed as
"Sonnets of Michelangelo, The. Translated" due to the way the data was entered on Library Thing.
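If you want to tidy titles like this one in your own local copy, one possible approach is to move a trailing article back to the front. The sketch below is only an assumption about how such a cleanup could look, not part of the extraction program, and restore_leading_article is a hypothetical name. It will misfire on titles where ", A" or ", The" legitimately starts a subtitle, so the results still need a manual check.

```python
import re

# Text, then a comma, then a capitalized article, then whatever is left over.
TRAILING_ARTICLE_RE = re.compile(
    r"^(?P<body>.+?),\s+(?P<article>The|An|A)\b\.?\s*(?P<rest>.*)$"
)

def restore_leading_article(title):
    """Return (fixed_title, leftover_text); the title is unchanged if no match."""
    match = TRAILING_ARTICLE_RE.match(title)
    if not match:
        return title, ""
    return f"{match['article']} {match['body']}", match["rest"]
```

For the example above, this returns "The Sonnets of Michelangelo" as the title and "Translated" as the leftover text.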
My current plan is to not modify the base data set locally, because once I do, I can no longer pull new data when Library Thing is updated or added to. Instead,
simple corrections are being made to the Library Thing wiki and the data is being re-pulled from there.
The version as of June 5, 2021 now includes the
"Author" field, but like the "Title" field it should be taken with a grain of salt. In both cases, the way the Library Thing entry is written sometimes precludes an easy way to extract
the proper field. In most cases, both fields will be correct, but there are errors. Specifically, I've seen the book's editor, illustrator, etc. extracted as the author.
Multi-volume sets, memoirs and autobiographies where the author is assumed, and special editions also cause my extraction algorithm grief.
If you need to use the data elsewhere, manual editing will need to be done in the final local application.
File last updated from Library Thing on June 5, 2021
The complete list of Folio Society Publications CSV File