Learn how to reformat ePUB Public Domain books by splitting HTML files to put in TOC divisions

Posted in #utopian-io

Contribution to the Open Source Project: Calibre

calibrehome2.jpg

GitHub Repository:

https://github.com/kovidgoyal/calibre

Learn how to use the Calibre software to split the HTML files of public domain books, so that you can reformat and polish the ePUB book with the correct divisions for the TOC

What Will I Learn?

6 Major Concepts:

  • Learn to find public domain books using the Project Gutenberg site (http://www.gutenberg.org/). There are over 57,000 free public domain ebooks that you can download and read with your Calibre software or any eReader you might have.

  • Some public domain books are thick and difficult to navigate inside the ePUB because they do not have a properly formatted TOC. Their HTML files do not line up with the chapter divisions, so you are going to find the right location in the HTML to split up the chapters.

  • You are going to learn how to split an HTML file into 2 files at the right location, so that each chapter gets its own proper section.

  • You are going to learn how Calibre generates the HTML code for the newly split HTML file to keep the formatting of your book consistent. This is the magic of Calibre. If you know HTML, like me, you can key in the code yourself, but even if you don't, you need to know what is happening in the editing screen as you make these changes.

  • You are going to learn how these divisions enable you to generate the TOC in the .toc.ncx that I taught you in the last tutorial.

  • Finally, so that each chapter division starts on a new page, you are going to manually put a page break into the HTML to make your ePUB book even more pleasant to read.

System Requirements

  1. Software: Install Calibre 3.23 (updated on May 4, 2018)
  2. OS Support:
  • Windows (Vista, 7, 8 and 10)
  • Linux (32-bit and 64-bit Intel)
  • Mac OS X (10.9 Mavericks and higher)

Visit the Calibre website and download the software onto your computer.
After downloading, run the installer and start using the software by following today's tutorial.

Resources about Calibre:

Difficulty

Intermediate (it helps if you know a little HTML)

Tutorial Requirements

Please study the above 2 tutorials, because I will not explain those technical features again when I use them in this tutorial.

Description

There are a lot of public domain books that are freely distributed on the internet. The copyrights on these books have expired, so teams of volunteers have digitized them and made them available for public use. In the past, these books came only in PDF format, but over the last 10 years Project Gutenberg has recruited many volunteers to format these books into ePUB and Kindle MOBI formats, so that people can download them for free and read them on their devices.

Many of these great works of literature are a pleasure to read, but the problem is that they are machine-formatted without careful attention to structure. Often the book is sectioned at the wrong places, so a table of contents cannot be created. The content of the book is there, but the ePUB is not user-friendly at all.

Many great literary works are very long, and this makes the ePUB difficult to navigate. What is needed is to split the HTML files into smaller chunks so that the book follows the right chapter divisions.

So in this tutorial, I will show you how to take a thick public domain book and split it into sections with the HTML splitting tool. Only when the HTML files are split at the right locations will you be able to put the TOC into the .toc.ncx.

There are quite a few technical points that I need to explain so that you understand the concepts behind each feature. This video is a bit longer than my usual videos because these concepts need a longer explanation.


Step 1

Download a public domain book from Project Gutenberg site.

gutenberg.jpg

I have downloaded the ePUB version of the Bible, a very thick book, onto my computer to use as the example in my video demo.

Step 2

Add the thick book to your Calibre library to see what the ePUB looks like.

split2.jpg

  • As you can see, the book is unformatted
  • no clear metadata
  • no book cover
  • very difficult to navigate

Step 3

Customize the book with the correct metadata and book cover. I will not teach this here because I covered these steps in a previous tutorial, but I still need to go through this step before we move on to the next one.

The result of this step should look like the following:

split3.jpg

Step 4

Use the Calibre Editor feature. Many people do not know that they can edit a book inside the Calibre software. Once you get used to doing this, you will find that it is much easier to edit any book you have in Calibre than to edit it manually from your source files.

split4.jpg

  • You can either use Edit book on the toolbar
  • Or you can use the shortcut: right-click the book and choose 'Edit book'

Step 5

Find the HTML file that contains the division you want to make in the book

split4.jpg

If you don't find the right HTML file, you will make the wrong split.

You want to split the file exactly where one chapter ends and the next chapter begins, so you need to find that dividing line first.
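Because an ePUB file is a ZIP archive of HTML files, you can also check outside Calibre which file holds the chapter you are looking for. Here is a minimal Python sketch of that search. The function name, the assumption that content files end in .html, .htm or .xhtml and are UTF-8 encoded, and the demo file names are my own, not Calibre's.

```python
import io
import zipfile

CONTENT_SUFFIXES = (".html", ".htm", ".xhtml")  # assumed content-file extensions


def find_files_containing(epub, needle):
    """Return the names of the HTML files inside an ePUB whose source contains `needle`.

    `epub` may be a path or a binary file-like object; an ePUB is a ZIP archive,
    so the standard zipfile module can open it. Files are assumed to be UTF-8.
    """
    matches = []
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if name.lower().endswith(CONTENT_SUFFIXES):
                text = archive.read(name).decode("utf-8", errors="replace")
                if needle in text:
                    matches.append(name)
    return matches


if __name__ == "__main__":
    # Build a tiny in-memory "ePUB" with hypothetical file names for the demo.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as demo:
        demo.writestr("OEBPS/part1.html", "<h3>The First Book of Moses: Called Genesis</h3>")
        demo.writestr("OEBPS/part2.html", "<h3>The Second Book of Moses Called Exodus</h3>")
    print(find_files_containing(buffer, "Called Exodus"))  # ['OEBPS/part2.html']
```

With a real book you would pass the path instead, for example find_files_containing("bible.epub", "Called Exodus").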

Step 6

Split the file at the 'specified location'. This is important.

split5.jpg

Once the dividing line is set, you can use this tab to tell the Calibre software to split the book here.

  • Click the upper part of the tab to split.
  • The lower part of the tab is to undo the split.

Step 7

Now, click inside the preview panel. This is the step most people do not know about, so go slowly here.

split6.jpg

Please note that everything has to match across the 3 panels.

The HTML file name, the HTML code of the book's content, and the Preview of the ePUB book all have to match:

  • @public@vhost@g@gutenberg@html@files@10@[email protected]

  • <h3 id="pgepubid00003"><a id="The_Second_Book_of_Moses_Called_Exodus"/>

  • The Second Book of Moses Called Exodus

All of these have to match in the 3 panels before you split.
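To double-check that the code and the Preview agree, you can pull the visible text of a heading out by its id. This is a small Python sketch, assuming the file is well-formed XHTML with no undeclared entities such as &nbsp; (Python's built-in XML parser rejects those). heading_text_by_id is a hypothetical helper name, and the sample markup is simplified.

```python
import xml.etree.ElementTree as ET


def heading_text_by_id(xhtml, element_id):
    """Return the visible text of the element whose id equals `element_id`, or None.

    Assumes well-formed XHTML; whitespace and line breaks are collapsed to single
    spaces so the result reads like the text shown in the Preview panel.
    """
    root = ET.fromstring(xhtml)
    for element in root.iter():  # walks every element in document order
        if element.get("id") == element_id:
            return " ".join("".join(element.itertext()).split())
    return None


if __name__ == "__main__":
    sample = (
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>\n'
        '<hr class="c3"/>\n'
        '<h3 id="pgepubid00003"><a id="The_Second_Book_of_Moses_Called_Exodus"/>'
        "The Second Book of Moses\nCalled Exodus</h3>\n"
        "</body></html>"
    )
    print(heading_text_by_id(sample, "pgepubid00003"))
    # The Second Book of Moses Called Exodus
```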

Step 8

The file splits ABOVE this location

Why above?

Because that marks the end of a chapter and the beginning of a new chapter.

What does the HTML say?

<hr class="c3"/>

The <hr> tag draws a horizontal line (a horizontal rule), which confirms that this is the right division break between the 2 chapters.
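The "split above the heading, with the <hr> divider just before it" check can be written down as a short Python sketch. It assumes, as in this book, that the chapter heading is an <h1> to <h6> with an id attribute and that the divider sits immediately above it. find_split_offset is a hypothetical name, and a regular expression is only a rough stand-in for a real HTML parser.

```python
import re


def find_split_offset(xhtml, heading_id):
    """Locate the split point "above" a chapter heading.

    Returns (offset, hr_before): `offset` is where the heading's start tag begins,
    and `hr_before` is True when the last tag before it is an <hr .../> divider.
    """
    pattern = r'<h[1-6]\b[^>]*\bid="' + re.escape(heading_id) + '"'
    match = re.search(pattern, xhtml)
    if match is None:
        raise ValueError("heading id not found: " + heading_id)
    offset = match.start()
    before = xhtml[:offset].rstrip()
    last_tag = before[before.rfind("<"):] if "<" in before else ""
    return offset, last_tag.startswith("<hr")


if __name__ == "__main__":
    page = ('<p>End of Genesis.</p>\n<hr class="c3"/>\n'
            '<h3 id="pgepubid00003"><a id="The_Second_Book_of_Moses_Called_Exodus"/>'
            'The Second Book of Moses Called Exodus</h3>')
    offset, hr_before = find_split_offset(page, "pgepubid00003")
    print(page[offset:].startswith('<h3 id="pgepubid00003"'), hr_before)  # True True
```

If hr_before comes back False, it is worth looking at the code panel again before splitting.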

Step 9

Look for the green line in the Preview panel.
Click on the green line to make the split.

Step 10

New code is now generated for the division into 2 files

This part is a bit technical, but if you understand HTML, you will see why the Calibre software feels like magic. The project owner has already programmed the software to generate the required code for the newly split HTML file. Every HTML file needs certain code at the beginning (the opening <html> tag and the <head> section with the title and stylesheet links). We can't just chop an HTML file in half and expect the new file to display properly without that code.

ePUB books are HTML-based, so if you know HTML, you can also put in this code yourself.

The new HTML file is named with a _split1 suffix, because it is split from html3:

@public@vhost@g@gutenberg@html@files@10@[email protected]_split1

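When the split occurs here, Calibre generates the following HTML code at the top of the new file, so that you can format the next chapter correctly. Notice that it reuses the <head> of the original file, which is why the title and stylesheet links are the same: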

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta name="generator" content="HTML Tidy for Linux (vers 7 December 2008), see www.w3.org"/>
<title>The Project Gutenberg eBook of The Old Testament Of The King James Version Of The Bible.</title>
<link type="text/css" href="0.css" rel="stylesheet"/>
<link type="text/css" href="1.css" rel="stylesheet"/>
<link type="text/css" href="pgepub.css" rel="stylesheet"/>
<meta name="generator" content="Ebookmaker 0.4.0a5 by Marcello Perathoner &lt;[email protected]&gt;"/>
</head>
<body>

Remember to SAVE your work after you have done all the division markings.
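The idea behind Step 10 can be sketched in Python: the text before the split point stays in the first file (closed off with </body></html>), and the second file receives a copy of everything up to and including <body>, so it keeps the same <head>, title and stylesheet links. This is a simplified illustration, not Calibre's own code. It assumes the split offset falls directly inside <body> with no element left open across the cut, and split_xhtml is a hypothetical name.

```python
import re


def split_xhtml(xhtml, offset):
    """Split one XHTML document into two complete documents at `offset`.

    The first document keeps everything before the offset and is closed with the
    original </body></html>. The second document starts with a copy of the
    original preamble (<html>, the whole <head>, and the <body> start tag).
    """
    body_start = re.search(r"<body\b[^>]*>", xhtml)
    body_end = xhtml.rfind("</body>")
    if body_start is None or body_end == -1:
        raise ValueError("no <body> element found")
    if not body_start.end() <= offset <= body_end:
        raise ValueError("offset must fall inside <body>")
    preamble = xhtml[:body_start.end()]  # everything up to and including <body ...>
    closing = xhtml[body_end:]           # "</body></html>" and anything after it
    first = xhtml[:offset].rstrip() + "\n" + closing
    second = preamble + "\n" + xhtml[offset:]
    return first, second


if __name__ == "__main__":
    page = ('<html xmlns="http://www.w3.org/1999/xhtml"><head>'
            '<title>The Project Gutenberg eBook</title>'
            '<link type="text/css" href="pgepub.css" rel="stylesheet"/></head>'
            '<body><p>End of Genesis.</p><hr class="c3"/>'
            '<h3 id="pgepubid00003">Exodus</h3></body></html>')
    first, second = split_xhtml(page, page.index("<h3"))
    print(first)
    print(second)
```

In Calibre you never have to do this by hand; the editor also names the new file (the _split1 suffix) and adds it to the book, which this sketch leaves out.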


Step 11

Create the .toc.ncx

I covered this in the previous tutorial, so I will not teach the concept again here, but I still need to carry out this step before Step 12.
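For reference, each TOC entry in a toc.ncx is a <navPoint> with a text label and a content link (src) to a file, optionally followed by #id of the heading. The Python sketch below generates such entries from (title, src) pairs; build_nav_points and the file names in the demo are hypothetical, and in practice Calibre's TOC editor writes these entries for you.

```python
import xml.etree.ElementTree as ET


def build_nav_points(entries):
    """Return <navPoint> elements for a toc.ncx <navMap>, one per (title, src) pair.

    `src` is the content file path relative to toc.ncx, optionally with a
    #fragment pointing at the heading id. playOrder follows the list order.
    The fragment has no namespace of its own: once pasted inside the <navMap>
    of a toc.ncx it picks up that file's default NCX namespace.
    """
    points = []
    for order, (title, src) in enumerate(entries, start=1):
        point = ET.Element("navPoint", id=f"navPoint-{order}", playOrder=str(order))
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = title  # special characters get escaped
        ET.SubElement(point, "content", src=src)
        points.append(ET.tostring(point, encoding="unicode"))
    return "\n".join(points)


if __name__ == "__main__":
    print(build_nav_points([
        ("The First Book of Moses: Called Genesis", "part3.html"),
        ("The Second Book of Moses Called Exodus", "part3_split1.html#pgepubid00003"),
    ]))
```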

Step 12 Final Polish

To make the book work even better, with each new chapter starting right at the top of the page, you can do this:

  • Add a <br/> before the <h3 id="pgepubid00004"> (write <br/> rather than <br>, because the book's files are XHTML)
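If you have many chapters, the same edit can be scripted. Here is a minimal Python sketch, assuming each chapter heading carries an id such as pgepubid00004, as in this book. insert_before_heading is a hypothetical helper, and it writes <br/> because the files are XHTML.

```python
import re


def insert_before_heading(xhtml, heading_id, markup="<br/>"):
    """Insert `markup` on its own line just before the heading with the given id.

    <br/> is used instead of <br> because the book's files are XHTML, where
    empty elements must be closed.
    """
    pattern = r'<h[1-6]\b[^>]*\bid="' + re.escape(heading_id) + '"'
    match = re.search(pattern, xhtml)
    if match is None:
        raise ValueError("heading id not found: " + heading_id)
    start = match.start()
    return xhtml[:start] + markup + "\n" + xhtml[start:]


if __name__ == "__main__":
    page = '<hr class="c3"/>\n<h3 id="pgepubid00004">Next chapter</h3>'
    print(insert_before_heading(page, "pgepubid00004"))
```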

Video Tutorial


Supplementary Resources:

#1. You can download my ePUB formatted book of the Bible: Click here

#2. Other resources on Calibre on my GitHub site:
https://github.com/rosatravels/Calibre


Curriculum:

Please follow the Series of Videos on Calibre:


Thank you for your time and kind attention,

Rosa
