Solutions

Google Docs can now import and export Markdown

Google has announced that you can now convert content in Google Docs to and from Markdown. If you can’t do this yet, don’t fret—this feature will roll out from July 16 to all Rapid Release and Scheduled Release users, before ultimately landing in all Workspace and personal Google accounts.

Markdown was created in 2004 as a more user-friendly text formatting syntax for web pages than HyperText Markup Language (HTML). It uses hash signs, asterisks, and underscores (instead of hard-to-read code inside angle brackets) to add heading styles, bullet points, italics, and bold to text. As a result, anyone can try their hand at technical content writing, as Markdown removes many (but not all!) of the barriers that HTML's complexity creates.

Now, to supplement its 2022 introduction of expanded support for creating content with Markdown on Docs, Google is offering users the ability to convert Docs content to and from Markdown.

This update includes the ability to:

  1. Export a Google Doc as Markdown syntax (through File > Download)
  2. Import Markdown syntax as a Google Doc (through File > Open or by clicking “Open with Google Docs” from Google Drive)
  3. Convert Markdown syntax to Google Docs content as an option when pasting the text
  4. Convert Google Docs content to Markdown syntax when copying the text

You don’t need to change anything to use the import and export options (points 1 and 2 above), which are enabled by default. The copy and paste options (points 3 and 4 above) must be activated first by clicking “Tools,” selecting “Preferences,” and checking “Enable Markdown.”

This much-demanded addition to Google Docs will come as a blessing to tech writers and engineers, new and old, as it facilitates better integration with other Markdown tools. More specifically, because Google Docs is well-known for easy and instant collaboration, this will encourage developers to use the program to work together before exporting their content as Markdown, smoothing the entire writing process significantly.

A logical next step would be for Google to enable users to view and edit Markdown files directly in Google Drive. Only time will tell if this comes to fruition.


Tasks you can automate using Regex

Regex (short for REGular EXpressions) makes it easy to clean up and standardize text. While it’s versatile as an editing tool, it has its limits in application. The best way to think about regex is as a super-powerful wildcard search or search-and-replace that you can use whenever you need it, wherever it’s supported, of course. Microsoft Excel recently started supporting regex, making it a useful skill to learn.

We’ve all been there: copying text from a PDF and pasting it into your own document, only to have weird spacing and artifacts come along with it. But did you know that regex could help with that? Enter this pattern into your find-and-replace function:

Find: [^\S\r\n]{2,}|\s*\r?\n\s*\r?\n\s*
Replace: \n

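This regex will make your text editor:

  • Match any run of two or more spaces or tabs
  • Reduce multiple line breaks to a single line break
  • Get rid of the spaces around those line breaks

Because both halves of the pattern share one replacement, runs of spaces also turn into line breaks. If you would rather collapse them into a single space, run the part before the | and the part after it as two separate find-and-replace passes.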

This should clean up whatever copied text you have into something that’s useful.
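If you'd rather do this cleanup in a script, here is a minimal Python sketch of the multi-pass idea. The function name clean_copied_text is made up for illustration, and the choice to turn runs of spaces into a single space (rather than a line break) is my assumption about what most people want.

```python
import re


def clean_copied_text(text: str) -> str:
    """Tidy whitespace in pasted text (hypothetical helper, not from any library)."""
    # Pass 1: collapse runs of spaces/tabs (but not line breaks) to one space.
    text = re.sub(r"[^\S\r\n]{2,}", " ", text)
    # Pass 2: drop spaces/tabs left dangling at the end of a line or the text.
    text = re.sub(r"[^\S\r\n]+(?=\r?\n|$)", "", text)
    # Pass 3: squeeze two or more line breaks (and the whitespace around
    # them) down to a single line break, as the article's pattern does.
    text = re.sub(r"\s*\r?\n\s*\r?\n\s*", "\n", text)
    return text


if __name__ == "__main__":
    print(clean_copied_text("Hello    world  \n\n\n  Next   line "))
```

Single line breaks are left alone, so only the stray gaps get collapsed.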

Bulk-Renaming Downloaded Files

In more than one unfortunate incident, I downloaded a set of files, and they came with odd names appended. If you have a bulk-rename tool like Advanced Renamer, you can use a regex to clean up those filenames into something more recognizable. If you have a series of files with symbols all over the place, you can use your renamer with this regex:

Find: [^a-zA-Z0-9.-]
Replace: -

This keeps letters, numbers, periods, and existing dashes as they are but replaces everything else with dashes.
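The same cleanup can be sketched in Python if you don't have a renaming tool handy. The helper names clean_filename and plan_renames are hypothetical, and collapsing repeated dashes is an extra step I've added, not part of the pattern above. The sketch only plans the renames; it doesn't touch any files.

```python
import re


def clean_filename(name: str) -> str:
    """Replace anything that isn't a letter, digit, period, or dash with a dash."""
    dashed = re.sub(r"[^a-zA-Z0-9.-]", "-", name)
    # Extra step (my assumption): collapse runs of dashes so "a (b)" -> "a-b".
    return re.sub(r"-{2,}", "-", dashed)


def plan_renames(names):
    """Return {old_name: new_name} for the names that would actually change."""
    plan = {}
    for name in names:
        cleaned = clean_filename(name)
        if cleaned != name:
            plan[name] = cleaned
    return plan


if __name__ == "__main__":
    for old, new in plan_renames(["notes.txt", "My Report (final)_v2.pdf"]).items():
        print(f"{old!r} -> {new!r}")
```

Reviewing the plan before renaming anything is a cheap way to catch two files that would end up with the same cleaned name.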

Currency Formatting

Let’s say you have a file with a ton of currency amounts in different formats. You don’t want to manually go through each of those amounts and fix it to the format you want, especially if they’re in multiple weird formats. Here’s what you’ll use for your regex:

Find: \$?\s*(\d+(?:\,\d{3})*(?:\.\d{2})?)\s*(?:USD|dollars?)?
Replace: $\1

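This regex will scrub through your currency file and rewrite each amount as a dollar sign followed by the number, dropping stray spaces and any trailing “USD” or “dollars.” It keeps the cents when they’re present but doesn’t add them when they’re missing. Because the dollar sign and the suffix are both optional, it also matches bare numbers, so review the matches before replacing everything. The backreference syntax depends on your tool: some editors expect $1 instead of \1.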
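If you need the two decimal places every time, a script can do the formatting that a plain find-and-replace can't. This is a minimal Python sketch under my own assumptions: an amount only counts if it has a leading dollar sign or a trailing "USD"/"dollars", and the names AMOUNT and normalize_currency are made up for the example.

```python
import re
from decimal import Decimal

NUMBER = r"\d+(?:,\d{3})*(?:\.\d{1,2})?"

# Stricter than the article's pattern: require "$" in front OR a unit behind,
# so bare numbers (room numbers, years) are left alone.
AMOUNT = re.compile(
    rf"\$\s*({NUMBER})(?:\s*(?:USD|dollars?)\b)?"  # "$ 1,234.5 USD", "$7"
    rf"|({NUMBER})\s*(?:USD|dollars?)\b",          # "20 dollars"
    re.IGNORECASE,
)


def _format(match: re.Match) -> str:
    raw = match.group(1) or match.group(2)
    value = Decimal(raw.replace(",", ""))
    # Always emit thousands separators and exactly two decimal places.
    return f"${value:,.2f}"


def normalize_currency(text: str) -> str:
    return AMOUNT.sub(_format, text)


if __name__ == "__main__":
    print(normalize_currency("Paid $ 1,234.5 USD, then 20 dollars and $7 more; room 101."))
```

Using Decimal instead of float avoids rounding surprises when the amounts are money.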

Standardize Date Formats

I’ve been in several situations where I’ve had to extract dates into a standardized format, like moving them from text into a database. When you’re faced with something like this, you can use a regex to find and extract data into a simplified format (in this case, it should be in YYYY-MM-DD). The regex for this would be:

Find: \b(?:(?:\d{1,2}[-/\.]\d{1,2}[-/\.]\d{2,4})|(?:(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*[\s.-]*\d{1,2}(?:st|nd|rd|th)?[\s,.-]*\d{2,4}))\b

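This should search the entire document and find dates written in the common numeric and month-name styles. Note that the pattern only locates them; rewriting each match as YYYY-MM-DD takes a replacement step or a short script, because the day, month, and year have to be reordered.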
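Here is a rough Python sketch of the conversion step. It rests on assumptions the article doesn't state: numeric dates are read as month/day/year, years have four digits, and anything that isn't a real calendar date is left untouched. The names standardize_dates, NUMERIC, and NAMED are illustrative only.

```python
import re
from datetime import datetime

MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Assumption: numeric dates are MM/DD/YYYY (US order) with a 4-digit year.
NUMERIC = re.compile(r"\b(\d{1,2})[-/.](\d{1,2})[-/.](\d{4})\b")
NAMED = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+"
    r"(\d{1,2})(?:st|nd|rd|th)?,?\s+(\d{4})\b",
    re.IGNORECASE,
)


def _iso(year: int, month: int, day: int, original: str) -> str:
    try:
        return datetime(year, month, day).strftime("%Y-%m-%d")
    except ValueError:
        # Not a real calendar date (e.g. month 13): leave the text as it was.
        return original


def standardize_dates(text: str) -> str:
    text = NUMERIC.sub(
        lambda m: _iso(int(m[3]), int(m[1]), int(m[2]), m[0]), text)
    text = NAMED.sub(
        lambda m: _iso(int(m[3]), MONTHS[m[1].lower()], int(m[2]), m[0]), text)
    return text


if __name__ == "__main__":
    print(standardize_dates("Due 7/16/2024, shipped Jul 4th, 2024 and filed March 3 2023."))
```

If your data uses day/month/year order, swap the first two groups in the NUMERIC replacement; a regex alone can't tell 03/04/2024 in one convention from the other.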

Strip HTML Tags

When you copy something from the web, it sometimes has HTML tags attached to it. Luckily, regex has a handy method for stripping HTML tags from a document:

Find: <[^>]+>|&[^;]+;|\s*\n\s*
Replace: \n

This regex will sift through the document, find the HTML tags, and wipe them, along with extra line breaks and HTML entities (like ‘&amp;’). Keep in mind that every match is replaced with a line break, so inline tags will split their text across lines. If you only want to remove the tags, you can replace them with nothing using this simpler regex:

Find: <[^>]+>
Replace: (empty)

However, if your document uses unusual HTML tags, formatting, or entities, you might encounter problems. The first regex is the broader cleanup, covering tags, entities, and line breaks, while the second one only removes the tags.
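In a script, you can combine the tag-stripping regex with Python's standard html module. Note that this sketch makes a different choice from the first pattern above: it decodes entities (so &amp; becomes &) instead of deleting them. The function name strip_html is hypothetical, and a regex like this is not a full HTML parser, so treat it as a quick cleanup for simple snippets.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")


def strip_html(text: str) -> str:
    """Remove tags, decode entities, and drop blank lines (rough cleanup only)."""
    # Strip tags first, so escaped text like "&lt;b&gt;" survives as "<b>".
    without_tags = TAG.sub("", text)
    decoded = html.unescape(without_tags)
    # Trim each line and throw away the ones left empty by removed tags.
    lines = (line.strip() for line in decoded.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    print(strip_html("<p>Fish &amp; chips</p>\n\n<br>\n<b>Tasty</b>"))
```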

Extract URLs from Text

Sometimes, you have a document with URLs buried inside the text. Pulling out those URLs shouldn’t require a manual search through the entire document, and regex will save you time. URLs usually start with http or https, and we can use that knowledge in our regex while treating the prefix as optional:

Find: (https?:\/\/)?([\w\-]+(\.[\w\-]+)+\.?(:\d+)?(\/\S*)?)|((www\.)?[\w\-]+(\.[\w\-]+)+\.?(:\d+)?(\/\S*)?)

While this extractor will find your URLs, it has a few issues. Because the http or https prefix is optional in the pattern, it also matches anything shaped like a domain, such as a filename like report.pdf, and it can miss malformed URLs. It will also pick up only the domain part of an email address rather than the whole address, but there’s another pattern that you can use specifically for emails.
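If false positives are a problem, one option is a stricter pattern that insists on the http or https prefix. This Python sketch is a simplification of my own, not the pattern above: it grabs everything up to the next space, quote, or bracket and then trims trailing punctuation. The names URL and extract_urls are made up for the example.

```python
import re

# Require an explicit scheme; stop at whitespace, quotes, or closing brackets.
URL = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(text: str) -> list:
    """Return the http(s) URLs found in text, in order of appearance."""
    # Sentence punctuation right after a URL is almost never part of it.
    return [url.rstrip(".,;:!?") for url in URL.findall(text)]


if __name__ == "__main__":
    sample = "See https://example.com/docs, or http://test.org/a?b=1. Not a link: file.txt"
    for url in extract_urls(sample):
        print(url)
```

The trade-off is the reverse of the article's pattern: bare domains like www.example.com are skipped, but filenames and abbreviations no longer sneak in.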

One of the most common problems I encounter when doing data scraping or email list validation is getting emails from a text file. Emails typically have a pattern that makes it easy for regex to interact with the text file. For an email search function, we’ll do something like this:

Find: (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

This might look like a lot, but it basically searches for anything that has the pattern <something>@<domain>, such as name@example.com. As robust as the regex is, it’s not perfect. It only lists lowercase letters, so turn on case-insensitive matching if your tool offers it. It will miss any emails that start with a dot, and anything that isn’t properly formatted as an email will also be ignored.
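For everyday list cleanup, a trimmed-down version is often enough. This Python sketch keeps only the unquoted name and plain domain parts of the pattern above (no quoted names, no bracketed IP addresses) and adds case-insensitive matching and de-duplication, which are my own additions. EMAIL and extract_emails are illustrative names.

```python
import re

EMAIL = re.compile(
    r"[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*"
    r"@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?",
    re.IGNORECASE,
)


def extract_emails(text: str) -> list:
    """Return unique addresses, lowercased, in order of first appearance."""
    seen = {}
    for address in EMAIL.findall(text):
        # Lowercase so "Jane@X.com" and "jane@x.com" count as one address.
        seen.setdefault(address.lower(), None)
    return list(seen)


if __name__ == "__main__":
    sample = "Contact Jane.Doe@Example.com or bob@mail.example.org; jane.doe@example.com again."
    print(extract_emails(sample))
```

Matching the pattern only tells you a string looks like an address; it says nothing about whether the mailbox exists.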

Format Social Media Handles

When moving data from a form to or from a database, you sometimes have to fix some formatting problems. An excellent case in point is social media handles, which should all end up with a single leading @. The regex for doing this is:

Find: (?:^|[^@\w])[@\s]*(\w{1,30})
Replace: @$1

This is a general-purpose pattern for formatting social media handles, but each platform has its own rules for usernames. You can’t add error-checking for those platform-specific rules unless you use this regex in a script, in Python for example. Even so, debugging your Python code with regex might be a bit more complicated.
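As a minimal sketch of that Python route, the function below normalizes one form field at a time and rejects anything that doesn't look like a single handle. The name normalize_handle and the max_length parameter are hypothetical; the default of 30 comes from the pattern above, not from any platform's actual rules.

```python
import re
from typing import Optional

# One handle per field: optional "@"s or spaces, then word characters only.
HANDLE = re.compile(r"^[@\s]*(\w+)\s*$", re.ASCII)


def normalize_handle(raw: str, max_length: int = 30) -> Optional[str]:
    """Return "@handle", or None if the input isn't a single valid-looking handle."""
    match = HANDLE.match(raw)
    if match is None or len(match.group(1)) > max_length:
        return None
    return "@" + match.group(1)


if __name__ == "__main__":
    for value in ["  @@jane_doe ", "john", "not a handle", ""]:
        print(repr(value), "->", normalize_handle(value))
```

Returning None for bad input, instead of guessing, makes it easy to send those rows back for manual review.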

There’s a saying that if you have a hammer, every problem starts to look like a nail. As an experienced coder, I can say that’s 100% true regarding regex. There are several places online that can help you learn regex, but you should use this knowledge sparingly.

Including a regex in your code can complicate your debugging process. Regexes don’t lend themselves to commenting either, making it more difficult to share code with others. Finally, they are part of an automation system; if the source data is bad, the results will also be bad. While regex is a powerful tool, it’s better for some things than for others.
