{"id":1453,"date":"2022-05-18T15:03:32","date_gmt":"2022-05-18T19:03:32","guid":{"rendered":"https:\/\/www.grid.ai\/?p=1453"},"modified":"2022-09-10T11:16:20","modified_gmt":"2022-09-10T15:16:20","slug":"creating-datastores","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/","title":{"rendered":"Creating Datastores"},"content":{"rendered":"<h3><strong>Overview of Datastores<\/strong><\/h3>\n<p><span style=\"font-weight: 400;\">To speed up training iteration, you can store your data in a Grid Datastore. Datastores are high-performance, low-latency, versioned datasets. For large-scale data, Datastores remove a common workflow bottleneck: you no longer need to download the full dataset every time your script runs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Datastores can be attached to Runs or Sessions, and they preserve the file format and directory structure of the data used to create them. <\/span><span style=\"font-weight: 400;\">Datastores support any file type: Grid treats each file as a sequence of bytes stored under a particular name within a directory structure (e.g. <\/span><span style=\"font-weight: 400;\"><code>.\/dir\/some-image.jpg<\/code><\/span><span style=\"font-weight: 400;\">).<\/span><\/p>\n<h3><strong>Why Use Datastores?<\/strong><\/h3>\n<p><span style=\"font-weight: 400;\">Data plays a critical role in everything you run on Grid, and Grid Datastores use an optimized pipeline that removes as much latency as possible between the moment your program calls <code>with open(filename, 'r') as f:<\/code> and the instant the data is delivered to your script. 
Traversing the data directory structure in a Session feels no different from <code>cd<\/code>-ing around your local workstation.<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Datastores are backed by cloud storage<\/b><span style=\"font-weight: 400;\">. They are made available to compute jobs as part of a read-only filesystem. If you have a script that reads files from a directory structure on your local computer, the only thing you need to change when running on Grid is the location of the data directory!<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><b>Datastores are essential for data at scale<\/b><span style=\"font-weight: 400;\"> (e.g., data that cannot reasonably be downloaded from an HTTP URL each time a compute job begins). They provide a single, immutable dataset resource of near-unlimited scale.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">In fact, a single Datastore can be mounted into tens or hundreds of concurrently running compute jobs in seconds, so no expensive compute time is wasted waiting for data to download, extract, or otherwise &#8220;process&#8221; before you can move on to the real work.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A couple of notes:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Grid <\/span><b>does<\/b> <b>not<\/b><span style=\"font-weight: 400;\"> charge for data storage.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">To ensure data privacy and flexibility of use, Grid never attempts to process the contents of your files, or to infer or optimize for particular usage behaviors based on file contents.<\/span><\/li>\n<\/ol>\n<h3><strong>How Is Data Accessed in a Datastore?<\/strong><\/h3>\n<p><span style=\"font-weight: 400;\">By default, Datastores are mounted at 
<code>\/datastores\/&lt;datastore-name&gt;\/<\/code> in both Runs and Sessions. If you need the Datastore mounted at a different location, you can specify a custom mount path using the CLI.<\/span><\/p>\n<h3><strong>How to Create Datastores<\/strong><\/h3>\n<p><span style=\"font-weight: 400;\">Datastores can be created from a local filesystem, a public S3 bucket, an HTTP URL, a Session, or a Cluster.<\/span><\/p>\n<h4><span style=\"font-weight: 400;\">Local Filesystem (i.e. Uploading Files from a Computer)<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">There are two options for uploading from a computer, depending on the size of your dataset.<\/span><\/p>\n<p><b>Small Datasets (under 1 GB)<\/b><\/p>\n<p><span style=\"font-weight: 400;\">You can use the UI to create Datastores from datasets (files or folders) smaller than 1 GB. Larger uploads exceed the browser\u2019s upload limit, so use the CLI to create those Datastores instead.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the Grid UI, you can create a Datastore by selecting the <\/span><b>New <\/b><span style=\"font-weight: 400;\">button at the top right and then choosing the <\/span><b>Datastore <\/b><span style=\"font-weight: 400;\">option.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1454 size-full\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore.png\" alt=\"New Datastore\" width=\"2078\" height=\"637\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore.png 2078w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-300x92.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-1024x314.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-768x235.png 768w, 
https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-1536x471.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-2048x628.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-1920x589.png 1920w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/New-Datastore-600x184.png 600w\" sizes=\"(max-width: 2078px) 100vw, 2078px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">The Create New Datastore window opens with the following options:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Name<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Upload a dataset, or link to one using a URL<\/span><\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-1455 size-full\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore.png\" alt=\"Create New Datastore window\" width=\"2074\" height=\"1547\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore.png 2074w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-300x224.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-1024x764.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-768x573.png 768w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-1536x1146.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-2048x1528.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-1920x1432.png 1920w, 
https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Create-New-Datastore-600x448.png 600w\" sizes=\"(max-width: 2074px) 100vw, 2074px\" \/><\/p>\n<p><span style=\"font-weight: 400;\">To upload a dataset under 1 GB, select the file or folder and click upload, or drag and drop it into the box.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When you have finished with your customizations, select the <\/span><b>Upload<\/b><span style=\"font-weight: 400;\"> button at the bottom right to create your new Datastore.<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1456\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Datastore-upload-small-dataset.gif\" alt=\"Create Datastore from small dataset\" width=\"2864\" height=\"1522\" \/><\/p>\n<p><b>Large Datasets (1 GB+)<\/b><\/p>\n<p><span style=\"font-weight: 400;\">For datasets larger than 1 GB, use the CLI (which works just as well for small datasets).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">First, install the Grid CLI and log in:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">pip install lightning-grid --upgrade<\/span>\r\n<span style=\"font-weight: 400;\">grid login<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">Next, use the <code>grid datastore create<\/code> command to upload any folder:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create --name imagenet .\/imagenet_folder\/<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">This method works from:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A laptop.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">An Interactive Session.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Any machine with an internet connection and Grid installed.<\/span><\/li>\n<li 
style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A corporate cluster.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">An academic cluster.<\/span><\/li>\n<\/ul>\n<h4><span style=\"font-weight: 400;\">Create from a Public S3 Bucket<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">Any public AWS S3 bucket can be used to create Datastores on the Grid public cloud or on a BYOC (Bring Your Own Credentials) cluster, using either the Grid UI or the CLI.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Currently, Grid does not support private S3 buckets.<\/span><\/p>\n<p><b>Using the UI<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Click <b>New<\/b> &rarr; <b>Datastore<\/b> and choose &#8220;URL&#8221; as the upload mechanism. Provide the S3 bucket URL as the source.<\/span><\/p>\n<p><b>Using the CLI<\/b><\/p>\n<p><span style=\"font-weight: 400;\">To create a Datastore from an S3 bucket with the CLI, pass an S3 URL of the form <code>s3:\/\/&lt;bucket-name&gt;\/&lt;any-desired-subpaths&gt;\/<\/code> to the <code>grid datastore create<\/code> command.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, to create a Datastore from the <\/span><i><span style=\"font-weight: 400;\">ryft-public-sample-data\/esRedditJson<\/span><\/i><span style=\"font-weight: 400;\"> bucket path, run:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create s3:\/\/ryft-public-sample-data\/esRedditJson\/<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">This copies the files from the source bucket into Grid&#8217;s managed Datastore storage.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Note that this example omits the <code>--name<\/code> option. 
When the <code>--name<\/code> option is omitted, the Datastore is named after the last &#8220;directory&#8221; in the source path. In the example above, the Datastore would be named &#8220;esredditjson&#8221; (names are converted to lowercase ASCII characters without spaces).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To use a different name, pass the <code>--name<\/code> option explicitly. For example, to name the Datastore created from this bucket &#8220;lightning-train-data&#8221;, run:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create s3:\/\/ryft-public-sample-data\/esRedditJson\/ --name lightning-train-data<\/span><\/pre>\n<p><b>Using the <code>--no-copy<\/code> Option via the CLI<\/b><\/p>\n<p><span style=\"font-weight: 400;\">In certain cases, your S3 bucket may meet one (or both) of the following criteria:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The bucket is continually updated with new data that you want included in a Grid Datastore.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">The bucket is particularly large (leading to long Datastore creation times).<\/span><\/li>\n<\/ol>\n<p><span style=\"font-weight: 400;\">In these cases, you can pass the <code>--no-copy<\/code> flag to the <code>grid datastore create<\/code> command.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Example:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create s3:\/\/ryft-public-sample-data\/esRedditJson\/ --no-copy<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">This mounts the public S3 bucket directly as a Grid Datastore, without Grid copying over the entire dataset. 
This offers better support for large datasets and incremental-update use cases.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When using this flag, you cannot remove files from your bucket. To add files, first add them to your bucket, then create a new version of the Datastore.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you use this flag on the Grid public cloud, the source bucket should be in the AWS <\/span><i><span style=\"font-weight: 400;\">us-east-1<\/span><\/i><span style=\"font-weight: 400;\"> region; otherwise, accessing the Datastore files in a Run or Session will incur significant latency.<\/span><\/p>\n<h4><span style=\"font-weight: 400;\">Create from an HTTP URL<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">Datastores can be created from a <code>.zip<\/code> or <code>.tar.gz<\/code> file accessible at an unauthenticated HTTP URL. When an HTTP URL pointing to an archive file is used as the source of a Grid Datastore, the platform automatically starts a server-side process that downloads the file, extracts its contents, and creates a Datastore whose directory structure matches the extracted archive.<\/span><\/p>\n<p><b>Using the UI<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Click <b>New<\/b> &rarr; <b>Datastore<\/b> and choose &#8220;URL&#8221; as the upload mechanism. 
Provide the HTTP URL as the source.<\/span><\/p>\n<p><b>Using the CLI<\/b><\/p>\n<p><span style=\"font-weight: 400;\">To create a Datastore from an HTTP URL with the CLI, pass a URL beginning with either <code>http:\/\/<\/code> or <code>https:\/\/<\/code> to the <code>grid datastore create<\/code> command.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, to create a Datastore from the MNIST training set at https:\/\/datastore-public-bucket-access-test-bucket.s3.amazonaws.com\/subfolder\/trainingSet.tar.gz, run:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create https:\/\/datastore-public-bucket-access-test-bucket.s3.amazonaws.com\/subfolder\/trainingSet.tar.gz<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">Note that this example omits the <code>--name<\/code> option. When <code>--name<\/code> is omitted, the Datastore name is taken from the last path component of the URL, with file suffixes stripped. In the example above, the Datastore would be named &#8220;trainingset&#8221; (names are converted to lowercase ASCII characters without spaces).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To use a different name, pass the <code>--name<\/code> option explicitly. 
For example, to name the Datastore created from this URL &#8220;lightning-train-data&#8221;, run:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create https:\/\/datastore-public-bucket-access-test-bucket.s3.amazonaws.com\/subfolder\/trainingSet.tar.gz --name lightning-train-data<\/span><\/pre>\n<h4><span style=\"font-weight: 400;\">Create from a Session<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">For large datasets that require processing or a lot of manual work, we recommend this flow:<\/span><\/p>\n<ol>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Launch an Interactive Session<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Download the data<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Process it<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Upload it as a Datastore<\/span><\/li>\n<\/ol>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-1457\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/05\/Datastore-create-from-session.gif\" alt=\"Create Datastore from Session\" width=\"2874\" height=\"1526\" \/><\/p>\n<p><span style=\"font-weight: 400;\">Inside the Interactive Session, use the terminal multiplexer <code>screen<\/code> so that the upload is not interrupted if your local machine shuts down or loses its network connection.<\/span><\/p>\n<pre><span style=\"font-weight: 400;\"># start screen (lets you close the tab without killing the process)<\/span>\r\n<span style=\"font-weight: 400;\">screen -S some_name<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">Now do whatever processing you need:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\"># download, etc. (-L follows redirects, -o names the output file)<\/span>\r\n<span style=\"font-weight: 400;\">curl -L -o a_dataset.zip http:\/\/a_dataset<\/span>\r\n<span 
style=\"font-weight: 400;\">unzip a_dataset.zip<\/span>\r\n\r\n<span style=\"font-weight: 400;\"># process<\/span>\r\n<span style=\"font-weight: 400;\">do_something<\/span>\r\n<span style=\"font-weight: 400;\">something_else<\/span>\r\n<span style=\"font-weight: 400;\">bash process.sh<\/span>\r\n<span style=\"font-weight: 400;\">...<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">When you&#8217;re done, upload the result to Grid with the CLI (from the Interactive Session):<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create imagenet_folder --name imagenet<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">The Grid CLI is preinstalled in Sessions, and you are automatically logged in with your Grid credentials.<\/span><\/p>\n<p><i><span style=\"font-weight: 400;\">Note: If your dataset is larger than 1 GB, we suggest creating an Interactive Session and uploading the Datastore from there. Interactive Sessions have much faster internet connections, so uploads finish sooner.<\/span><\/i><\/p>\n<h4><span style=\"font-weight: 400;\">Create from a Cluster<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">Grid also allows you to upload from:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">A corporate cluster.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">An academic cluster.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">First, start <code>screen<\/code> on the jump node so the upload keeps running in the background:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">screen -S upload<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">If your jump node allows memory-intensive processes, you can skip this step; otherwise, request an interactive machine. 
Here&#8217;s an example using SLURM:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">srun --qos=batch --mem-per-cpu=10000 --ntasks=4 --time=12:00:00 --pty bash<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">Once the job starts, install the Grid CLI and log in (your username and API key are available on the Grid Settings page).<\/span><\/p>\n<pre><span style=\"font-weight: 400;\"># install<\/span>\r\n<span style=\"font-weight: 400;\">pip install lightning-grid --upgrade<\/span>\r\n\r\n<span style=\"font-weight: 400;\"># login<\/span>\r\n<span style=\"font-weight: 400;\">grid login --username YOUR_USERNAME --key YOUR_API_KEY<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">Next, use the <code>grid datastore create<\/code> command to upload any folder:<\/span><\/p>\n<pre><span style=\"font-weight: 400;\">grid datastore create .\/imagenet_folder\/ --name imagenet<\/span><\/pre>\n<p><span style=\"font-weight: 400;\">You can now safely close your SSH connection to the cluster (<code>screen<\/code> keeps the upload running in the background).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">And that\u2019s it for creating Datastores in Grid! You can check out other Grid tutorials, or browse the <\/span><a href=\"https:\/\/docs.grid.ai\/\"><span style=\"font-weight: 400;\"><u>Grid Docs<\/u><\/span><\/a><span style=\"font-weight: 400;\"> to learn more about anything not covered in this tutorial.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As always, Happy Grid-ing!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview of Datastores To speed up training iteration, you can store your data in a Grid Datastore. Datastores are high-performance, low-latency, versioned datasets. If you have large-scale data, Datastores can resolve blockers in your workflow by eliminating the need to download the large dataset every time your script runs. 
Datastores can be attached to Runs<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\" title=\"ReadCreating Datastores\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":1352,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[103],"tags":[43],"glossary":[],"acf":{"hide_from_archive":null,"content_type":null,"code_embed":null,"code_shortcode":null,"custom_styles":null,"sticky":null,"additional_authors":null,"mathjax":null,"default_editor":null,"sections":null,"show_table_of_contents":null,"table_of_contents":null,"tabs":null,"tab_group":null},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Creating Datastores - Lightning AI<\/title>\n<meta name=\"description\" content=\"Follow this tutorial to learn how to create Datastores in Grid from a local filesystem, public S3 bucket, HTTP URL, Session, or Cluster.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Creating Datastores - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"Follow this tutorial to learn how to create Datastores in Grid from a local filesystem, public S3 bucket, HTTP URL, Session, or Cluster.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta 
property=\"article:published_time\" content=\"2022-05-18T19:03:32+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-09-10T15:16:20+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png\" \/>\n\t<meta property=\"og:image:width\" content=\"2560\" \/>\n\t<meta property=\"og:image:height\" content=\"1280\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Creating 
Datastores\",\"datePublished\":\"2022-05-18T19:03:32+00:00\",\"dateModified\":\"2022-09-10T15:16:20+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\"},\"wordCount\":1552,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png\",\"keywords\":[\"Grid\"],\"articleSection\":[\"Technical Documentation\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\",\"name\":\"Creating Datastores - Lightning AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png\",\"datePublished\":\"2022-05-18T19:03:32+00:00\",\"dateModified\":\"2022-09-10T15:16:20+00:00\",\"description\":\"Follow this tutorial to learn how to create Datastores in Grid from a local filesystem, public S3 bucket, HTTP URL, Session, or 
Cluster.\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png\",\"width\":2560,\"height\":1280,\"caption\":\"Grid Tutorial: Creating Datastores\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Creating Datastores\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform for teams to build AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning 
AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Creating Datastores - Lightning AI","description":"Follow this tutorial to learn how to create Datastores in Grid from a local filesystem, public S3 bucket, HTTP URL, Session, or Cluster.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/","og_locale":"en_US","og_type":"article","og_title":"Creating Datastores - Lightning AI","og_description":"Follow this tutorial to learn how to create Datastores in Grid from a local filesystem, public S3 bucket, HTTP URL, Session, or Cluster.","og_url":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/","og_site_name":"Lightning AI","article_published_time":"2022-05-18T19:03:32+00:00","article_modified_time":"2022-09-10T15:16:20+00:00","og_image":[{"width":2560,"height":1280,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"Creating Datastores","datePublished":"2022-05-18T19:03:32+00:00","dateModified":"2022-09-10T15:16:20+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/"},"wordCount":1552,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png","keywords":["Grid"],"articleSection":["Technical Documentation"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/","url":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/","name":"Creating Datastores - Lightning AI","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png","datePublished":"2022-05-18T19:03:32+00:00","dateModified":"2022-09-10T15:16:20+00:00","description":"Follow this tutorial to learn how to create Datastores in Grid from a local filesystem, public S3 
bucket, HTTP URL, Session, or Cluster.","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2022\/03\/dbf0a58a-fbe7-4ecf-bf51-c65f90243f9a.png","width":2560,"height":1280,"caption":"Grid Tutorial: Creating Datastores"},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/technical-documentation\/creating-datastores\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Creating Datastores"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning 
AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/1453"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=1453"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/1453\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/1352"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=1453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=1453"},{"taxonomy":"post_tag","embeddable":true,"
href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=1453"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=1453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}