{"id":5648177,"date":"2023-05-24T13:18:16","date_gmt":"2023-05-24T17:18:16","guid":{"rendered":"https:\/\/lightning.ai\/pages\/?p=5648177"},"modified":"2023-05-24T13:20:42","modified_gmt":"2023-05-24T17:20:42","slug":"lang-segment-anything-object-detection-and-segmentation-with-text-prompt","status":"publish","type":"post","link":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/","title":{"rendered":"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt"},"content":{"rendered":"<div class=\"takeaways card-glow p-4 my-4\"><h3 class=\"w-100 d-block\">Takeaways<\/h3> Our community member Luca Medeiros shows how he leveraged the Segment Anything Model from Meta AI and built his <a href=\"https:\/\/github.com\/luca-medeiros\/lang-segment-anything\">lang-segment-anything<\/a> library for object detection and image segmentation based on text prompts.<\/div>\n<h2><strong>Segment Anything Model (SAM)<\/strong><\/h2>\n<p>In recent years, computer vision has witnessed remarkable advancements, particularly in image segmentation and object detection tasks. One of the most recent notable breakthroughs is the Segment Anything Model (SAM), a versatile deep-learning model designed to predict object masks from images and input prompts efficiently. By utilizing powerful encoders and decoders, SAM is capable of handling a wide range of segmentation tasks, making it a valuable tool for researchers and developers alike.<\/p>\n<p>SAM employs an image encoder, typically a Vision Transformer (ViT), to extract image embeddings that serve as a foundation for mask prediction. The model also incorporates a prompt encoder, which encodes various types of input prompts, such as point coordinates, bounding boxes, and low-resolution mask inputs. 
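As an illustrative sketch of the positional-encoding idea behind the prompt encoder (the random-frequency matrix, the 64/128 dimensions, and the normalization below are assumptions for illustration, not SAM's exact implementation), a point prompt can be mapped to an embedding like this:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical random spatial-frequency matrix: 2 input coordinates -> 64 frequencies
freqs = rng.normal(size=(2, 64))

def encode_points(points_xy, image_size=1024):
    # normalize [x, y] pixel coordinates to [-1, 1]
    coords = 2.0 * np.asarray(points_xy, dtype=float) / image_size - 1.0
    # project onto the random frequencies, then take sin/cos -> (n_points, 128)
    proj = 2.0 * np.pi * coords @ freqs
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)
```

A foreground/background label (or box-corner identity) would then be added on top of this positional code as a learned embedding.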
These encoded prompts, along with image embeddings, are then fed into a mask decoder to generate the final object masks.<\/p>\n<div id=\"attachment_5648178\" style=\"width: 437px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5648178\" class=\" wp-image-5648178\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/section-3.1c.gif\" alt=\"\" width=\"427\" height=\"427\" \/><p id=\"caption-attachment-5648178\" class=\"wp-caption-text\">Source: Section 3.1c https:\/\/segment-anything.com\/<\/p><\/div>\n<p>The above architecture allows fast and light prompting on an already encoded image.<\/p>\n<p>SAM is designed to work with a variety of prompts, including:<\/p>\n<ul>\n<li>Mask: A rough, low-resolution binary mask can be provided as an initial input to guide the model.<\/li>\n<li>Points: Users can input [x, y] coordinates along with their type (foreground or background) to help define object boundaries.<\/li>\n<li>Box: Bounding boxes can be specified using coordinates [x1, y1, x2, y2] to inform the model about the location and size of the object.<\/li>\n<li>Text: Textual prompts can also be used to provide additional context or to specify the object of interest.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-5648179\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n.png\" alt=\"\" width=\"738\" height=\"370\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n.png 3840w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n-300x150.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n-1024x514.png 1024w, 
https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n-1536x770.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n-2048x1027.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/338558258_1349701259095991_4358060436604292355_n-300x150@2x.png 600w\" sizes=\"(max-width: 738px) 100vw, 738px\" \/><\/p>\n<p>Diving deeper into SAM&#8217;s architecture, we can explore its key components:<\/p>\n<ul>\n<li>Image encoder: SAM&#8217;s default image encoder is ViT-H, but it can also utilize ViT-L or ViT-B depending on the specific requirements.<\/li>\n<li>Downsample: To reduce the resolution of the prompt binary mask, a series of convolutional layers is employed.<\/li>\n<li>Prompt encoder: Positional embeddings are used to encode various input prompts, which help inform the model about the location and context of objects within the image.<\/li>\n<li>Mask decoder: A modified transformer decoder block serves as the mask decoder, translating the encoded prompts and image embeddings into the final object masks.<\/li>\n<li>Valid masks: For any given prompt, SAM generates the three most relevant masks, giving users an array of options to choose from.<\/li>\n<\/ul>\n<p>The model was trained with a weighted combination of focal, dice, and IoU losses, with weights of 20, 1, and 1, respectively.<\/p>\n<p>The strength of SAM lies in its adaptability and flexibility, as it can work with different prompt types to generate accurate segmentation masks. Much like foundation large language models (LLMs), which serve as a strong base for various natural language processing applications, SAM provides a solid foundation for computer vision tasks. The model&#8217;s architecture has been designed to facilitate easy fine-tuning for downstream tasks, enabling it to be tailored to specific use cases or domains. 
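As a rough, self-contained sketch of the focal/dice/IoU combination described above (simplified to per-pixel probabilities and a scalar IoU score; SAM's real losses operate on logits over batches of candidate masks):

```python
import numpy as np

def focal_loss(p, t, gamma=2.0, eps=1e-6):
    # mean focal loss: down-weights pixels the model already predicts well
    pt = np.where(t == 1, p, 1 - p)
    return float(np.mean(-((1 - pt) ** gamma) * np.log(pt + eps)))

def dice_loss(p, t, eps=1e-6):
    # 1 - Dice overlap between predicted probabilities and the binary target
    inter = (p * t).sum()
    return float(1 - (2 * inter + eps) / (p.sum() + t.sum() + eps))

def iou_loss(pred_iou, true_iou):
    # SAM also regresses a mask-quality (IoU) score; penalize it with MSE
    return float((pred_iou - true_iou) ** 2)

def sam_mask_loss(p, t, pred_iou, true_iou):
    # the 20:1:1 weighting mentioned above
    return 20.0 * focal_loss(p, t) + dice_loss(p, t) + iou_loss(pred_iou, true_iou)
```

A perfect prediction drives all three terms to zero, while the 20x weight makes the focal term dominate early training.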
By fine-tuning SAM on task-specific data, developers can enhance its performance and ensure that it meets the unique requirements of their application.<\/p>\n<p>This capacity for fine-tuning not only allows SAM to achieve impressive performance in a variety of scenarios but also promotes a more efficient development process. With pre-trained models serving as a starting point, developers can focus on optimizing the model for their specific task, rather than starting from scratch. This approach not only saves time and resources but also leverages the extensive knowledge encoded in the pre-trained model, resulting in a more robust and accurate system.<\/p>\n<h2><strong>Natural Language Prompts<\/strong><\/h2>\n<p>The integration of text prompts with SAM enables the model to perform highly specific and context-aware object segmentation. By leveraging natural language prompts, SAM can be guided to segment objects of interest based on their semantic properties, attributes, or relationships to other objects within the scene.<\/p>\n<p>In the process of training SAM, the largest publicly available CLIP model (ViT-L\/14@336px) is used to compute text and image embeddings. These embeddings are normalized before being utilized in the training process.<\/p>\n<p>To generate training prompts, the bounding box around each mask is first expanded by a random factor ranging from 1x to 2x. The expanded box is then square-cropped to maintain its aspect ratio and resized to 336&#215;336 pixels. Before feeding the crop to the CLIP image encoder, pixels outside the mask are zeroed out with a 50% probability. Masked attention is used in the last layer of the encoder to ensure the embedding focuses on the object, restricting attention from the output token to the image positions inside the mask. The output token embedding serves as the final prompt. 
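The box-expansion and background-zeroing steps just described can be sketched roughly as follows (function names are hypothetical, and the square crop plus 336x336 resize is reduced to computing the crop box):

```python
import numpy as np

rng = np.random.default_rng(0)

def expand_and_square(box, factor, img_h, img_w):
    # grow the box about its centre by `factor` (drawn from [1, 2] during
    # training), using the longer side so the crop stays square
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = factor * max(x2 - x1, y2 - y1)
    return (max(0.0, cx - side / 2), max(0.0, cy - side / 2),
            min(float(img_w), cx + side / 2), min(float(img_h), cy + side / 2))

def maybe_zero_background(crop, mask_crop, p=0.5):
    # with probability p, zero out pixels that fall outside the object mask
    if rng.random() < p:
        crop = crop * mask_crop[..., None]
    return crop
```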
During training, the CLIP-based prompt is provided first, followed by iterative point prompts to refine the prediction.<\/p>\n<p>For inference, the unmodified CLIP text encoder is used to create a prompt for SAM. The model relies on the alignment of text and image embeddings achieved by CLIP, which enables training without explicit text supervision while still using text-based prompts for inference. This approach allows SAM to effectively leverage natural language prompts to achieve accurate and context-aware segmentation results.<\/p>\n<p><em>Unfortunately, Meta hasn\u2019t released the weights of SAM with a text encoder (yet?).<\/em><\/p>\n<h2>Luca\u2019s Project: <strong>lang-segment-anything<\/strong><\/h2>\n<p>The <a href=\"https:\/\/github.com\/luca-medeiros\/lang-segment-anything\">lang-segment-anything<\/a> library presents an innovative approach to object detection and segmentation by combining the strengths of <a href=\"https:\/\/arxiv.org\/abs\/2303.05499\">GroundingDino<\/a> and SAM.<\/p>\n<p>Initially, GroundingDino performs zero-shot <code>text-to-bounding-box<\/code> object detection, efficiently identifying objects of interest in images based on natural language descriptions. 
These bounding boxes are then used as input prompts for the SAM model, which generates precise segmentation masks for the identified objects.<\/p>\n<pre class=\"code-shortcode dark-theme\"><code class=\"language-python\">from PIL import Image\nfrom lang_sam import LangSAM\nfrom lang_sam.utils import draw_image\n\nmodel = LangSAM()\nimage_pil = Image.open('.\/assets\/car.jpeg').convert(\"RGB\")\ntext_prompt = 'car, wheel'\nmasks, boxes, labels, logits = model.predict(image_pil, text_prompt)\nimage = draw_image(image_pil, masks, boxes, labels)\n<\/code><\/pre>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5648180\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM.png\" alt=\"\" width=\"618\" height=\"390\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM.png 2634w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM-300x189.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM-1024x646.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM-1536x969.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM-2048x1292.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/Screenshot-2023-05-09-at-11.31.09-PM-300x189@2x.png 600w\" sizes=\"(max-width: 618px) 100vw, 618px\" \/><\/p>\n<h3>Lightning App<\/h3>\n<p>You can quickly deploy an application using the Lightning AI App framework. 
We will use the <code>ServeGradio<\/code> component to deploy our model with a UI. You can learn more about ServeGradio <a href=\"https:\/\/lightning.ai\/docs\/app\/stable\/api_reference\/generated\/lightning.app.components.serve.gradio_server.ServeGradio.html?highlight=servegradio\">here<\/a>.<\/p>\n<pre class=\"code-shortcode dark-theme\"><code class=\"language-python\">import os\n\nimport gradio as gr\nimport lightning as L\nimport numpy as np\nfrom lightning.app.components.serve import ServeGradio\nfrom PIL import Image\n\nfrom lang_sam import LangSAM\nfrom lang_sam import SAM_MODELS\nfrom lang_sam.utils import draw_image\nfrom lang_sam.utils import load_image\n\nclass LitGradio(ServeGradio):\n\n    inputs = [\n        gr.Dropdown(choices=list(SAM_MODELS.keys()), label=\"SAM model\", value=\"vit_h\"),\n        gr.Slider(0, 1, value=0.3, label=\"Box threshold\"),\n        gr.Slider(0, 1, value=0.25, label=\"Text threshold\"),\n        gr.Image(type=\"filepath\", label='Image'),\n        gr.Textbox(lines=1, label=\"Text Prompt\"),\n    ]\n    outputs = [gr.outputs.Image(type=\"pil\", label=\"Output Image\")]\n\n    def __init__(self, sam_type=\"vit_h\"):\n        super().__init__()\n        self.ready = False\n        self.sam_type = sam_type\n\n    def predict(self, sam_type, box_threshold, text_threshold, image_path, text_prompt):\n        print(\"Predicting...\", sam_type, box_threshold, text_threshold, image_path, text_prompt)\n        if sam_type != self.model.sam_type:\n            self.model.build_sam(sam_type)\n        image_pil = load_image(image_path)\n        masks, boxes, phrases, logits = self.model.predict(image_pil, text_prompt, box_threshold, text_threshold)\n        labels = [f\"{phrase} {logit:.2f}\" for phrase, logit in zip(phrases, logits)]\n        image_array = np.asarray(image_pil)\n        image = draw_image(image_array, masks, boxes, labels)\n        image = Image.fromarray(np.uint8(image)).convert(\"RGB\")\n        return image\n\n    def build_model(self, sam_type=\"vit_h\"):\n        model = LangSAM(sam_type)\n        self.ready = True\n        return model\n\napp = L.LightningApp(LitGradio())\n<\/code><\/pre>\n<p>And just like that, the app is launched in the browser!<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-5648181\" src=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png\" alt=\"\" width=\"819\" height=\"462\" srcset=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png 3430w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-300x169.png 300w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-1024x578.png 1024w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-1536x867.png 1536w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-2048x1156.png 2048w, https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-300x169@2x.png 600w\" sizes=\"(max-width: 819px) 100vw, 819px\" \/><\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>And that&#8217;s a wrap on our introduction to 
the Segment Anything Model. It&#8217;s clear that SAM is a valuable tool for computer vision researchers and developers alike, with its ability to handle a wide range of segmentation tasks and adapt to different prompt types. Its architecture allows for easy implementation, making it versatile enough to be tailored to specific use cases and domains. Overall, SAM has quickly become an important asset for the machine learning community and is sure to continue making waves in the field.<\/p>\n<h2><strong>About the Author<\/strong><\/h2>\n<p>I\u2019m <a href=\"https:\/\/www.linkedin.com\/in\/luca-medeiros\">Luca Medeiros<\/a>, a computer vision engineer and champion ambassador of Lightning League. I am interested in multimodal models and am an OSS enthusiast.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Segment Anything Model (SAM) In recent years, computer vision has witnessed remarkable advancements, particularly in image segmentation and object detection tasks. One of the most recent notable breakthroughs is the Segment Anything Model (SAM), a versatile deep-learning model designed to predict object masks from images and input prompts efficiently. 
By utilizing powerful encoders and decoders,<a class=\"excerpt-read-more\" href=\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\" title=\"ReadLang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt\">&#8230; Read more &raquo;<\/a><\/p>\n","protected":false},"author":16,"featured_media":5648181,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":"","_links_to":"","_links_to_target":""},"categories":[106,41],"tags":[],"glossary":[],"acf":{"additional_authors":false,"mathjax":false,"default_editor":true,"show_table_of_contents":false,"hide_from_archive":false,"content_type":"Blog Post","sticky":false,"custom_styles":""},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt - Lightning AI<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt - Lightning AI\" \/>\n<meta property=\"og:description\" content=\"Segment Anything Model (SAM) In recent years, computer vision has witnessed remarkable advancements, particularly in image segmentation and object detection tasks. One of the most recent notable breakthroughs is the Segment Anything Model (SAM), a versatile deep-learning model designed to predict object masks from images and input prompts efficiently. 
By utilizing powerful encoders and decoders,... Read more &raquo;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\" \/>\n<meta property=\"og:site_name\" content=\"Lightning AI\" \/>\n<meta property=\"article:published_time\" content=\"2023-05-24T17:18:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-05-24T17:20:42+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-1024x578.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"578\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"JP Hennessy\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:site\" content=\"@LightningAI\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"JP Hennessy\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\"},\"author\":{\"name\":\"JP Hennessy\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\"},\"headline\":\"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt\",\"datePublished\":\"2023-05-24T17:18:16+00:00\",\"dateModified\":\"2023-05-24T17:20:42+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\"},\"wordCount\":1399,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png\",\"articleSection\":[\"Community\",\"Tutorials\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\",\"url\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\",\"name\":\"Lang Segment Anything \u2013 Object Detection and Segmentation With Text 
Prompt - Lightning AI\",\"isPartOf\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png\",\"datePublished\":\"2023-05-24T17:18:16+00:00\",\"dateModified\":\"2023-05-24T17:20:42+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png\",\"width\":3430,\"height\":1936},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/lightning.ai\/pages\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/lightning.ai\/pages\/#website\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"name\":\"Lightning AI\",\"description\":\"The platform 
for teams to build AI.\",\"publisher\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/lightning.ai\/pages\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/lightning.ai\/pages\/#organization\",\"name\":\"Lightning AI\",\"url\":\"https:\/\/lightning.ai\/pages\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"contentUrl\":\"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png\",\"width\":1744,\"height\":856,\"caption\":\"Lightning AI\"},\"image\":{\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/LightningAI\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6\",\"name\":\"JP Hennessy\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g\",\"caption\":\"JP Hennessy\"},\"url\":\"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt - Lightning AI","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/","og_locale":"en_US","og_type":"article","og_title":"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt - Lightning AI","og_description":"Segment Anything Model (SAM) In recent years, computer vision has witnessed remarkable advancements, particularly in image segmentation and object detection tasks. One of the most recent notable breakthroughs is the Segment Anything Model (SAM), a versatile deep-learning model designed to predict object masks from images and input prompts efficiently. By utilizing powerful encoders and decoders,... Read more &raquo;","og_url":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/","og_site_name":"Lightning AI","article_published_time":"2023-05-24T17:18:16+00:00","article_modified_time":"2023-05-24T17:20:42+00:00","og_image":[{"width":1024,"height":578,"url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person-1024x578.png","type":"image\/png"}],"author":"JP Hennessy","twitter_card":"summary_large_image","twitter_creator":"@LightningAI","twitter_site":"@LightningAI","twitter_misc":{"Written by":"JP Hennessy","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#article","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/"},"author":{"name":"JP Hennessy","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6"},"headline":"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt","datePublished":"2023-05-24T17:18:16+00:00","dateModified":"2023-05-24T17:20:42+00:00","mainEntityOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/"},"wordCount":1399,"commentCount":0,"publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png","articleSection":["Community","Tutorials"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/","url":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/","name":"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt - Lightning 
AI","isPartOf":{"@id":"https:\/\/lightning.ai\/pages\/#website"},"primaryImageOfPage":{"@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage"},"image":{"@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage"},"thumbnailUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png","datePublished":"2023-05-24T17:18:16+00:00","dateModified":"2023-05-24T17:20:42+00:00","breadcrumb":{"@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#primaryimage","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/05\/person.png","width":3430,"height":1936},{"@type":"BreadcrumbList","@id":"https:\/\/lightning.ai\/pages\/community\/lang-segment-anything-object-detection-and-segmentation-with-text-prompt\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/lightning.ai\/pages\/"},{"@type":"ListItem","position":2,"name":"Lang Segment Anything \u2013 Object Detection and Segmentation With Text Prompt"}]},{"@type":"WebSite","@id":"https:\/\/lightning.ai\/pages\/#website","url":"https:\/\/lightning.ai\/pages\/","name":"Lightning AI","description":"The platform for teams to build 
AI.","publisher":{"@id":"https:\/\/lightning.ai\/pages\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/lightning.ai\/pages\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/lightning.ai\/pages\/#organization","name":"Lightning AI","url":"https:\/\/lightning.ai\/pages\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/","url":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","contentUrl":"https:\/\/lightningaidev.wpengine.com\/wp-content\/uploads\/2023\/02\/image-17.png","width":1744,"height":856,"caption":"Lightning AI"},"image":{"@id":"https:\/\/lightning.ai\/pages\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/LightningAI"]},{"@type":"Person","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/2518f4d5541f8e98016f6289169141a6","name":"JP Hennessy","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/lightning.ai\/pages\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/28ade268218ae45f723b0b62499f527a?s=96&d=mm&r=g","caption":"JP 
Hennessy"},"url":"https:\/\/lightning.ai\/pages\/author\/jplightning-ai\/"}]}},"_links":{"self":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5648177"}],"collection":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/users\/16"}],"replies":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/comments?post=5648177"}],"version-history":[{"count":0,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/posts\/5648177\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media\/5648181"}],"wp:attachment":[{"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/media?parent=5648177"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/categories?post=5648177"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/tags?post=5648177"},{"taxonomy":"glossary","embeddable":true,"href":"https:\/\/lightning.ai\/pages\/wp-json\/wp\/v2\/glossary?post=5648177"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}